AArch64: Fix potential out of bounds access in DotProd H/HV filters
The DotProd/I8MM horizontal and HV/2D subpel filters use -4 offset for sampling instead of -3 to be better aligned in some cases.
This resulted in an out of bounds access, which led to crashes.
This patch fixes it.
Steps to reproduce the crash on macOS:
Use the 4K @ 60 FPS version of this video stream: https://m.youtube.com/watch?v=rPWZepxc0hw, and set the MallocGuardEdges=1
environmental variable for tools/dav1d
to have guard pages around allocations.
Use tag 1.4.2 or any commit after the inclusion of !1632 (merged), then run:
tools/dav1d --muxer null -i <sample.ivf>
It shall crash with EXC_BAD_ACCESS
after the decode of frame 757:
* thread #6, name = 'dav1d-worker', stop reason = EXC_BAD_ACCESS (code=2, address=0x104403fff)
* frame #0: 0x0000000100097028 dav1d`put_8tap_neon_i8mm + 2976
frame #1: 0x000000010002799c dav1d`dav1d_recon_b_inter_8bpc + 1304
frame #2: 0x000000010005d510 dav1d`decode_b + 16796
frame #3: 0x0000000100054ac8 dav1d`decode_sb + 444
frame #4: 0x0000000100054d68 dav1d`decode_sb + 1116
frame #5: 0x0000000100054668 dav1d`dav1d_decode_tile_sbrow + 1484
frame #6: 0x0000000100075178 dav1d`dav1d_worker_task + 3372
frame #7: 0x0000000189d62f94 libsystem_pthread.dylib`_pthread_start + 136
Edited by Arpad Panyik