WIP: Implement prefetch of mvs
32 fps -> 33.5 fps on Cortex-A55
I don't think this will help on out-of-order cores. I'd like to see benchmark results from them to check whether it hurts too much.
TODO:
- Investigate doing multiple rows
- Optimize for screen edges?
- Benchmark on other cores
- Shorten filter length for 4 tap
- Make Arm specific
- Limit prefetch for large motion vectors
- Prefetch for warped motion and rescale
- More???
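
To make a few of the TODO items concrete (Arm-only prefetch, a shorter margin for 4-tap filters, and a cap on large motion vectors), here is a rough Rust sketch. It is not the code in this MR. The names `prefetch_read`, `rows_to_prefetch` and `prefetch_block` are hypothetical, and the sketch assumes 8-bit samples, motion vectors stored as (row, col) in 1/8-pel units, and a filter that reads `taps / 2 - 1` samples before and `taps / 2` samples after the block.

```rust
/// Issue a read-prefetch hint for the cache line containing `ptr`.
/// PRFM is only a hint, so it cannot fault.
#[cfg(target_arch = "aarch64")]
#[inline(always)]
fn prefetch_read(ptr: *const u8) {
    unsafe {
        core::arch::asm!(
            "prfm pldl1keep, [{p}]",
            p = in(reg) ptr,
            options(nostack, preserves_flags)
        );
    }
}

/// Non-Arm targets: do nothing (the "Make Arm specific" item).
#[cfg(not(target_arch = "aarch64"))]
#[inline(always)]
fn prefetch_read(_ptr: *const u8) {}

/// Byte offsets of the first sample of each reference row that a block
/// at (bx, by) with height `bh` would read for motion vector `mv`.
/// Returns an empty list when the vector exceeds `max_mv_pels`.
#[allow(clippy::too_many_arguments)]
fn rows_to_prefetch(
    plane_w: usize,
    plane_h: usize,
    stride: usize,
    bx: isize,
    by: isize,
    bh: usize,
    mv: (i16, i16), // (row, col) in 1/8 pel: an assumption of this sketch
    taps: usize,
    max_mv_pels: isize,
) -> Vec<usize> {
    // Integer part of the vector; the arithmetic shift rounds toward -inf.
    let dy = (mv.0 as isize) >> 3;
    let dx = (mv.1 as isize) >> 3;

    // Skip very large vectors instead of prefetching for them.
    if dx.abs() > max_mv_pels || dy.abs() > max_mv_pels {
        return Vec::new();
    }

    // Extra samples read around the block: 3/4 for 8-tap, 1/2 for 4-tap.
    let before = (taps / 2) as isize - 1;
    let after = (taps / 2) as isize;

    let x = (bx + dx - before).clamp(0, plane_w as isize - 1) as usize;
    let y0 = by + dy - before;
    let y1 = by + dy + bh as isize + after; // exclusive

    (y0..y1)
        .filter(|&y| y >= 0 && y < plane_h as isize) // drop rows off-plane
        .map(|y| y as usize * stride + x)
        .collect()
}

/// Prefetch the start of every listed row of `plane`.
fn prefetch_block(plane: &[u8], offsets: &[usize]) {
    for &off in offsets {
        if off < plane.len() {
            prefetch_read(plane[off..].as_ptr());
        }
    }
}
```

This only touches the first cache line of each row. Wide blocks and rows that straddle a line boundary would need more hints per row, which is part of what "Investigate doing multiple rows" and the screen-edge item would have to settle.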
Edited by Kyle Siefring