arm64: filmgrain: Fix overflows in gen_grain (!1208) · Merge requests · VideoLAN / dav1d

Martin Storsjö requested to merge mstorsjo/dav1d:arm64-gen-grain-overflow into master May 25, 2021

After multiplying two int8_t values, the largest possible product is -128*-128 = 16384. The sum of two such products, 32768, doesn't fit in an int16_t (even though the sum of two products of any other int8_t combination does).

Previously, the summing used 16-bit intermediates for the sum of two products and only lengthened the result to 32 bits when accumulating three or more products.
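
The arithmetic can be illustrated with a small scalar C sketch. This is not the NEON assembly from the patch; the function names (wrap_to_int16, sum2_narrow, sum2_wide, overflow_demo) are hypothetical, and the 16-bit lane is only emulated, assuming it wraps around like a two's-complement register would.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: emulate a 16-bit signed lane that wraps on overflow. */
static int32_t wrap_to_int16(int32_t v) {
    int32_t low = (int32_t)((uint32_t)v & 0xFFFFu); /* keep the low 16 bits */
    return low >= 0x8000 ? low - 0x10000 : low;     /* reinterpret as signed */
}

/* Sum of two int8_t products kept in a 16-bit intermediate (old behaviour,
 * as described above; the real code operates on vectors). */
static int32_t sum2_narrow(int8_t a0, int8_t b0, int8_t a1, int8_t b1) {
    return wrap_to_int16((int32_t)a0 * b0 + (int32_t)a1 * b1);
}

/* Same sum accumulated in 32 bits, which cannot overflow for two products. */
static int32_t sum2_wide(int8_t a0, int8_t b0, int8_t a1, int8_t b1) {
    return (int32_t)a0 * b0 + (int32_t)a1 * b1;
}

/* Small self-check covering the one overflowing case and its neighbours. */
static void overflow_demo(void) {
    /* 16384 + 16384 = 32768, one past INT16_MAX: the narrow sum wraps. */
    assert(sum2_wide(-128, -128, -128, -128) == 32768);
    assert(sum2_narrow(-128, -128, -128, -128) == -32768);
    /* The next largest case, 16384 + 16256 = 32640, still fits. */
    assert(sum2_narrow(-128, -128, -128, -127) == 32640);
    /* The most negative case, 2 * (-128 * 127) = -32512, also fits. */
    assert(sum2_narrow(-128, 127, -128, 127) == -32512);
}
```

Only the case where both products are -128*-128 differs between the two functions, which is why the overflow is easy to miss in testing.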

Before:                    Cortex A53       A72       A73   Apple M1
gen_grain_y_ar1_8bpc_neon:   112598.5   71309.2   74889.8   372.2
gen_grain_y_ar2_8bpc_neon:   139932.4   91442.3   95788.4   387.3
gen_grain_y_ar3_8bpc_neon:   185607.6  115691.6  131655.8   403.0
After:
gen_grain_y_ar1_8bpc_neon:   112968.8   71897.9   76171.2   371.2
gen_grain_y_ar2_8bpc_neon:   142768.8   94517.9   97934.4   387.5
gen_grain_y_ar3_8bpc_neon:   191625.2  121083.0  135975.3   405.6

