Rewrite inverse transforms to prevent integer overflows
The basic idea is that with intermediates of 19+sign bits and multipliers of 12+sign bits, the intermediates are 19+12=31+sign bits, and adding two of these together can overflow, which is UB in C. To resolve this, we clip all multipliers to 11 bit by inverting them:
(a * constant_1 + b * constant_2 + 2048) >> 12
, where
constant_1 < 2048
but constant_2 >= 2048
, is identical to:
((a * constant_1 + b * (4096 - constant_2) + 2048) >> 12) + b
,
and 4096 - constant_2 < 2048
.
(Because now both constants are 11+sign instead of 12+sign bits, and intermediates are still 19 bits, the intermediates after multiply are 30+sign instead of 31+sign, and add/sub leads to 31+sign instead of 32+sign, and thus doesn't overflow. The final add/sub is post-shift and never overflows.)
In other places, where both constants are a multiple of 2, we can reduce the magnitude of both and round/shift by 11 instead of 12.
Do this in dct4,8,16,32,64 as well as adst8,16. Also slightly simplify the final phase of idct64_1d by moving the add/sub to before the multiply.
The adst4 is rewritten to be shaped like a matrix-multiply, and then use the same idea on all 4 multipliers in the matrix, since the sum of all 4 multipliers is still under 4096 in all cases.
Fixes clusterfuzz-testcase-minimized-dav1d_fuzzer-5709759466962944, credits to oss-fuzz. Also fixes #223 (closed).