Optimize coefficient decoding
Separate the eob, ac, and dc cases, eliminate some branches, and make
some generic integer arithmetic improvements.
Runtime statistics from Chimera 1080p on Skylake-X, before and after:
12.06% dav1d libdav1d.so.2.0.0 [.] decode_coefs
9.88% dav1d libdav1d.so.2.0.0 [.] decode_coefs