x86: Add 6-tap variants of high bit-depth mc AVX-512 (Ice Lake) functions

Henrik Gramner requested to merge gramner/dav1d:x86_6tap_mc_16bpc_avx512 into master

Checkasm numbers on Zen 4:

           8-tap    6-tap

w2_v:       21.7     19.5
w2_hv:      30.5     28.5

w4_v:       24.0     21.3
w4_hv:      33.6     32.0

w8_h:       33.6     28.7
w8_v:       37.9     35.9
w8_hv:      62.4     47.4

w16_h:      70.1     57.0
w16_v:      59.7     53.5
w16_hv:    143.9    117.3

w32_h:     173.3    137.2
w32_v:     125.0    101.0
w32_hv:    414.7    251.1

w64_h:     606.5    484.2
w64_v:     430.5    347.2
w64_hv:   1387.0    848.9

w128_h:   1709.9   1354.8
w128_v:   1207.2    994.3
w128_hv:  3831.9   2361.9
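
For a quick read of the table, the relative gain can be computed per row. The sketch below assumes the checkasm figures are timings where lower is better, and it copies only the hv rows from the table above. The helper name `speedup` is made up for illustration.

```python
# hv rows copied from the table above: (8-tap, 6-tap).
# Assumption: checkasm figures are timings, so lower is better.
HV = {
    "w2_hv": (30.5, 28.5),
    "w4_hv": (33.6, 32.0),
    "w8_hv": (62.4, 47.4),
    "w16_hv": (143.9, 117.3),
    "w32_hv": (414.7, 251.1),
    "w64_hv": (1387.0, 848.9),
    "w128_hv": (3831.9, 2361.9),
}


def speedup(tap8, tap6):
    """Return the ratio of the 8-tap figure to the 6-tap figure."""
    return tap8 / tap6


for name, (tap8, tap6) in HV.items():
    print(f"{name:8s} {speedup(tap8, tap6):.2f}x")
```

Under that assumption, the hv rows from w32 upwards come out at roughly 1.6x, while w2 and w4 stay below 1.1x.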

This also removes the hv_w32 path for 8-tap, which was quite large in terms of code size. It was only around 2% faster than the hv_w16 path, and it is rarely used now that a separate 6-tap implementation exists. The hv_w32 path in 6-tap is around 5% faster than the corresponding hv_w16 path, so that one is worth keeping.
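
As general background on why a 6-tap path can do less work: when the two outermost coefficients of an 8-tap kernel are zero, the result depends only on the six inner samples. The scalar sketch below illustrates that equivalence. It is not the merge request's assembly; `filter_taps` is a hypothetical helper, and the coefficients and pixel values are illustrative numbers chosen so the kernel sums to 64, not entries from dav1d's filter tables.

```python
def filter_taps(samples, coeffs, shift=6):
    """Apply an FIR kernel to equally many samples, with rounding."""
    assert len(samples) == len(coeffs)
    acc = sum(s * c for s, c in zip(samples, coeffs))
    return (acc + (1 << (shift - 1))) >> shift


# Illustrative kernel (sums to 64) whose outer taps are zero.
# Assumption: these are NOT real AV1 filter values.
COEFFS_8 = [0, 1, -5, 36, 36, -5, 1, 0]
COEFFS_6 = COEFFS_8[1:7]  # drop the two zero taps

# Eight neighbouring 10-bit pixels (made-up values).
pixels = [512, 520, 530, 600, 610, 590, 540, 500]

out8 = filter_taps(pixels, COEFFS_8)
out6 = filter_taps(pixels[1:7], COEFFS_6)
print(out8, out6)  # both paths give the same value
```

The 6-tap call reads two fewer samples and performs two fewer multiplies per output, which is the kind of saving a dedicated path can exploit.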
