x86/deblock_{hbd_,}avx2: use vpblendvb instead of pand/pandn/por in flat16/8/6

Ronald S. Bultje requested to merge rbultje/dav1d:deblock_avx2_blend into master

(Slightly faster.)

| Function | Before | After |
|----------|-------:|------:|
| lpf_h_sb_uv_w4_8bpc_avx2 | 140.1 | 140.7 |
| lpf_h_sb_uv_w6_8bpc_avx2 | 188.0 | 184.8 |
| lpf_h_sb_y_w4_8bpc_avx2 | 281.4 | 280.6 |
| lpf_h_sb_y_w8_8bpc_avx2 | 394.8 | 388.9 |
| lpf_h_sb_y_w16_8bpc_avx2 | 536.7 | 525.7 |
| lpf_v_sb_uv_w4_8bpc_avx2 | 39.8 | 44.3 |
| lpf_v_sb_uv_w6_8bpc_avx2 | 80.4 | 78.5 |
| lpf_v_sb_y_w4_8bpc_avx2 | 77.0 | 76.7 |
| lpf_v_sb_y_w8_8bpc_avx2 | 203.3 | 199.0 |
| lpf_v_sb_y_w16_8bpc_avx2 | 300.3 | 290.2 |
| lpf_h_sb_uv_w4_16bpc_avx2 | 143.4 | 143.2 |
| lpf_h_sb_uv_w6_16bpc_avx2 | 197.9 | 197.2 |
| lpf_h_sb_y_w4_16bpc_avx2 | 280.7 | 287.5 |
| lpf_h_sb_y_w8_16bpc_avx2 | 439.8 | 433.1 |
| lpf_h_sb_y_w16_16bpc_avx2 | 569.6 | 556.9 |
| lpf_v_sb_uv_w4_16bpc_avx2 | 77.7 | 77.9 |
| lpf_v_sb_uv_w6_16bpc_avx2 | 118.9 | 116.3 |
| lpf_v_sb_y_w4_16bpc_avx2 | 152.8 | 152.5 |
| lpf_v_sb_y_w8_16bpc_avx2 | 268.5 | 260.9 |
| lpf_v_sb_y_w16_16bpc_avx2 | 347.3 | 332.2 |
Edited by Ronald S. Bultje
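
For readers unfamiliar with the substitution named in the title, here is a rough scalar C model of the two select idioms. It is not code from this merge request: the function names and `VEC_BYTES` are hypothetical, and it assumes the flat masks have every byte either all-ones or all-zeros, as a vector compare would produce. Under that assumption the three-instruction `pand`/`pandn`/`por` select and a single byte blend keyed on each mask byte's top bit (the bit `vpblendvb` tests) give the same result.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VEC_BYTES 32 /* bytes in one 256-bit ymm register (illustrative) */

/* Scalar model of the pand/pandn/por select: (a & mask) | (b & ~mask). */
static void select_and_andn_or(uint8_t *dst, const uint8_t *a,
                               const uint8_t *b, const uint8_t *mask)
{
    for (size_t i = 0; i < VEC_BYTES; i++)
        dst[i] = (uint8_t)((a[i] & mask[i]) | (b[i] & (uint8_t)~mask[i]));
}

/* Scalar model of a byte blend: only the top bit of each mask byte
 * decides whether the whole byte comes from a (bit set) or b (bit clear). */
static void select_blendvb(uint8_t *dst, const uint8_t *a,
                           const uint8_t *b, const uint8_t *mask)
{
    for (size_t i = 0; i < VEC_BYTES; i++)
        dst[i] = (mask[i] & 0x80) ? a[i] : b[i];
}

/* Returns 1 if both select models produce identical output for these inputs. */
static int selects_agree(const uint8_t *a, const uint8_t *b,
                         const uint8_t *mask)
{
    uint8_t r0[VEC_BYTES], r1[VEC_BYTES];
    select_and_andn_or(r0, a, b, mask);
    select_blendvb(r1, a, b, mask);
    return memcmp(r0, r1, VEC_BYTES) == 0;
}
```

The two models diverge when a mask byte is neither 0x00 nor 0xFF (for example 0x80), so the replacement is only a drop-in when the mask comes from a compare or an equivalent all-or-nothing source.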
