x86/deblock_{hbd_,}avx2: use vpblendvb instead of pand/pandn/por in flat16/8/6
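Why the swap is safe: `pand`/`pandn`/`por` performs a per-bit select, while `vpblendvb` selects whole bytes based only on the top bit of each mask byte. The two give identical results whenever every mask element is all-ones or all-zeros, which is the case for masks produced by vector compares and ANDs of such masks. That covers byte masks (8bpc) and word masks (16bpc, each word is 0xFFFF or 0x0000). The sketch below is a minimal scalar C model of that equivalence over one 32-byte ymm register. It assumes the flat16/8/6 masks are compare-derived, and the function names are hypothetical, not dav1d code.

```c
#include <stddef.h>
#include <stdint.h>

/* Old sequence (pand/pandn/por): a per-BIT select,
 * dst = (mask & a) | (~mask & b). */
static void blend_and_andn_or(uint8_t *dst, const uint8_t *mask,
                              const uint8_t *a, const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)((mask[i] & a[i]) | (~mask[i] & b[i]));
}

/* Scalar model of vpblendvb: a per-BYTE select driven only by the
 * top bit of each mask byte (set -> a, clear -> b). */
static void blend_vpblendvb(uint8_t *dst, const uint8_t *mask,
                            const uint8_t *a, const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (mask[i] & 0x80) ? a[i] : b[i];
}

/* Returns 1 if both selects produce identical output for this mask,
 * 0 otherwise. n is limited to one 32-byte ymm register. */
static int blends_agree(const uint8_t *mask, const uint8_t *a,
                        const uint8_t *b, size_t n)
{
    uint8_t x[32], y[32];
    if (n > sizeof(x))
        return 0;
    blend_and_andn_or(x, mask, a, b, n);
    blend_vpblendvb(y, mask, a, b, n);
    for (size_t i = 0; i < n; i++)
        if (x[i] != y[i])
            return 0;
    return 1;
}
```

A mask such as 0x80 in every byte, which is not all-or-nothing, makes the two diverge. That is why the substitution depends on how the masks are built.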
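Slightly faster overall. The gains show up mainly in the w6/w8/w16 functions, while the w4 numbers are mixed. Timings before and after the change (lower is better):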
Function | Before | After |
---|---|---|
lpf_h_sb_uv_w4_8bpc_avx2 | 140.1 | 140.7 |
lpf_h_sb_uv_w6_8bpc_avx2 | 188.0 | 184.8 |
lpf_h_sb_y_w4_8bpc_avx2 | 281.4 | 280.6 |
lpf_h_sb_y_w8_8bpc_avx2 | 394.8 | 388.9 |
lpf_h_sb_y_w16_8bpc_avx2 | 536.7 | 525.7 |
lpf_v_sb_uv_w4_8bpc_avx2 | 39.8 | 44.3 |
lpf_v_sb_uv_w6_8bpc_avx2 | 80.4 | 78.5 |
lpf_v_sb_y_w4_8bpc_avx2 | 77.0 | 76.7 |
lpf_v_sb_y_w8_8bpc_avx2 | 203.3 | 199.0 |
lpf_v_sb_y_w16_8bpc_avx2 | 300.3 | 290.2 |
lpf_h_sb_uv_w4_16bpc_avx2 | 143.4 | 143.2 |
lpf_h_sb_uv_w6_16bpc_avx2 | 197.9 | 197.2 |
lpf_h_sb_y_w4_16bpc_avx2 | 280.7 | 287.5 |
lpf_h_sb_y_w8_16bpc_avx2 | 439.8 | 433.1 |
lpf_h_sb_y_w16_16bpc_avx2 | 569.6 | 556.9 |
lpf_v_sb_uv_w4_16bpc_avx2 | 77.7 | 77.9 |
lpf_v_sb_uv_w6_16bpc_avx2 | 118.9 | 116.3 |
lpf_v_sb_y_w4_16bpc_avx2 | 152.8 | 152.5 |
lpf_v_sb_y_w8_16bpc_avx2 | 268.5 | 260.9 |
lpf_v_sb_y_w16_16bpc_avx2 | 347.3 | 332.2 |
Edited by Ronald S. Bultje