Add w_mask AVX2 asm for 4:2:2 and 4:4:4

Also add some optimizations to w_avg and w_mask_420.

w_mask_422_w4_8bpc_c: 318.0
w_mask_422_w4_8bpc_avx2: 24.0
w_mask_422_w8_8bpc_c: 940.4
w_mask_422_w8_8bpc_avx2: 45.2
w_mask_422_w16_8bpc_c: 3006.0
w_mask_422_w16_8bpc_avx2: 108.4
w_mask_422_w32_8bpc_c: 11841.7
w_mask_422_w32_8bpc_avx2: 391.1
w_mask_422_w64_8bpc_c: 28447.6
w_mask_422_w64_8bpc_avx2: 972.9
w_mask_422_w128_8bpc_c: 71692.9
w_mask_422_w128_8bpc_avx2: 2391.6
w_mask_444_w4_8bpc_c: 336.9
w_mask_444_w4_8bpc_avx2: 22.3
w_mask_444_w8_8bpc_c: 930.1
w_mask_444_w8_8bpc_avx2: 39.0
w_mask_444_w16_8bpc_c: 2825.7
w_mask_444_w16_8bpc_avx2: 105.9
w_mask_444_w32_8bpc_c: 6644.5
w_mask_444_w32_8bpc_avx2: 374.5
w_mask_444_w64_8bpc_c: 15528.8
w_mask_444_w64_8bpc_avx2: 938.4
w_mask_444_w128_8bpc_c: 37946.1
w_mask_444_w128_8bpc_avx2: 2324.6