add SSSE3 w_mask_420/blend/blend_v/blend_h
Adapted from the AVX2 version, using 4 fewer registers and 1 fewer rodata table.
w_mask_420_w4_8bpc_c: 548.9
w_mask_420_w4_8bpc_ssse3: 48.7
w_mask_420_w8_8bpc_c: 1707.5
w_mask_420_w8_8bpc_ssse3: 118.6
w_mask_420_w16_8bpc_c: 5606.4
w_mask_420_w16_8bpc_ssse3: 307.0
w_mask_420_w32_8bpc_c: 22181.7
w_mask_420_w32_8bpc_ssse3: 1197.6
w_mask_420_w64_8bpc_c: 53531.9
w_mask_420_w64_8bpc_ssse3: 2893.6
w_mask_420_w128_8bpc_c: 133425.4
w_mask_420_w128_8bpc_ssse3: 7113.7
Edited by François Cartegnie