x86: Improve high bitdepth cdef_filter AVX2 asm

                                 old ->   new
cdef_filter_4x4_16bpc_c:       779.5
cdef_filter_4x4_16bpc_ssse3:    94.4
cdef_filter_4x4_16bpc_avx2:    107.1 ->  73.3
cdef_filter_4x8_16bpc_c:      1450.0
cdef_filter_4x8_16bpc_ssse3:   157.9
cdef_filter_4x8_16bpc_avx2:      n/a -> 107.2
cdef_filter_8x8_16bpc_c:       581.4
cdef_filter_8x8_16bpc_ssse3:   261.3
cdef_filter_8x8_16bpc_avx2:    252.1 -> 159.9