Add ipred_z2 AVX2 asm

Skylake-X:
intra_pred_z2_w4_8bpc_c: 304.2
intra_pred_z2_w4_8bpc_avx2: 66.1
intra_pred_z2_w8_8bpc_c: 866.1
intra_pred_z2_w8_8bpc_avx2: 103.8
intra_pred_z2_w16_8bpc_c: 2052.5
intra_pred_z2_w16_8bpc_avx2: 204.3
intra_pred_z2_w32_8bpc_c: 5181.3
intra_pred_z2_w32_8bpc_avx2: 366.8
intra_pred_z2_w64_8bpc_c: 11040.8
intra_pred_z2_w64_8bpc_avx2: 848.4

The code has many branches, though, so benchmark results can vary considerably depending on the specific prediction angle.