Cfl ac simd
Continuation of !441 (closed):
- per-width versions;
- use aligned reads/writes in `sub_loop`;
- integrate `sum_loop` into the main loop;
- special case w=8/16 if `wpad != 0`.

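
For readers unfamiliar with the routine, here is a rough scalar sketch of what a 4:2:0 CfL AC pass computes, with the summation folded into the main loop instead of a separate `sum_loop` pass. It is only an illustration under stated assumptions: 8-bit input, power-of-two block sizes, and padding counted in units of 4 chroma samples. The function name `cfl_ac_420_sketch` and the helper `ilog2` are hypothetical. The actual change in this MR is AVX2 assembly, and its aligned accesses and per-width code paths are not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: log2 of a power of two. */
static int ilog2(int v) { int n = 0; while (v > 1) { v >>= 1; n++; } return n; }

/* Illustrative scalar sketch, not the dav1d implementation.
 * width/height: chroma block size (powers of two).
 * w_pad/h_pad: padded columns/rows, in units of 4 chroma samples. */
static void cfl_ac_420_sketch(int16_t *ac, const uint8_t *ypx, ptrdiff_t stride,
                              int w_pad, int h_pad, int width, int height)
{
    assert(w_pad >= 0 && w_pad * 4 < width);
    assert(h_pad >= 0 && h_pad * 4 < height);

    int16_t *const ac_orig = ac;
    const int vis_w = width - 4 * w_pad;   /* columns backed by real luma */
    const int vis_h = height - 4 * h_pad;  /* rows backed by real luma */
    const int log2sz = ilog2(width) + ilog2(height);
    int sum = (1 << log2sz) >> 1;          /* rounding bias for the average */
    int row_sum = 0;
    int y, x;

    /* Main loop: subsample 2x2 luma, scale by 2, and accumulate the sum
     * in the same pass. */
    for (y = 0; y < vis_h; y++) {
        row_sum = 0;
        for (x = 0; x < vis_w; x++) {
            const int s = ypx[2 * x] + ypx[2 * x + 1] +
                          ypx[2 * x + stride] + ypx[2 * x + 1 + stride];
            ac[x] = (int16_t)(s << 1);
            row_sum += ac[x];
        }
        for (; x < width; x++) {           /* horizontal padding: replicate */
            ac[x] = ac[x - 1];
            row_sum += ac[x];
        }
        sum += row_sum;
        ac += width;
        ypx += 2 * stride;
    }
    for (; y < height; y++) {              /* vertical padding: copy last row */
        memcpy(ac, ac - width, (size_t)width * sizeof(*ac));
        sum += row_sum;
        ac += width;
    }

    /* Subtract the block average (the counterpart of sub_loop). */
    const int dc = sum >> log2sz;
    for (int i = 0; i < width * height; i++)
        ac_orig[i] = (int16_t)(ac_orig[i] - dc);
}
```

Accumulating the sum while the values are still in registers avoids a second read of the `ac` buffer, which is presumably part of where the AVX2 gains below come from.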
before:
cfl_ac_420_w4_8bpc_c: 367.4
cfl_ac_420_w4_8bpc_avx2: 72.8
cfl_ac_420_w8_8bpc_c: 621.6
cfl_ac_420_w8_8bpc_avx2: 85.1
cfl_ac_420_w16_8bpc_c: 983.4
cfl_ac_420_w16_8bpc_avx2: 141.0
after:
cfl_ac_420_w4_8bpc_c: 376.2
cfl_ac_420_w4_8bpc_avx2: 28.5
cfl_ac_420_w8_8bpc_c: 607.2
cfl_ac_420_w8_8bpc_avx2: 29.9
cfl_ac_420_w16_8bpc_c: 962.1
cfl_ac_420_w16_8bpc_avx2: 48.8
Edited by Ronald S. Bultje