cdef_filter_{4x{4,8},8x8}_avx2 optimizations
Add a separate fully edged case.
---------------------
fully edged blocks perf
------------------------------------------
before: cdef_filter_4x4_8bpc_avx2: 91.0
after: cdef_filter_4x4_8bpc_avx2: 75.7
---------------------
before: cdef_filter_4x8_8bpc_avx2: 154.6
after: cdef_filter_4x8_8bpc_avx2: 131.8
---------------------
before: cdef_filter_8x8_8bpc_avx2: 214.1
after: cdef_filter_8x8_8bpc_avx2: 195.9
------------------------------------------
See #305.
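A rough scalar sketch of what a dedicated fully edged path buys, assuming a toy 1-D neighbour sum rather than the real CDEF kernel. The flag-to-bit assignment is an assumption (the checkasm comment only states that 0x5 is top/left, 0xa is bottom/right and 0xf is all edges), and every function name here is hypothetical; none of this is the AVX2 code itself.

```c
#include <stdint.h>

/* Edge availability flags. The bit assignment is an assumption, chosen
 * to agree with 0x5 = top/left and 0xa = bottom/right. */
enum EdgeFlags {
    HAVE_LEFT   = 1 << 0,
    HAVE_RIGHT  = 1 << 1,
    HAVE_TOP    = 1 << 2,
    HAVE_BOTTOM = 1 << 3,
    HAVE_ALL    = 0xf
};

/* General path: every neighbour fetch first checks whether the
 * neighbour lies inside the block or behind an available edge. */
static int sum_neighbours_general(const uint8_t *row, int x, int w, int edges) {
    int sum = 0;
    if (x > 0 || (edges & HAVE_LEFT))      sum += row[x - 1];
    if (x < w - 1 || (edges & HAVE_RIGHT)) sum += row[x + 1];
    return sum;
}

/* Fully edged path: all neighbours exist, so the per-tap checks vanish. */
static int sum_neighbours_full(const uint8_t *row, int x) {
    return row[x - 1] + row[x + 1];
}

/* Dispatch once per block instead of checking once per tap. */
static int sum_neighbours(const uint8_t *row, int x, int w, int edges) {
    if (edges == HAVE_ALL)
        return sum_neighbours_full(row, x);
    return sum_neighbours_general(row, x, w, edges);
}
```

Both paths return the same value when edges == 0xf; the specialised one simply never evaluates the availability tests.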
Add 2 separate code paths for when the pri or sec strength equals 0. Having both strengths non-zero is uncommon, so branching to skip the unnecessary computations is beneficial.
------------------------------------------
before: cdef_filter_4x4_8bpc_avx2: 93.8
after: cdef_filter_4x4_8bpc_avx2: 71.7
---------------------
before: cdef_filter_4x8_8bpc_avx2: 161.5
after: cdef_filter_4x8_8bpc_avx2: 116.3
---------------------
before: cdef_filter_8x8_8bpc_avx2: 221.8
after: cdef_filter_8x8_8bpc_avx2: 156.4
------------------------------------------
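Why skipping is exact can be sketched in scalar C: a CDEF-style constrain function returns 0 whenever its strength is 0, so the corresponding taps contribute nothing and can be dropped without changing the result. The tap positions, weights, single shared shift and all function names below are hypothetical simplifications for illustration, not the layout used by the AVX2 code.

```c
#include <stdlib.h>

static int imin(int a, int b) { return a < b ? a : b; }
static int imax(int a, int b) { return a > b ? a : b; }

/* CDEF-style constrain: clamps a tap difference according to the
 * strength. With strength == 0 the result is always 0. */
static int constrain(int diff, int strength, int shift) {
    const int adiff = abs(diff);
    const int lim = imin(adiff, imax(0, strength - (adiff >> shift)));
    return diff < 0 ? -lim : lim;
}

/* Primary taps at +/-1 with weight 4 (hypothetical layout). */
static int taps_pri_only(const int *p, int pri, int shift) {
    return 4 * (constrain(p[-1] - p[0], pri, shift) +
                constrain(p[ 1] - p[0], pri, shift));
}

/* Secondary taps at +/-2 with weight 2 (hypothetical layout). */
static int taps_sec_only(const int *p, int sec, int shift) {
    return 2 * (constrain(p[-2] - p[0], sec, shift) +
                constrain(p[ 2] - p[0], sec, shift));
}

/* General path: evaluates both tap sets unconditionally. */
static int taps_both(const int *p, int pri, int sec, int shift) {
    return taps_pri_only(p, pri, shift) + taps_sec_only(p, sec, shift);
}

/* Branch once, then run only the taps that can contribute. */
static int taps_dispatch(const int *p, int pri, int sec, int shift) {
    if (!sec) return taps_pri_only(p, pri, shift);
    if (!pri) return taps_sec_only(p, sec, shift);
    return taps_both(p, pri, sec, shift);
}
```

For any strength pair, taps_dispatch returns the same value as taps_both; the saving comes from not computing the tap set whose strength is 0.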
Full decode comparison against my local master (this branch is based on my local master too):
"e3dbf926 - arm64: looprestoration: NEON implementation of SGR for 10 bpc - Martin Storsjö"
Fully edged blocks checkasm bench - 10 runs
The test was changed, as it only benches non-zero pri and sec strength for edges == 0xf:
tests/checkasm/cdef.c | 11 +----------
1 file changed, 1 insertion(+), 10 deletions(-)
diff --git a/tests/checkasm/cdef.c b/tests/checkasm/cdef.c
index cde4f45..db25415 100644
--- a/tests/checkasm/cdef.c
+++ b/tests/checkasm/cdef.c
@@ -57,7 +57,7 @@ static void check_cdef_filter(const cdef_fn fn, const int w, const int h) {
     if (check_func(fn, "cdef_filter_%dx%d_%dbpc", w, h, BITDEPTH)) {
         for (int dir = 0; dir < 8; dir++) {
-            for (enum CdefEdgeFlags edges = 0x0; edges <= 0xf; edges++) {
+            for (enum CdefEdgeFlags edges = 0xf; edges <= 0xf; edges++) {
 #if BITDEPTH == 16
                 const int bitdepth_max = rnd() & 1 ? 0x3ff : 0xfff;
 #else
@@ -85,17 +85,8 @@ static void check_cdef_filter(const cdef_fn fn, const int w, const int h) {
                             pri_strength, sec_strength, dir, damping, to_binary(edges));
                     return;
                 }
-                if (dir == 7 && (edges == 0x5 || edges == 0xa || edges == 0xf)) {
-                    /* Benchmark a fixed set of cases to get consistent results:
-                     * 1) top/left edges and pri_strength only
-                     * 2) bottom/right edges and sec_strength only
-                     * 3) all edges and both pri_strength and sec_strength
-                     */
-                    pri_strength = (edges & 1) << bitdepth_min_8;
-                    sec_strength = (edges & 2) << bitdepth_min_8;
                     bench_new(a_dst, stride, left, top, pri_strength, sec_strength,
                               dir, damping, edges HIGHBD_TAIL_SUFFIX);
-                }
             }
         }
     }
Edited by Victorien Le Couviour--Tuffet