Feat: fg compute block avg sse4

Ferdinand Mom requested to merge feat-fg-compute-block-avg-sse4 into master
  • The goal of this MR is to optimize fg_compute_block_avg, which is part of the function pp_process_frame.
  • To benchmark, we call pp_process_frame 100 times, with 10 iterations for each call, and report the average time. The code for the benchmark can be found here
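The measurement scheme described above can be sketched roughly as follows. This is not the benchmark code from the MR: `bench_fn`, `bench_avg_ms`, and `dummy_workload` are hypothetical names, and the sketch assumes that the reported figure is the mean time of one batch of 10 iterations, averaged over 100 batches, which the description does not state explicitly.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Hypothetical callback type standing in for a call such as pp_process_frame. */
typedef void (*bench_fn)(void *ctx);

/* Placeholder workload used only to exercise the harness: bumps a counter. */
static void dummy_workload(void *ctx)
{
    int *counter = ctx;
    (*counter)++;
}

/*
 * Runs `calls` timed batches of `iterations` invocations each and
 * returns the mean batch duration in milliseconds.
 */
static double bench_avg_ms(bench_fn fn, void *ctx, int calls, int iterations)
{
    double total_ms = 0.0;

    for (int c = 0; c < calls; c++) {
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < iterations; i++)
            fn(ctx);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        /* Convert the elapsed seconds and nanoseconds to milliseconds. */
        total_ms += (t1.tv_sec - t0.tv_sec) * 1e3
                  + (t1.tv_nsec - t0.tv_nsec) / 1e6;
    }
    return total_ms / calls;
}
```

Calling `bench_avg_ms(dummy_workload, &counter, 100, 10)` would mirror the "100 calls of 10 iterations each" setup. A monotonic clock is used here so that wall-clock adjustments do not distort the timings.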

With the help of perf and Hotspot, we can see where pp_process_frame spends most of its computing time.

> LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$PWD/libovvc/.libs perf record ./examples/.libs/filmgraintest

> pp_process_frame    : 6.932500 ms (average time / 100 calls of 10 iteration each)]
[ perf record: Woken up 4 times to write data ]
[ perf record: Captured and wrote 1,041 MB perf.data (27262 samples) ]

In the baseline, fg_compute_block_avg accounts for 19.4% of the computing time.

Let's benchmark fg_compute_block_avg_sse4:

> LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$PWD/libovvc/.libs perf record ./examples/.libs/filmgraintest

> pp_process_frame    : 6.181763 ms (average time / 100 calls of 10 iteration each)]
[ perf record: Woken up 4 times to write data ]
[ perf record: Captured and wrote 0,915 MB perf.data (23944 samples) ]

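With the new version, fg_compute_block_avg_sse4 accounts for 6.18% of the computing time, down from 19.4%. The average time reported for pp_process_frame drops from 6.9325 ms to 6.1818 ms, a reduction of about 10.8%.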
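For readers unfamiliar with this kind of change, here is a minimal sketch of how a block average can be vectorised with SSE4.1 intrinsics. It is not the code from this MR. The function names, the signature, the unsigned 16-bit sample layout, and the requirement that the width be a multiple of 8 are all assumptions made for illustration. The `target` attribute is a GCC/Clang extension that lets the function build without extra compiler flags.

```c
#include <stdint.h>
#include <smmintrin.h> /* SSE4.1 intrinsics (also pulls in SSSE3) */

/* Hypothetical scalar reference: mean of a w x h block of 16-bit samples. */
static uint32_t block_avg_scalar(const uint16_t *src, int stride, int w, int h)
{
    uint32_t sum = 0;

    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            sum += src[y * stride + x];
    return sum / (uint32_t)(w * h);
}

/*
 * Hypothetical SSE4.1 variant. Assumes w is a multiple of 8 and that the
 * block sum fits in 32 bits (true for 16x16 blocks of 16-bit samples).
 */
__attribute__((target("sse4.1")))
static uint32_t block_avg_sse4_sketch(const uint16_t *src, int stride, int w, int h)
{
    __m128i acc = _mm_setzero_si128();

    for (int y = 0; y < h; y++) {
        const uint16_t *row = src + y * stride;

        for (int x = 0; x < w; x += 8) {
            /* Load 8 samples, then zero-extend each half to 4 x 32-bit lanes. */
            __m128i v  = _mm_loadu_si128((const __m128i *)(row + x));
            __m128i lo = _mm_cvtepu16_epi32(v);
            __m128i hi = _mm_cvtepu16_epi32(_mm_srli_si128(v, 8));

            acc = _mm_add_epi32(acc, _mm_add_epi32(lo, hi));
        }
    }

    /* Two horizontal adds fold the four 32-bit lanes into lane 0. */
    acc = _mm_hadd_epi32(acc, acc);
    acc = _mm_hadd_epi32(acc, acc);

    return (uint32_t)_mm_cvtsi128_si32(acc) / (uint32_t)(w * h);
}
```

The sketch processes eight samples per load instead of one, which is the general source of the speedup in this style of optimization. The function in this MR may be structured differently.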

Edited by Ferdinand Mom
