Memory usage reductions 2.0
Decoding Chimera 1080p 8bpc with --threads 16 --framedelay 4, before and after:

Before:

Type              Allocs    Reuses    Share     Peak size
---------------------------------------------------------
Palette data          16         0    14.9%    17 694 720
---------------------------------------------------------
                    9101     49096            118 416 488

After:

Type              Allocs    Reuses    Share     Peak size
---------------------------------------------------------
Palette data          16         0     8.1%     8 847 360
---------------------------------------------------------
                    9101     49096            109 569 256

Checkasm numbers for the new pal_idx_finish function on x86-64:
pal_idx_finish_w4_c: 41.8 ( 1.00x)
pal_idx_finish_w4_ssse3: 9.1 ( 4.62x)
pal_idx_finish_w4_avx2: 9.5 ( 4.38x)
pal_idx_finish_w4_avx512icl: 9.4 ( 4.44x)
pal_idx_finish_w8_c: 85.6 ( 1.00x)
pal_idx_finish_w8_ssse3: 11.5 ( 7.44x)
pal_idx_finish_w8_avx2: 11.3 ( 7.57x)
pal_idx_finish_w8_avx512icl: 11.0 ( 7.79x)
pal_idx_finish_w16_c: 162.5 ( 1.00x)
pal_idx_finish_w16_ssse3: 29.3 ( 5.54x)
pal_idx_finish_w16_avx2: 17.9 ( 9.08x)
pal_idx_finish_w16_avx512icl: 16.4 ( 9.90x)
pal_idx_finish_w32_c: 202.8 ( 1.00x)
pal_idx_finish_w32_ssse3: 61.1 ( 3.32x)
pal_idx_finish_w32_avx2: 36.9 ( 5.49x)
pal_idx_finish_w32_avx512icl: 20.4 ( 9.94x)
pal_idx_finish_w64_c: 336.0 ( 1.00x)
pal_idx_finish_w64_ssse3: 120.2 ( 2.80x)
pal_idx_finish_w64_avx2: 82.1 ( 4.09x)
pal_idx_finish_w64_avx512icl: 42.1 ( 7.97x)
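
The numbers above show the palette data peak size dropping to exactly half (17 694 720 to 8 847 360 bytes), but this message does not say how. One layout that would be consistent with that halving is storing two palette indices per byte, which is possible because an AV1 palette holds at most 8 colors, so each index fits in 4 bits. The following is only a sketch under that assumption: pack_pal_idx_row and unpack_pal_idx are hypothetical names for illustration, not the actual pal_idx_finish implementation or its signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not dav1d's pal_idx_finish): packs one row of
 * palette indices, each assumed to fit in 4 bits, so that two indices
 * share one byte. The even-numbered index goes in the low nibble.
 * w must be even; dst must hold w / 2 bytes. */
static void pack_pal_idx_row(uint8_t *dst, const uint8_t *src, size_t w)
{
    assert((w & 1) == 0);
    for (size_t x = 0; x < w / 2; x++)
        dst[x] = (uint8_t)((src[2 * x] & 0x0f) |
                           ((src[2 * x + 1] & 0x0f) << 4));
}

/* Hypothetical helper: reads index x back out of a packed row. */
static uint8_t unpack_pal_idx(const uint8_t *packed, size_t x)
{
    return (uint8_t)((packed[x >> 1] >> ((x & 1) * 4)) & 0x0f);
}
```

With this layout a row of w indices occupies w / 2 bytes instead of w, at the cost of a shift and mask on every read.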