Reduce memory usage
Up to 70% reduction in overall memory usage for 8-bit 4:2:0 content with frame threading enabled.
This should cover the most impactful allocations.
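To make the format dependence concrete, here is a minimal sketch of how the size of one picture varies with chroma subsampling and bytes per sample. It is illustrative only. The helper name picture_bytes and its parameters are hypothetical, and it does not describe which allocations this patch actually changed. It only shows why sizing buffers for the real format, rather than for a worst case such as 16-bit 4:4:4, matters most for 8-bit 4:2:0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes needed for one picture with a w x h luma plane
 * and two chroma planes subsampled by ss_hor/ss_ver (0 = full, 1 = halved).
 * bytes_per_px is 1 for 8-bit content and 2 for high bit depth. */
static size_t picture_bytes(int w, int h, int ss_hor, int ss_ver,
                            int bytes_per_px)
{
    assert(ss_hor >= 0 && ss_hor <= 1 && ss_ver >= 0 && ss_ver <= 1);
    const size_t luma = (size_t)w * h;
    /* Round up so odd dimensions still get a full chroma sample. */
    const size_t cw = (size_t)(w + ss_hor) >> ss_hor;
    const size_t ch = (size_t)(h + ss_ver) >> ss_ver;
    return (luma + 2 * cw * ch) * (size_t)bytes_per_px;
}
```

For a 1920x1080 picture, picture_bytes(1920, 1080, 1, 1, 1) gives 3110400 bytes (8-bit 4:2:0), while picture_bytes(1920, 1080, 0, 0, 2) gives 12441600 bytes (16-bit 4:4:4), four times as much.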
There are perhaps a few more subsampling-related improvements that could be looked into, and some palette code could be templated to use pixel instead of uint16_t, but I think we can leave those for another time.
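A minimal sketch of what templating on pixel buys, assuming a compile-once-per-bit-depth pattern driven by a BITDEPTH macro. The PaletteEntry struct and palette_bytes function are hypothetical and not the project's actual palette code; the 8-entry size reflects the AV1 maximum palette size. Storing entries as pixel instead of a fixed uint16_t halves the footprint in 8-bit builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical template pattern: compile this file once per bit depth,
 * e.g. with -DBITDEPTH=8 and -DBITDEPTH=16. Defaults to 8-bit here. */
#ifndef BITDEPTH
#define BITDEPTH 8
#endif
static_assert(BITDEPTH == 8 || BITDEPTH == 16, "unsupported bit depth");

#if BITDEPTH == 8
typedef uint8_t pixel;
#else
typedef uint16_t pixel;
#endif

/* Hypothetical per-block palette storage: 3 planes x up to 8 colors. */
typedef struct {
    pixel pal[3][8];
} PaletteEntry;

/* Bytes needed to store palettes for n_blocks blocks. */
static size_t palette_bytes(size_t n_blocks)
{
    return n_blocks * sizeof(PaletteEntry);
}
```

In an 8-bit build each PaletteEntry is 24 bytes, versus 48 bytes if the entries were always uint16_t.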
Testing is welcome, especially of obscure corner cases such as changes in bit depth, resolution, or subsampling.