Use grouped context setting
Decreases the runtime of decoding the first 1000 frames of Chimera
(1080p, 8-bit) from 12.227s to 12.075s (average of 6 runs) with the changes
to decode.c, to 12.027s (1.67%) with the changes to recon_tmpl.c included,
and to 11.842s (3.25%) with the changes to lf_mask.c included as well.
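
For readers unfamiliar with the idea, a minimal sketch of what "grouped" context setting could look like: one switch on the block width, with every context array updated inside each case using a fixed-size store, rather than a separate variable-length memset per array. The names below (Ctx, SET_BOTH, set_ctx_grouped) are hypothetical and for illustration only; they are not taken from decode.c, recon_tmpl.c or lf_mask.c.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical context state: two per-position byte arrays. */
typedef struct {
    uint8_t mode[32];
    uint8_t skip[32];
} Ctx;

/* Replicate each value across a word of the given type and store it.
 * memcpy with a constant size avoids aliasing problems and is normally
 * compiled to a single store. */
#define SET_BOTH(type, mul) do { \
    const type m = (type)((mul) * mode), s = (type)((mul) * skip); \
    memcpy(&c->mode[off], &m, sizeof(m)); \
    memcpy(&c->skip[off], &s, sizeof(s)); \
} while (0)

/* Grouped update: the width is dispatched once for both arrays. */
static void set_ctx_grouped(Ctx *c, int off, int n,
                            uint8_t mode, uint8_t skip)
{
    switch (n) {
    case 1: SET_BOTH(uint8_t,  0x01u); break;
    case 2: SET_BOTH(uint16_t, 0x0101u); break;
    case 4: SET_BOTH(uint32_t, 0x01010101u); break;
    case 8: SET_BOTH(uint64_t, 0x0101010101010101ull); break;
    default: /* uncommon widths fall back to a plain memset */
        memset(&c->mode[off], mode, (size_t)n);
        memset(&c->skip[off], skip, (size_t)n);
        break;
    }
}
```

The assumption behind the sketch is that block widths are almost always small powers of two, so the constant-size cases cover the hot path and the branch on width is paid once per block instead of once per array.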
Edited by Ronald S. Bultje