x86/filmgrain: simplify post-horizontal filter blending
This commit makes a handful of minor changes:
- in horizontal blending, use shufps or vpblendd. If we change fewer pixels than can be used as one source operand for the given instruction (8 or 4 bytes), we abuse 0,32 as an edge/cur pair weight, so that the resulting blended register contains an unmodified cur grain. This replaces more complicated vpblendw + vpblendd or pand/pandn/por blending combinations.
- for scaling LUTs, always use psrld instead of pand, since the latter requires a register for the mask.
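
The 0,32 weight trick from the first item can be illustrated with a small scalar sketch in C. This is not the assembly from this commit: blend_pair and blend_row are hypothetical helpers, and the 27/17 overlap weights, the rounding shift by 5 and the 8-bit grain range are assumptions taken from how AV1 film grain overlap blending is usually described. The point is only that a (0, 32) edge/cur weight pair passes cur through unchanged, so the same multiply-and-round path can cover lanes that should not be blended.

```c
#include <assert.h>
#include <stdint.h>

/* Blend one edge/cur grain pair: round2(edge * w_edge + cur * w_cur, 5),
 * clamped to the signed 8-bit grain range (assumed here).
 * Assumes >> on a negative int is an arithmetic shift, as on mainstream
 * compilers. */
static int blend_pair(int edge, int cur, int w_edge, int w_cur)
{
    int v = (edge * w_edge + cur * w_cur + 16) >> 5;
    if (v < -128) v = -128;
    if (v > 127) v = 127;
    return v;
}

/* Hypothetical helper: run every column of a row through the same blend.
 * The first `overlap` columns (at most 2) use real overlap weights; the
 * remaining columns use the (0, 32) pair, which reduces to
 * (32 * cur + 16) >> 5 == cur, so they come out unmodified. */
static void blend_row(const int8_t *edge, int8_t *cur, int n, int overlap)
{
    /* Assumed luma overlap weights, one { edge, cur } pair per column. */
    static const int w[2][2] = { { 27, 17 }, { 17, 27 } };

    assert(overlap >= 0 && overlap <= 2);
    for (int x = 0; x < n; x++) {
        int w_edge = x < overlap ? w[x][0] : 0;
        int w_cur  = x < overlap ? w[x][1] : 32;
        cur[x] = (int8_t) blend_pair(edge[x], cur[x], w_edge, w_cur);
    }
}
```

Because the pass-through columns are handled by the weights themselves, no separate mask-and-merge step is needed to restore the untouched cur values afterwards, which is the simplification the first item describes.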
Edited by Ronald S. Bultje