x86: Increase precision of IDCT intermediates
Fixes #299 for the SSSE3 and AVX2 asm, at the cost of a slight performance hit:
AVX2       4x4    8x8   16x16   32x32   64x64
Before:   41.0  103.8   221.7  1001.3  3031.2
After:    45.8  108.6   251.7  1097.0  3293.4
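
To put "slight" in numbers, the relative overhead per transform size can be computed from the figures above. This is only an illustrative sketch: it assumes the values are timings where lower is better (the unit is not stated here), and the helper name `slowdown_percent` is made up for this example. Under that assumption the overhead works out to roughly 5% to 14% depending on block size.

```python
# Figures copied from the AVX2 table above; the unit is not stated in the
# description, so they are treated as opaque "lower is better" timings.
sizes = ["4x4", "8x8", "16x16", "32x32", "64x64"]
before = [41.0, 103.8, 221.7, 1001.3, 3031.2]
after = [45.8, 108.6, 251.7, 1097.0, 3293.4]


def slowdown_percent(old, new):
    """Relative increase of `new` over `old`, in percent (hypothetical helper)."""
    return (new - old) / old * 100.0


# Map each transform size to its relative overhead after the change.
overhead = {s: slowdown_percent(b, a) for s, b, a in zip(sizes, before, after)}

for size, pct in overhead.items():
    print(f"{size:>6}: +{pct:.1f}%")
```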
Edited by Henrik Gramner