Improve Wiener filter C implementation using loop interchange
Apply loop interchange to the Wiener filter C implementation. This improves performance on simple architectures.
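For readers unfamiliar with the term, here is a minimal sketch of loop interchange on a generic 7-tap horizontal filter. It is not the code from this change: the function names, the `TAPS` constant, and the buffer layout are all hypothetical, and which loop order the actual patch ends up with is not shown here. Both versions compute the same sums; only the nesting order of the pixel loop and the tap loop differs.

```c
#include <stddef.h>
#include <stdint.h>

#define TAPS 7 /* hypothetical tap count, for illustration only */

/* Tap loop innermost: each output pixel is fully accumulated before
 * moving on. src must hold at least w + TAPS - 1 samples. */
void fir_row_taps_inner(int32_t *dst, const uint8_t *src,
                        const int16_t coef[TAPS], size_t w)
{
    for (size_t x = 0; x < w; x++) {
        int32_t sum = 0;
        for (int k = 0; k < TAPS; k++)
            sum += coef[k] * src[x + k];
        dst[x] = sum;
    }
}

/* Loops interchanged: pixel loop innermost. Each tap sweeps the whole
 * row, so the inner loop is a simple multiply-accumulate over
 * contiguous memory with a loop-invariant coefficient. */
void fir_row_pixels_inner(int32_t *dst, const uint8_t *src,
                          const int16_t coef[TAPS], size_t w)
{
    for (size_t x = 0; x < w; x++)
        dst[x] = 0;
    for (int k = 0; k < TAPS; k++) {
        const int32_t c = coef[k];
        for (size_t x = 0; x < w; x++)
            dst[x] += c * src[x + k];
    }
}
```

An inner loop over contiguous pixels may be easier for a compiler to auto-vectorize or schedule, which is one plausible reason the loop order matters more on some targets than on others.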
Power 9, before:
  wiener_chroma_8bpc_c: 57830.7
  wiener_luma_8bpc_c: 75434.7
Power 9, after:
  wiener_chroma_8bpc_c: 48185.3
  wiener_luma_8bpc_c: 47948.1
On modern x86 (Ryzen 1600), the difference is not noticeable.
Edited by Jean-Baptiste Kempf