arm: looprestoration: Simplify code by allowing writing up to 8 pixels past the end of rows

This corresponds to what the x86 assembly does right now (as far as I know).
This removes a fair bit of code and allows the stores to be marked as aligned. (Previously, the writes of the narrow slice temp buffer were unaligned.)
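
For illustration only, a rough C sketch of the idea (not the actual assembly; all names here are hypothetical, and memcpy of 8 pixels stands in for one vector store). It assumes both buffers are padded to a multiple of 8 pixels, so that whole-vector stores may spill past the nominal row width instead of needing a separate tail path. In this sketch the spill is always fewer than 8 pixels.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VEC 8 /* pixels per store; stands in for one 8-lane vector store */

/* Round a row width up to a whole number of 8-pixel vectors. */
static size_t padded_width(size_t w) {
    return (w + VEC - 1) & ~(size_t)(VEC - 1);
}

/* Exact-width variant: whole vectors first, then a scalar tail loop. */
static void store_row_exact(uint16_t *dst, const uint16_t *src, size_t w) {
    size_t x = 0;
    for (; x + VEC <= w; x += VEC)
        memcpy(dst + x, src + x, VEC * sizeof(*dst));
    for (; x < w; x++)
        dst[x] = src[x];
}

/* Overwriting variant: whole vectors only. Writes up to padded_width(w)
 * pixels, so dst and src must both be at least that large. */
static void store_row_overwrite(uint16_t *dst, const uint16_t *src, size_t w) {
    for (size_t x = 0; x < w; x += VEC)
        memcpy(dst + x, src + x, VEC * sizeof(*dst));
}
```

Since every store in the second variant starts at a multiple of 8 pixels from the row start, an aligned row start keeps all stores aligned, which is what would let them carry an alignment hint.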