Avoid masking the lsb in high bit-depth stride calculations
We specify most strides in bytes, but since C defines pointer offsets in multiples of sizeof(type), we use the PXSTRIDE() macro to shift the strides right by one bit in high bit-depth templated files.
This, however, means that the compiler is required to mask away the least significant bit when it scales the pixel offset back to bytes, because that bit could in theory be non-zero.
Avoid that by telling the compiler (when compiled in release mode) that the lsb is in fact guaranteed to always be zero.
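
For illustration only, a minimal sketch of the idea, not necessarily the code in this change. ASSUME(), pxstride_sketch() and next_row() are hypothetical names, and the sketch assumes a GCC/Clang or MSVC toolchain for the release-mode hint:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: in release builds (NDEBUG) promise the optimizer
 * that cond holds; in debug builds check it with a regular assert(). */
#if defined(NDEBUG) && (defined(__GNUC__) || defined(__clang__))
#define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#elif defined(NDEBUG) && defined(_MSC_VER)
#define ASSUME(cond) __assume(cond)
#else
#define ASSUME(cond) assert(cond)
#endif

/* Convert a stride in bytes to a stride in 16-bit pixels. Without the
 * hint, row + (stride >> 1) scales back to a byte offset of
 * (stride & ~1), which costs an extra masking instruction. */
static inline ptrdiff_t pxstride_sketch(const ptrdiff_t stride_bytes) {
    ASSUME(!(stride_bytes & 1)); /* lsb is guaranteed to be zero */
    return stride_bytes >> 1;
}

/* Example use: step to the next row of a high bit-depth plane. */
static inline uint16_t *next_row(uint16_t *const row,
                                 const ptrdiff_t stride_bytes) {
    return row + pxstride_sketch(stride_bytes);
}
```

With the hint in place, a compiler that understands it is free to use the byte stride directly as the pointer offset. Passing an odd stride in a release build would be undefined behaviour, so callers have to uphold the guarantee.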