attributes: Don't align to more than what assembly needs/benefits from
On arm/arm64, there's no need to align any buffer to 32 bytes: the assembly neither requires nor benefits from more than 16-byte alignment.
It would be much more elegant to define the alignment macro like this:
#define MAX_ALIGN 16
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define ALIGN(align) __attribute__((aligned(MIN(align, MAX_ALIGN))))
This works with GCC and Clang, but MSVC's __declspec(align(...)) requires a literal alignment value and cannot handle an expression.
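Since MSVC only accepts a literal, one possible workaround is to choose the capped value in the preprocessor per architecture, so the macro argument has already expanded to a plain integer literal by the time the compiler parses __declspec. The sketch below is an illustration under assumptions, not necessarily what this commit does: the names ALIGN_32_VAL, the two-argument ALIGN(line, a) form and the is_aligned() helper are hypothetical, while __arm__, __aarch64__, _M_ARM, _M_ARM64 and _MSC_VER are the usual predefined compiler macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-architecture alignment value. After preprocessing it is
 * a plain integer literal, which MSVC's __declspec(align(...)) accepts,
 * unlike an expression such as MIN(32, 16). */
#if defined(__arm__) || defined(__aarch64__) || defined(_M_ARM) || defined(_M_ARM64)
#define ALIGN_32_VAL 16 /* arm/arm64 assembly gains nothing beyond 16 bytes */
#else
#define ALIGN_32_VAL 32
#endif

/* Hypothetical two-argument form: a declaration plus its alignment.
 * The parameter is named 'a' so it does not clash with the 'align'
 * keyword inside __declspec(align(...)). */
#ifdef _MSC_VER
#define ALIGN(line, a) __declspec(align(a)) line
#else
#define ALIGN(line, a) line __attribute__((aligned(a)))
#endif

/* Hypothetical helper: returns 1 if p is a multiple of n bytes
 * (n must be a power of two). */
static int is_aligned(const void *p, size_t n) {
    return ((uintptr_t)p & (n - 1)) == 0;
}
```

A buffer would then be declared as, for example, ALIGN(uint8_t buf[64], ALIGN_32_VAL); which keeps 32-byte alignment on x86 while an arm build drops to 16 without touching any call site.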
@gramner pointed out the other day that the current code is probably wasteful on arm.