arm64: msac: Add handwritten versions of msac_decode_bool functions

GCC                           Cortex A53    A72    A73
msac_decode_bool_c:                 29.9   17.9   23.2
msac_decode_bool_neon:              28.4   16.0   20.8
msac_decode_bool_adapt_c:           50.1   27.5   32.2
msac_decode_bool_adapt_neon:        39.2   21.4   29.2
msac_decode_bool_equi_c:            26.6   16.9   19.4
msac_decode_bool_equi_neon:         24.9   12.5   15.4

Clang                         Cortex A53    A72    A73
msac_decode_bool_c:                 28.0   16.6   23.1
msac_decode_bool_neon:              27.9   16.0   20.9
msac_decode_bool_adapt_c:           48.4   26.5   31.9
msac_decode_bool_adapt_neon:        38.9   20.4   29.2
msac_decode_bool_equi_c:            23.7   13.4   18.8
msac_decode_bool_equi_neon:         23.7   12.5   14.9

Lower is better. On every core tested, the handwritten NEON versions are
as fast as, or faster than, the code GCC or Clang generates from the C
versions.
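
The "as fast as, or faster than" claim can be checked mechanically against
the numbers in the two tables. The sketch below is plain C; the struct and
helper names (bench_row, count_neon_slower, count_ties, speedup) are
hypothetical and not part of dav1d. It encodes both tables and counts the
function/core pairs where the NEON version is slower than, or ties with,
the C version.

```c
#include <assert.h>
#include <stddef.h>

/* Benchmark numbers copied from the tables above (lower is better).
 * Index 0, 1, 2 = Cortex A53, A72, A73. Hypothetical helper types,
 * not part of dav1d. */
typedef struct {
    const char *name;
    double c[3];    /* C version */
    double neon[3]; /* handwritten NEON version */
} bench_row;

static const bench_row gcc_rows[] = {
    { "msac_decode_bool",       { 29.9, 17.9, 23.2 }, { 28.4, 16.0, 20.8 } },
    { "msac_decode_bool_adapt", { 50.1, 27.5, 32.2 }, { 39.2, 21.4, 29.2 } },
    { "msac_decode_bool_equi",  { 26.6, 16.9, 19.4 }, { 24.9, 12.5, 15.4 } },
};

static const bench_row clang_rows[] = {
    { "msac_decode_bool",       { 28.0, 16.6, 23.1 }, { 27.9, 16.0, 20.9 } },
    { "msac_decode_bool_adapt", { 48.4, 26.5, 31.9 }, { 38.9, 20.4, 29.2 } },
    { "msac_decode_bool_equi",  { 23.7, 13.4, 18.8 }, { 23.7, 12.5, 14.9 } },
};

/* Number of (function, core) pairs where NEON is slower than C. */
static int count_neon_slower(const bench_row *rows, size_t n)
{
    int slower = 0;
    for (size_t i = 0; i < n; i++)
        for (int core = 0; core < 3; core++)
            if (rows[i].neon[core] > rows[i].c[core])
                slower++;
    return slower;
}

/* Number of (function, core) pairs where NEON and C tie exactly. */
static int count_ties(const bench_row *rows, size_t n)
{
    int ties = 0;
    for (size_t i = 0; i < n; i++)
        for (int core = 0; core < 3; core++)
            if (rows[i].neon[core] == rows[i].c[core])
                ties++;
    return ties;
}

/* Speedup of NEON over C for one row and core (> 1.0 means NEON is faster). */
static double speedup(const bench_row *row, int core)
{
    assert(row->neon[core] > 0.0);
    return row->c[core] / row->neon[core];
}
```

With these tables, count_neon_slower() returns 0 for both compilers, and
count_ties() returns 1 for Clang (msac_decode_bool_equi on Cortex A53,
23.7 vs 23.7). speedup(&gcc_rows[1], 0) gives about 1.28 for
msac_decode_bool_adapt on Cortex A53 with GCC.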