x86-64: Add msac_decode_bool_equi asm
Around 50% faster than the C code, but it varies a bit depending on which compiler is used.
C perf:
1.32% dav1d libdav1d.so.1.1.0 [.] dav1d_msac_decode_bool_equi_c
1.33% dav1d libdav1d.so.1.1.0 [.] dav1d_msac_decode_symbol_adapt4_sse2.renorm3
asm perf:
0.45% dav1d libdav1d.so.1.1.0 [.] dav1d_msac_decode_bool_equi_sse2
1.73% dav1d libdav1d.so.1.1.0 [.] dav1d_msac_decode_symbol_adapt4_sse2.renorm3
i.e. it goes from 1.32% of overall runtime to 0.85%, counting the extra 0.40% that now shows up under the .renorm3 symbol (0.45% + (1.73% - 1.33%)).
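
For reference, a small sketch of how the 0.85% figure can be derived from the perf numbers above. It assumes the whole increase of the .renorm3 symbol is work that moved out of the bool_equi function; the variable names are illustrative only.

```python
# Percentages copied from the perf output above.
c_bool_equi = 1.32    # dav1d_msac_decode_bool_equi_c
c_renorm3 = 1.33      # ...symbol_adapt4_sse2.renorm3, C build
asm_bool_equi = 0.45  # dav1d_msac_decode_bool_equi_sse2
asm_renorm3 = 1.73    # ...symbol_adapt4_sse2.renorm3, asm build

# Assumption: the growth of .renorm3 is renormalization work that is
# now accounted to that symbol instead of to bool_equi itself.
moved_to_renorm3 = asm_renorm3 - c_renorm3
asm_effective = asm_bool_equi + moved_to_renorm3

# Ratio of old cost to new cost for this function.
speedup = c_bool_equi / asm_effective

print(f"effective asm cost: {asm_effective:.2f}%")
print(f"speedup: {speedup:.2f}x")
```

Under that assumption the ratio comes out at roughly 1.55x, which is in line with the "around 50% faster" estimate.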