SSE2, msac: Use bsr shortcut for 50% bool decoding
Works via magic. The shift is always 0, 1, or 2, and v is never in 0b11.*. It's a pity the shift isn't always 1.
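For anyone who wants to check the shift claim without reading the asm, here is a rough brute-force sketch in C. It rests on assumptions that are not stated in this description: that the range r is kept normalized to [0x8000, 0xFFFF], that the 50% split is computed as ((r >> 8) << 7) + EC_MIN_PROB with EC_MIN_PROB = 4 as in the C reference decoder, and that "0b11.*" means a 16-bit value with both top bits set. The names norm_shift, EquiStats, and scan_equi_ranges are made up for illustration.

```c
#include <stdint.h>

#define EC_MIN_PROB 4u /* assumed value, taken from the C reference decoder */

/* Left shifts needed to bring a nonzero 16-bit range back to [0x8000, 0xFFFF]. */
static int norm_shift(uint32_t rng) {
    int d = 0;
    while (!(rng & 0x8000u)) {
        rng <<= 1;
        d++;
    }
    return d;
}

/* Hypothetical helper type: what was observed over all candidate ranges. */
typedef struct {
    unsigned shifts_seen; /* bit d is set if a shift of d occurred */
    int top_two_bits;     /* nonzero if any new range matched 0b11.* */
    uint32_t lo, hi;      /* smallest and largest new range before shifting */
} EquiStats;

static EquiStats scan_equi_ranges(void) {
    EquiStats st = { 0u, 0, 0xFFFFu, 0u };
    for (uint32_t r = 0x8000u; r <= 0xFFFFu; r++) {
        /* Assumed 50% split: half of the top byte of r plus the minimum. */
        const uint32_t v = ((r >> 8) << 7) + EC_MIN_PROB;
        /* The new range is v for one bool value and r - v for the other. */
        const uint32_t next[2] = { v, r - v };
        for (int i = 0; i < 2; i++) {
            st.shifts_seen |= 1u << norm_shift(next[i]);
            st.top_two_bits |= (next[i] & 0xC000u) == 0xC000u;
            if (next[i] < st.lo) st.lo = next[i];
            if (next[i] > st.hi) st.hi = next[i];
        }
    }
    return st;
}
```

Under those assumptions, calling scan_equi_ranges() from a small main() shows only shifts 0, 1, and 2, with the new range staying between 16380 and 32891. The v branch alone always needs a shift of exactly 1; it is the r - v branch that spills into 0 and 2.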
~0.2% faster, but I believe this is on a single-threaded part. I think I might be able to get it slightly faster by removing the neg, but I need suggestions on the route to take.
Edited by Kyle Siefring