Draft: riscv64/mc: Add 8bpc w_mask RVV function
The function is split into two paths, one for w < 32 and one for w >= 32, to reduce the performance penalty that large vector register groups incur at small widths. Otherwise it is mostly a 1:1 translation of the C code into RVV assembly. Parts of the function are commented; I can remove the comments if they don't match the repo's conventions.
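For reviewers who don't have the scalar code at hand, here is a minimal C sketch of the per-pixel work such a function has to do. It is not the code from this patch or the exact upstream reference: the name `w_mask_444_8bpc_sketch` is hypothetical, it covers only the unsubsampled (4:4:4) mask layout, and the constants assume 8bpc intermediates with 4 extra bits of precision and no bias.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical name: scalar sketch of a 4:4:4 w_mask at 8 bpc.
 * tmp1/tmp2 are the two intermediate predictions, stored w samples per row. */
static void w_mask_444_8bpc_sketch(uint8_t *dst, ptrdiff_t dst_stride,
                                   const int16_t *tmp1, const int16_t *tmp2,
                                   int w, int h, uint8_t *mask)
{
    /* Assumed 8 bpc constants (4 intermediate bits, no prep bias). */
    const int sh = 4 + 6;                    /* intermediate bits + 6-bit weight */
    const int rnd = 32 << 4;                 /* rounding term for the blend */
    const int mask_sh = 8;                   /* scales |tmp1 - tmp2| down */
    const int mask_rnd = 1 << (mask_sh - 5); /* rounding term for the mask */

    do {
        for (int x = 0; x < w; x++) {
            /* The weight grows with the difference between the predictions,
             * starting at 38 and capped at 64. */
            int m = 38 + ((abs(tmp1[x] - tmp2[x]) + mask_rnd) >> mask_sh);
            if (m > 64) m = 64;

            /* Blend with weights m and 64 - m, then clip to the pixel range. */
            const int px = (tmp1[x] * m + tmp2[x] * (64 - m) + rnd) >> sh;
            dst[x] = (uint8_t)(px < 0 ? 0 : px > 255 ? 255 : px);

            /* In 4:4:4 the mask is stored at full resolution. */
            mask[x] = (uint8_t)m;
        }
        tmp1 += w;
        tmp2 += w;
        mask += w;
        dst += dst_stride;
    } while (--h);
}
```

Each output pixel depends only on the matching `tmp1`/`tmp2` samples, which is why the loop body maps naturally onto element-wise vector instructions. The subsampled layouts additionally combine neighbouring mask values and are left out of this sketch.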
Benchmarks:
| Kendryte K230 | SpacemiT K1 |
|---|---|
| | |
Edited by Bogdan Gligorijević