| Issue | 149457 |
|---|---|
| Summary | WebAssembly: Suboptimal "promotion" of wasm_f32x4_convert_i32x4 into f32x4.convert_i32x4_u |
| Labels | new issue |
| Assignees | |
| Reporter | zeux |

When the `wasm_f32x4_convert_i32x4` intrinsic gets its input from an instruction that clears the top bits of each lane, the conversion is compiled into `f32x4.convert_i32x4_u` instead of the signed `f32x4.convert_i32x4_s` variant. For example:
```c++
#include <wasm_simd128.h>

v128_t plsno(v128_t x)
{
    // The u32x4 shift changes the emitted convert instruction; this is a problem
    // because u32 -> f32 conversion is much slower on pre-AVX-512 hardware.
    x = wasm_u32x4_shr(x, 1);
    return wasm_f32x4_convert_i32x4(x);
}
```
```
local.get 0
i32.const 1
i32x4.shr_u
f32x4.convert_i32x4_u
end_function
```
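This is a problem because on x64 hardware `f32x4.convert_i32x4_u` is lowered into a long multi-instruction sequence unless the browser implements an AVX-512 code path and the hardware supports it. The signed variant would produce the same result here, since the shift leaves the sign bit of every lane clear, so the promotion needlessly slows down otherwise efficient SIMD kernels.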
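
To illustrate why the signed conversion would be just as correct in this case, here is a minimal scalar sketch that models a single 32-bit lane. The function names are hypothetical and exist only for this illustration; they are not part of `wasm_simd128.h` or LLVM. It assumes the usual two's-complement reinterpretation when casting `uint32_t` to `int32_t`.

```cpp
#include <cstdint>

// Hypothetical scalar model of one lane of f32x4.convert_i32x4_s:
// the lane bits are interpreted as a signed 32-bit integer.
inline float convert_lane_signed(uint32_t bits)
{
    return static_cast<float>(static_cast<int32_t>(bits));
}

// Hypothetical scalar model of one lane of f32x4.convert_i32x4_u:
// the lane bits are interpreted as an unsigned 32-bit integer.
inline float convert_lane_unsigned(uint32_t bits)
{
    return static_cast<float>(bits);
}

// Applies a logical right shift by one (as i32x4.shr_u does per lane)
// and reports whether both conversions give the same float.
inline bool conversions_agree_after_shift(uint32_t bits)
{
    uint32_t shifted = bits >> 1; // the top bit is always zero after this
    return convert_lane_signed(shifted) == convert_lane_unsigned(shifted);
}
```

After the shift, every lane value lies in the range where the signed and unsigned interpretations denote the same integer, so both conversions round to the same float. The two only diverge when the top bit is set, which cannot happen in the example from this report.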