Issue 149457
Summary WebAssembly: Suboptimal "promotion" of wasm_f32x4_convert_i32x4 into f32x4.convert_i32x4_u
Labels new issue
Assignees
Reporter zeux
When the `wasm_f32x4_convert_i32x4` intrinsic takes its input from an instruction that clears the top bits, the conversion is compiled to the `i32x4_u` variant instead of the `i32x4_s` variant; for example:

```c++
#include <wasm_simd128.h>

v128_t plsno(v128_t x)
{
    // u32x4 here changes the convert instruction; it's a problem because u32->f32 is way slower on pre-AVX512 HW
    x = wasm_u32x4_shr(x, 1);
    return wasm_f32x4_convert_i32x4(x);
}
```

```
        local.get       0
        i32.const       1
        i32x4.shr_u
        f32x4.convert_i32x4_u
        end_function
```

This is a problem because on x64 hardware, `convert_i32x4_u` is lowered into a long multi-instruction sequence unless the browser implements an AVX-512 code path and the hardware supports it. The substitution therefore needlessly slows down otherwise efficient SIMD kernels.
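
For context, here is a scalar sketch of why the two conversions are interchangeable after the shift, and so why keeping the signed variant would be just as correct. It models a single 32-bit lane in plain C++ rather than using the Wasm intrinsics. The helper names (`convert_signed`, `convert_unsigned`, `conversions_agree_after_shift`) are hypothetical and exist only for illustration.

```c++
#include <cstdint>

// Hypothetical helper: models one lane of f32x4.convert_i32x4_s.
float convert_signed(std::uint32_t bits)
{
    return static_cast<float>(static_cast<std::int32_t>(bits));
}

// Hypothetical helper: models one lane of f32x4.convert_i32x4_u.
float convert_unsigned(std::uint32_t bits)
{
    return static_cast<float>(bits);
}

// After a logical shift right by 1 the top bit is always zero, so the lane
// holds the same non-negative value whether it is read as signed or unsigned,
// and both conversions round it to the same float.
bool conversions_agree_after_shift(std::uint32_t x)
{
    std::uint32_t shifted = x >> 1;
    return convert_signed(shifted) == convert_unsigned(shifted);
}
```

Because the results match for every input once the top bit is cleared, the choice between the two instructions in this situation is purely a question of cost, which is the point of this report.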
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
