https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125476
--- Comment #2 from Zhongyao Chen <chenzhongyao.hit at gmail dot com> ---
I found another test case while testing my local patch, which scales
RVV vector body costs by a default scalar/vector unit ratio of 2.
With that patch I see many regressions locally (69 test files).
One example is `vx-5-i64.c`. It can be reduced to:
```c
#include <stdint.h>

void
test_vx_binary_add_int64_t_case_1 (int64_t *restrict out,
                                   int64_t *restrict in,
                                   int64_t x, unsigned n)
{
  unsigned k = 0;
  int64_t tmp = x + 3;

  while (k < n)
    {
      tmp = tmp ^ 0x3f;
      out[k + 0] = in[k + 0] + tmp;
      out[k + 1] = in[k + 1] + tmp;
      k += 2;
    }
}
```
On trunk, this is SLP-vectorized.
The vector cost is 6:
- vector_load = 2
- vector_stmt = 1
- vector_store = 1
- scalar_to_vec = 2
The scalar cost is also 6.
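For reference, the tally above can be reproduced with a small standalone sketch. The struct, the function names, and the rule that a tie goes to the vector side are illustrative assumptions inferred from the numbers in this comment (6 vs 6 is vectorized on trunk); none of this is GCC's internal cost-model code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cost entry.  The kind names mirror the list above,
   not GCC's internal data structures.  */
struct cost_entry
{
  const char *kind;
  int cost;
};

/* Vector body costs reported on trunk for the reduced test case.  */
static const struct cost_entry trunk_vector_costs[] = {
  { "vector_load", 2 },
  { "vector_stmt", 1 },
  { "vector_store", 1 },
  { "scalar_to_vec", 2 },
};

/* Sum the cost field of N entries.  */
int
sum_costs (const struct cost_entry *entries, size_t n)
{
  assert (entries != NULL || n == 0);
  int total = 0;
  for (size_t i = 0; i < n; i++)
    total += entries[i].cost;
  return total;
}

/* Total vector cost on trunk for this example.  */
int
trunk_vector_cost (void)
{
  return sum_costs (trunk_vector_costs,
                    sizeof trunk_vector_costs / sizeof trunk_vector_costs[0]);
}

/* Assumption: a tie counts as profitable, as the 6 vs 6 case suggests.  */
int
vectorize_p (int vector_cost, int scalar_cost)
{
  return vector_cost <= scalar_cost;
}
```

Under these assumptions `trunk_vector_cost ()` gives 6 and `vectorize_p (6, 6)` is true, while any vector cost above 6 flips the decision to scalar. The scalar_to_vec entry alone accounts for 2 of the 6.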
With trunk + my scalar/vector ratio patch, it falls back to scalar code
because the vector cost becomes 7 while the scalar cost stays 6.
That does not look right to me. I would still expect this loop to be vectorized.
I think the main issue is the scalar_to_vec cost. It looks overcounted here.
There is no explicit `vmv.v.x` or other standalone scalar-to-vector
instruction in the final assembly. Instead, it generates `vadd.vx v1,v1,a2`.
So SLP costing seems to charge a virtual scalar_to_vec cost
even when the final lowering can use a .vx form directly.
Without fixing this, it is hard to make further progress on the scalar/vector
ratio cost-model tuning.
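To illustrate why the scalar_to_vec charge looks redundant, here is a plain C model of the two lowering styles. It is only a sketch: it does not use RVV intrinsics, `VL` is a hypothetical fixed vector length of two lanes, and the function names are made up for this example. The comments indicate which RVV instruction each loop stands in for.

```c
#include <stdint.h>

#define VL 2 /* hypothetical vector length: two int64_t lanes */

/* Vector-vector form: the scalar is first broadcast into a vector,
   which is the step a scalar_to_vec cost would pay for.  */
void
add_vv_with_splat (int64_t *out, const int64_t *in, int64_t x)
{
  int64_t splat[VL];
  for (int i = 0; i < VL; i++)
    splat[i] = x;               /* models vmv.v.x */
  for (int i = 0; i < VL; i++)
    out[i] = in[i] + splat[i];  /* models vadd.vv */
}

/* Vector-scalar form: the scalar register is an operand of the add
   itself, so no separate broadcast is needed.  */
void
add_vx (int64_t *out, const int64_t *in, int64_t x)
{
  for (int i = 0; i < VL; i++)
    out[i] = in[i] + x;         /* models vadd.vx */
}
```

Both functions compute the same result, but only the first has a separate broadcast step. The generated code for the test case corresponds to the second form, which is why charging scalar_to_vec here appears to overcount.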