https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125476

--- Comment #2 from Zhongyao Chen <chenzhongyao.hit at gmail dot com> ---
I found another testcase. When I test with my local patch, which scales
RVV vector body costs by a default scalar/vector unit ratio of 2,
I see many regressions locally (69 test files).

One example is `vx-5-i64.c`. It can be reduced to:

```c
#include <stdint.h>

void
test_vx_binary_add_int64_t_case_1 (int64_t *restrict out,
                                    int64_t *restrict in,
                                    int64_t x, unsigned n)
{
  unsigned k = 0;
  int64_t tmp = x + 3;

  while (k < n)
    {
      tmp = tmp ^ 0x3f;
      out[k + 0] = in[k + 0] + tmp;
      out[k + 1] = in[k + 1] + tmp;
      k += 2;
    }
}
```

On trunk, this is SLP-vectorized.

The vector cost is 6:

- vector_load = 2
- vector_stmt = 1
- vector_store = 1
- scalar_to_vec = 2

The scalar cost is also 6.
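
As a rough sketch of the arithmetic above, the per-kind costs can be summed
and compared like this. The struct and function names are hypothetical and
are not GCC's. The `<=` comparison is an assumption inferred only from the
observation that 6 vs. 6 is vectorized.

```c
/* Hypothetical container for the per-kind costs listed above.
   These names are illustrative only; they are not GCC internals.  */
struct slp_cost
{
  int vector_load;
  int vector_stmt;
  int vector_store;
  int scalar_to_vec;
};

/* Sum of all per-kind vector costs.  */
static int
slp_cost_total (const struct slp_cost *c)
{
  return c->vector_load + c->vector_stmt + c->vector_store
	 + c->scalar_to_vec;
}

/* Trunk numbers quoted above: 2 + 1 + 1 + 2.  */
static const struct slp_cost trunk_cost = { 2, 1, 1, 2 };
static const int trunk_scalar_cost = 6;

/* Assumption: vectorization is kept when the vector cost does not
   exceed the scalar cost (ties go to the vector side).  */
static int
vector_not_costlier (const struct slp_cost *c, int scalar_cost)
{
  return slp_cost_total (c) <= scalar_cost;
}
```

With the trunk numbers, `slp_cost_total (&trunk_cost)` matches the scalar
cost, so a single extra unit on the vector side is enough to flip the decision.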

With trunk + my scalar/vector ratio patch, it falls back to scalar code
because the vector cost becomes 7 while the scalar cost stays 6.

That does not look right to me. I still expect it to be vectorized.

I think the main issue is the scalar_to_vec cost. It looks overcounted here.

There is no explicit `vmv.v.x` or other standalone scalar-to-vector
instruction in the final assembly.  Instead, it generates `vadd.vx v1,v1,a2`.

So SLP costing seems to charge a virtual scalar_to_vec cost
even when the final lowering can use a .vx form directly.
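
To illustrate why the broadcast is only "virtual", here is a plain C model
(not RVV intrinsics, and not what GCC emits) of the two lowerings. `VL` and
both function names are hypothetical. The first models a separate
`vmv.v.x` followed by a vector-vector add, the second models a `.vx` form
where the scalar is an operand of the add itself.

```c
#include <stddef.h>
#include <stdint.h>

#define VL 4 /* Hypothetical number of lanes, for the model only.  */

/* Models a standalone broadcast (like vmv.v.x) followed by a
   vector-vector add: the splat is materialized as its own step.  */
static void
add_with_splat (int64_t *out, const int64_t *in, int64_t x)
{
  int64_t splat[VL];
  for (size_t i = 0; i < VL; i++)
    splat[i] = x;
  for (size_t i = 0; i < VL; i++)
    out[i] = in[i] + splat[i];
}

/* Models a vector-scalar add (like vadd.vx): no separate splat step,
   the scalar is consumed directly by the add.  */
static void
add_with_scalar_operand (int64_t *out, const int64_t *in, int64_t x)
{
  for (size_t i = 0; i < VL; i++)
    out[i] = in[i] + x;
}
```

Both functions compute the same result; only the first has a step that
corresponds to a scalar_to_vec charge.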

Without fixing this, it is hard to make further progress on the scalar/vector
ratio cost-model tuning.
