On 30/10/2025 20:10, Robin Dapp wrote:
I was hoping that this patch might help me fix PR121393 (on which I'm
still stuck) if I just disabled the unsigned gather/scatter patterns,
but no such luck: it just zero-extends the supposed-to-be-negative
offset and suffers exactly the same bug.
On the plus side, previously, if I disabled unsigned offsets, I got a
total failure to vectorize, whereas with your patch it now vectorizes
just fine, albeit with the bug.
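A minimal sketch of the kind of loop I mean (illustrative only, not the
PR's actual testcase):

  /* With a negative step the element offsets from the base are
     negative.  If the vectorizer picks an unsigned 32-bit offset type,
     an offset of -8 becomes 0xfffffff8, which zero-extends to
     0x00000000fffffff8 when added to the 64-bit base, landing far past
     the intended address instead of 8 bytes before it.  */
  double
  sum_back (double *a, int n)
  {
    double sum = 0.0;
    for (int i = 0; i < n; i++)
      sum += a[-i];    /* byte step of -8 per iteration */
    return sum;
  }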
Once things have settled down here, I can also have a look at the PR,
but so far I haven't had much time.
I had a patch that worked (by forcing the offset type to signed when the
step is negative), but Richi would prefer that the base address be
relocated so that the offsets become positive. Other loads can do this,
but not strided gather/scatter. In theory it can be made to work the
same way, but the code is new to me and non-trivial.
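For concreteness, a scalar sketch of the relocation idea as I understand
it (illustrative only, not the vectorizer's actual code): shift the base
down by the most negative lane offset, after which every lane offset is
non-negative and an unsigned offset type is safe.

  #include <stdint.h>
  #include <stdio.h>

  int
  main (void)
  {
    double a[8];
    char *base = (char *) &a[7];
    int64_t off[4] = { 0, -16, -32, -48 };  /* lane offsets, step -16 */
    int64_t min_off = -48;                  /* most negative lane offset */
    char *new_base = base + min_off;        /* relocated base */

    for (int i = 0; i < 4; i++)
      {
        /* Offsets relative to the new base are all >= 0.  */
        uint64_t uoff = (uint64_t) (off[i] - min_off);
        printf ("%s\n", base + off[i] == new_base + uoff ? "ok" : "MISMATCH");
      }
    return 0;
  }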
Assuming your patch is approved, should I continue to accept signed
offsets in the backend? The hardware instruction that takes 32-bit
offsets only accepts unsigned offsets, so I'm supporting signed offsets
by calculating the absolute address explicitly. Of course, the backend
has no way to know whether the range of possible offsets actually makes
them safe to treat as unsigned, so is your way likely to produce better
code?
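In per-lane scalar terms, the fallback I mean looks like this (my own
sketch and naming, not the actual RTL expansion):

  #include <stdint.h>

  /* The unsigned-offset hardware form can't be used for signed
     offsets, so sign-extend the 32-bit offset to 64 bits and fold it
     into the base, yielding the absolute address that the other
     instruction format takes.  */
  static inline uint64_t
  lane_address (uint64_t base, int32_t off)
  {
    return base + (uint64_t) (int64_t) off;  /* sign extension kept */
  }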
We also accept "everything" in the riscv backend and convert the offsets
to unsigned. Similar to yours, our instructions can only do unsigned
offsets. I wouldn't expect this patch to improve on what a backend can
do. For us the motivation is that costing, as well as choosing the
offset type, will be easier.
I'm not doing any range checks, though. The idea is that for signed ->
unsigned we need a new pointer-sized offset and things will just work via
two's complement. The range truncation for strided gather/scatter is not
touched.
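A minimal demonstration of the two's-complement argument (illustrative):

  #include <inttypes.h>
  #include <stdio.h>

  /* Once the offset is pointer-sized, reinterpreting it as unsigned
     changes nothing: address arithmetic wraps modulo 2^64, so
     base + (uint64_t) -16 and base - 16 name the same address.  */
  int
  main (void)
  {
    uint64_t base = 0x1000;
    int64_t soff = -16;
    uint64_t uoff = (uint64_t) soff;  /* 0xfffffffffffffff0 */

    printf ("0x%" PRIx64 "\n", base + soff);  /* 0xff0 */
    printf ("0x%" PRIx64 "\n", base + uoff);  /* 0xff0 */
    return 0;
  }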
OK, there's no advantage in touching anything in the backend then, at
least for now.
The backend also accepts 64-bit offsets, but there's no hardware
instruction for that, so both variants are converted to absolute
addresses explicitly. This variant only exists because I found that the
SPEC HPC "lbm" benchmark fails to vectorize at all without it. It would
be better if the middle end didn't choose 64-bit offsets unless
absolutely necessary, but vectorizing is better than not vectorizing.
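The access involved is roughly this shape (illustrative only, not lbm's
actual source): for a double array the gathered byte offset is
idx[i] * 8, and that product can exceed 32 bits even when the index
itself fits, so without range information the middle end widens the
offsets to 64 bits.

  void
  gather (double *restrict dst, const double *restrict src,
          const int *restrict idx, int n)
  {
    for (int i = 0; i < n; i++)
      dst[i] = src[idx[i]];    /* byte offset 8 * idx[i] */
  }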
Hmm, we generally try to use smaller offset types first. If a 64-bit
offset is used, that's likely due to a double array being used, just
like in SPECfp 2017's lbm? How do you treat 64-bit offsets when the
hardware can only handle 32 bits? Maybe we could use ranger's range
information in vect_truncate_gather_scatter_offset?
Simplifying a bit, we have two gather/scatter instruction formats. One
takes a 64-bit base address and a vector of 32-bit offsets, and the
other takes a vector of 64-bit absolute addresses (optionally with a
12-bit scalar offset).
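Per lane, in scalar terms, the address computations are as follows
(naming mine, masking and scaling omitted; I'm assuming the 12-bit
immediate is signed):

  #include <stdint.h>

  /* Format 1: 64-bit base plus unsigned 32-bit lane offsets.  */
  static inline uint64_t
  fmt1_lane (uint64_t base, uint32_t off)
  {
    return base + off;    /* offset zero-extends */
  }

  /* Format 2: 64-bit absolute lane address plus an optional 12-bit
     scalar immediate.  */
  static inline uint64_t
  fmt2_lane (uint64_t lane_addr, int32_t imm12)  /* -2048 .. 2047 */
  {
    return lane_addr + (int64_t) imm12;
  }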
Thanks
Andrew