I think refactoring how we handle VMAT_STRIDED_SLP vs VMAT_GATHER/SCATTER,
at least (and possibly specifically) for the emulated case, could be a good
thing. But it will require experiments to see how it all fits together.
I started experimenting some days ago and hit a bit of a roadblock when
checking whether
  uint8_t *pix1;
  int i_pix1;
  uint8_t *pix2;
  int i_pix2;
  for( int i = 0; i < 4; i++, pix1 += i_pix1, pix2 += i_pix2 )
    ...
is suitable for a gather. IMHO the offset type must match the type of the
step/stride, i.e. the type of i_pix1 = int. As we do not support signed gather
offsets, we would need to use 64-bit unsigned ones, and a 16 x 64-bit vector
isn't directly supported.
So even though our strided loads do support signed strides, we cannot go via
the
  recognize gather
  -> recognize strided offset
  -> strided load
route because the initial signed-offset gather will be unsupported :/
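To illustrate the offset-type problem, here is a minimal scalar sketch. It is
plain C, not vectorizer code, and the helper names are made up for this
example. It assumes that unsigned offsets narrower than the address width are
zero-extended, and it only computes addresses, it never dereferences them.

```c
#include <stdint.h>

/* Address of row i when stepping with a signed stride, as the scalar
   loop above does (hypothetical helper, for illustration only).  */
static uint64_t addr_strided(uint64_t base, int stride, int i)
{
    return base + (uint64_t)((int64_t)stride * i);
}

/* Gather-style address with an unsigned 64-bit offset: the signed 32-bit
   offset is sign-extended first, so a negative offset becomes a huge
   unsigned value that wraps modulo 2^64 to the intended address.  */
static uint64_t addr_gather_u64(uint64_t base, int stride, int i)
{
    uint64_t off = (uint64_t)(int64_t)(stride * i);
    return base + off;
}

/* Gather-style address with an unsigned 32-bit offset that is
   zero-extended: this only matches the scalar loop for non-negative
   offsets.  */
static uint64_t addr_gather_u32(uint64_t base, int stride, int i)
{
    uint32_t off = (uint32_t)(stride * i);
    return base + (uint64_t)off;
}
```

With base = 0x10000, stride = -16 and i = 3, addr_strided and addr_gather_u64
agree, while addr_gather_u32 ends up 2^32 bytes too high. That is why the int
stride forces 64-bit unsigned offsets here.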
I guess we could define gather_load patterns with signed index but we'd have no
way of verifying whether a dynamic offset is actually a stride...
For riscv we could continue to pretend to have the respective gather (via LMUL,
i.e. using more than one register) and later hope that what we recognized is
indeed a strided offset.
The other option would be to have an alternative to the gather -> strided
connection and additionally check for strided_load support directly? Currently
I see no way around this given the restrictions imposed by our insns.
--
Regards
Robin