I think refactoring how we handle VMAT_STRIDED_SLP vs. VMAT_GATHER/SCATTER,
at least (and possibly specifically) for the emulated case, might be a good
thing.  But it'll require some experimenting to see how it all fits together.

I started experimenting some days ago and hit a bit of a roadblock when checking whether

   uint8_t *pix1;
   int i_pix1;
   uint8_t *pix2;
   int i_pix2;
   for( int i = 0; i < 4; i++, pix1 += i_pix1, pix2 += i_pix2 )
     ...

is suitable for a gather. IMHO the offset type must match the type of the step/stride, i.e. the type of i_pix1, which is int. As we do not support signed gather offsets, we need to use
64-bit unsigned ones, and a vector of 16 64-bit offsets (16 * 64 bits) isn't directly supported.
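A minimal sketch of why unsigned offsets have to be 64 bits wide here, assuming a 64-bit address space (RV64) and modelling addresses as plain integers: a signed int offset that is sign-extended to 64 bits and then treated as unsigned still yields base - 16 through wraparound, whereas widening through a 32-bit unsigned type zero-extends and yields a far-away address. The helper names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Model "base + offset" as an unsigned-offset indexed access would
   compute it: plain 64-bit unsigned addition with wraparound. */
static uint64_t addr_u64(uint64_t base, uint64_t off)
{
  return base + off;
}

/* Sign-extend the int offset to 64 bits, then reinterpret as unsigned.
   A negative offset becomes 2^64 - |off|, which wraps back correctly. */
static uint64_t widen_sext(int off)
{
  return (uint64_t)(int64_t)off;
}

/* Widen through a 32-bit unsigned type instead: the value is
   zero-extended, so a negative offset turns into a large positive one. */
static uint64_t widen_zext32(int off)
{
  assert(sizeof(uint32_t) == 4);
  return (uint64_t)(uint32_t)off;
}
```

With base = 0x10000 and a stride of -16, addr_u64(base, widen_sext(-16)) gives 0xFFF0, while the zero-extended variant gives 0x10000FFF0.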

So even though our strided loads do support signed strides, we cannot go via the
  recognize gather
-> recognize strided offset
-> strided load
route because the initial signed-offset gather will be unsupported :/
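For reference, a scalar sketch of the equivalence that route relies on, assuming the loop body reads 4 contiguous bytes per row (the body is elided above, so this is an assumption): a byte gather with offsets i * stride + j reads the same data as a strided load of 4 groups of 4 bytes with a signed stride. All function names are hypothetical models, not GCC internals or RVV intrinsics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of a byte gather with signed offsets: out[k] = base[off[k]]. */
static void gather_u8(uint8_t *out, const uint8_t *base,
                      const ptrdiff_t *off, int n)
{
  for (int k = 0; k < n; k++)
    out[k] = base[off[k]];
}

/* Offsets touched by a rows x cols block: element (i, j) lives at
   i * stride + j, where stride is signed and may be negative. */
static void build_offsets(ptrdiff_t *off, ptrdiff_t stride, int rows, int cols)
{
  for (int i = 0; i < rows; i++)
    for (int j = 0; j < cols; j++)
      off[i * cols + j] = i * stride + j;
}

/* Scalar model of a strided load: 'rows' groups of 'cols' contiguous
   bytes, with consecutive groups 'stride' bytes apart. */
static void strided_load_u8(uint8_t *out, const uint8_t *base,
                            ptrdiff_t stride, int rows, int cols)
{
  assert(rows > 0 && cols > 0);
  for (int i = 0; i < rows; i++)
    memcpy(out + i * cols, base + i * stride, (size_t)cols);
}
```

Pointing base into the middle of a buffer and using a stride of -16 for a 4x4 block, both functions produce identical 16-byte results.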

I guess we could define gather_load patterns with a signed index, but we'd have no way of verifying whether a dynamic offset is actually a stride...
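To make that concrete: the property such a pattern would have to trust can be written as a check over the offset values, as in the hypothetical helper below. It depends on the run-time contents of the offset vector, so (as I read the concern) nothing a pattern sees at compile time establishes it, and checking at run time means inspecting every lane.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: does off[i*cols + j] == off[0] + i*stride + j hold
   for one common (signed) stride?  If so, store the stride in *stride_out. */
static bool offsets_are_strided(const ptrdiff_t *off, int rows, int cols,
                                ptrdiff_t *stride_out)
{
  assert(rows > 0 && cols > 0);
  /* Candidate stride: distance between the first elements of rows 0 and 1. */
  ptrdiff_t stride = rows > 1 ? off[cols] - off[0] : 0;
  for (int i = 0; i < rows; i++)
    for (int j = 0; j < cols; j++)
      if (off[i * cols + j] != off[0] + i * stride + j)
        return false;
  if (stride_out)
    *stride_out = stride;
  return true;
}
```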

For RISC-V we could keep pretending to have the respective gather (via LMUL, i.e. using register groups of more than one register) and hope that what we later recognize is indeed a strided offset.
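A back-of-the-envelope sketch of the register cost, assuming VLEN = 128 (the minimum the V extension requires): 16 byte elements fit in one register, but 16 64-bit offsets need a group of 8. This matches the RVV indexed-load rule that the index group has EMUL = (EEW/SEW) * LMUL, here (64/8) * 1 = 8. The helpers are illustrative only.

```c
#include <assert.h>

/* Whole vector registers needed for n elements of eew bits at a given VLEN. */
static unsigned regs_needed(unsigned n, unsigned eew, unsigned vlen)
{
  assert(vlen > 0);
  return (n * eew + vlen - 1) / vlen;
}

/* Round up to a power-of-two register group, since LMUL must be 1, 2, 4
   or 8 (fractional LMUL is ignored here and counted as 1 register). */
static unsigned lmul_for(unsigned n, unsigned eew, unsigned vlen)
{
  unsigned regs = regs_needed(n, eew, vlen);
  unsigned lmul = 1;
  while (lmul < regs)
    lmul *= 2;
  return lmul;
}
```

So lmul_for(16, 8, 128) is 1 for the data, while lmul_for(16, 64, 128) is 8 for the offsets, i.e. the largest group RVV allows.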

The other option would be to have an alternative to the gather -> strided connection, i.e. to additionally check for strided_load support directly? Currently I see no way around this, given the restrictions imposed by our insns.
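A rough sketch of that alternative selection order, using hypothetical names throughout (this is not GCC's vect_memory_access_type or any real target hook): strided-load support is queried directly when the access is known to be affine, so an unsupported signed-offset gather no longer blocks the strided path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical access kinds. */
enum access_kind { ACCESS_STRIDED_LOAD, ACCESS_GATHER, ACCESS_EMULATED };

/* Hypothetical summary of what the target offers. */
struct target_caps {
  bool strided_load_signed_stride; /* strided load accepting a signed stride */
  bool gather_signed_offset;       /* gather accepting signed offsets */
};

static enum access_kind
choose_access(const struct target_caps *caps, bool offset_is_affine)
{
  assert(caps != NULL);
  /* Check strided-load support directly instead of deriving it from a
     (possibly unsupported) gather first. */
  if (offset_is_affine && caps->strided_load_signed_stride)
    return ACCESS_STRIDED_LOAD;
  if (caps->gather_signed_offset)
    return ACCESS_GATHER;
  return ACCESS_EMULATED;
}
```

With signed strided loads available but no signed-offset gather, an affine access still ends up as a strided load, and only genuinely irregular offsets fall back to emulation.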

--
Regards
Robin
