https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115304
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |testsuite-fail

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
It should only need vect32: basically I assumed the target can compose the
64-bit vector from two 32-bit elements.  But it might be that for this to work
the loads would need to be aligned.

What is needed is char-to-short unpacking and vector composition, i.e.
composing either V2SImode or V8QImode from two V4QImode vectors.

Does the following help?

diff --git a/gcc/testsuite/gcc.dg/vect/slp-gap-1.c b/gcc/testsuite/gcc.dg/vect/slp-gap-1.c
index 36463ca22c5..08942380caa 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-gap-1.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-gap-1.c
@@ -4,6 +4,9 @@ typedef unsigned char uint8_t;
 typedef short int16_t;
 void pixel_sub_wxh(int16_t * __restrict diff, uint8_t *pix1, uint8_t *pix2) {
+  diff = __builtin_assume_aligned (diff, __BIGGEST_ALIGNMENT__);
+  pix1 = __builtin_assume_aligned (pix1, 4);
+  pix2 = __builtin_assume_aligned (pix2, 4);
   for (int y = 0; y < 4; y++) {
     for (int x = 0; x < 4; x++)
       diff[x + y * 4] = pix1[x] - pix2[x];
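The strategy described in the comment (take the four used bytes of each row as one 32-bit piece, compose two such V4QImode pieces into a 64-bit V8QImode vector, then unpack char to short before subtracting) can be emulated in portable scalar C. This is only an illustrative sketch, not GCC's vectorizer code: the struct types `v8qi_t`/`v8hi_t`, the helper names and the `stride1`/`stride2` parameters are hypothetical, the strides standing in for the pointer increments that the diff excerpt cuts off.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 8-lane byte and halfword "vectors", modelled as arrays.  */
typedef struct { uint8_t b[8]; } v8qi_t;
typedef struct { int16_t h[8]; } v8hi_t;

/* Vector composition: build an 8-byte vector from two 4-byte chunks
   (two V4QImode-sized pieces), e.g. the first 4 pixels of two rows.  */
v8qi_t compose_v8qi(const uint8_t *lo, const uint8_t *hi)
{
    v8qi_t v;
    memcpy(v.b, lo, 4);      /* lanes 0..3 come from the first row  */
    memcpy(v.b + 4, hi, 4);  /* lanes 4..7 come from the second row */
    return v;
}

/* Char-to-short unpacking: zero-extend each uint8_t lane to int16_t.  */
v8hi_t unpack_v8qi(v8qi_t v)
{
    v8hi_t r;
    for (int i = 0; i < 8; i++)
        r.h[i] = (int16_t) v.b[i];
    return r;
}

/* Computes diff[x + y * 4] = pix1[x] - pix2[x] for two consecutive rows
   at once.  STRIDE1/STRIDE2 are hypothetical row strides (in bytes).  */
void pixel_sub_rows_2(int16_t *diff,
                      const uint8_t *pix1, int stride1,
                      const uint8_t *pix2, int stride2)
{
    /* Each row contributes 4 bytes, so rows must not overlap.  */
    assert(stride1 >= 4 && stride2 >= 4);
    v8hi_t a = unpack_v8qi(compose_v8qi(pix1, pix1 + stride1));
    v8hi_t b = unpack_v8qi(compose_v8qi(pix2, pix2 + stride2));
    /* diff rows are 4 elements apart, so two rows are 8 contiguous lanes.  */
    for (int i = 0; i < 8; i++)
        diff[i] = (int16_t) (a.h[i] - b.h[i]);
}
```

Two calls, on rows 0-1 and rows 2-3, would cover the 4x4 block. Because only 4 bytes of each row are read, the 64-bit vector has to be assembled from two separate 32-bit loads rather than one contiguous 64-bit load.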