https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116125
--- Comment #5 from Richard Sandiford <rsandifo at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #3)
> We document
>
> class dr_with_seg_len
> {
> ...
>   /* The minimum common alignment of DR's start address, SEG_LEN and
>      ACCESS_SIZE.  */
>   unsigned int align;
>
> but here we have access_size == 1 and align == 4.  It's also said
>
>   /* All addresses involved are known to have a common alignment ALIGN.
>      We can therefore subtract ALIGN from an exclusive endpoint to get
>      an inclusive endpoint.  In the best (and common) case, ALIGN is the
>      same as the access sizes of both DRs, and so subtracting ALIGN
>      cancels out the addition of an access size.  */
>   unsigned int align = MIN (dr_a.align, dr_b.align);
>   poly_uint64 last_chunk_a = dr_a.access_size - align;
>   poly_uint64 last_chunk_b = dr_b.access_size - align;
>
> and
>
>   We also know
>   that last_chunk_b <= |step|; this is checked elsewhere if it isn't
>   guaranteed at compile time.
>
> step == 4, but last_chunk_a/b are -3U.  I couldn't find the "elsewhere"
> to check what we validate there.

The assumption that access_size is a multiple of align is crucial, so like
you say, it all falls apart if that doesn't hold.  In this case, that means
that last_chunk_* should never have been negative.

But I agree that the “elsewhere” doesn't seem to exist after all.  That is,
the step can be arbitrarily smaller than the access size.  Somewhat
relatedly, we seem to vectorise:

struct s { int x; } __attribute__((packed));

void f (char *xc, char *yc, int z)
{
  for (int i = 0; i < 100; ++i)
    {
      struct s *x = (struct s *) xc;
      struct s *y = (struct s *) yc;
      x->x += y->x;
      xc += z;
      yc += z;
    }
}

on aarch64 even with -mstrict-align -fno-vect-cost-model, generating
elementwise accesses that assume that the ints are aligned.
E.g.:

  _71 = (char *) ivtmp.19_21;
  _30 = ivtmp.29_94 - _26;
  _60 = (char *) _30;
  _52 = __MEM <int> ((int *)_71);
  _53 = (char *) ivtmp.25_18;
  _54 = __MEM <int> ((int *)_53);
  _55 = (char *) ivtmp.26_16;
  _56 = __MEM <int> ((int *)_55);
  _57 = (char *) ivtmp.27_88;
  _58 = __MEM <int> ((int *)_57);
  _59 = _Literal (int [[gnu::vector_size(16)]]) {_52, _54, _56, _58};

But the vector loop is executed even for a step of 1 (byte), provided that
x and y don't overlap.

> I think the case of align > access_size can easily happen with grouped
> accesses with a gap at the end (see vect_vfa_access_size), so simply
> failing the address-based check for this case is too pessimistic.

Yeah.  Something similar happened in PR115192, and I think this PR shows
that the fix for PR115192 was misplaced.  We should fix the overly large
alignment at source, to meet the dr_with_seg_len precondition you quoted
above.