https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116125
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Sandiford from comment #5)
> (In reply to Richard Biener from comment #3)
> > We document
> >
> > class dr_with_seg_len
> > {
> > ...
> >   /* The minimum common alignment of DR's start address, SEG_LEN and
> >      ACCESS_SIZE.  */
> >   unsigned int align;
> >
> > but here we have access_size == 1 and align == 4. It's also said
> >
> >   /* All addresses involved are known to have a common alignment ALIGN.
> >      We can therefore subtract ALIGN from an exclusive endpoint to get
> >      an inclusive endpoint.  In the best (and common) case, ALIGN is the
> >      same as the access sizes of both DRs, and so subtracting ALIGN
> >      cancels out the addition of an access size.  */
> >   unsigned int align = MIN (dr_a.align, dr_b.align);
> >   poly_uint64 last_chunk_a = dr_a.access_size - align;
> >   poly_uint64 last_chunk_b = dr_b.access_size - align;
> >
> > and
> >
> > We also know
> > that last_chunk_b <= |step|; this is checked elsewhere if it isn't
> > guaranteed at compile time.
> >
> > step == 4, but last_chunk_a/b are -3U. I couldn't find the "elsewhere"
> > to check what we validate there.
> The assumption that access_size is a multiple of align is crucial, so like
> you say, it all falls apart if that doesn't hold. In this case, that means
> that last_chunk_* should never have been negative.
>
> But I agree that the “elsewhere” doesn't seem to exist after all. That is,
> the step can be arbitrarily smaller than the access size. Somewhat
> relatedly, we seem to vectorise:
>
> struct s { int x; } __attribute__((packed));
>
> void f (char *xc, char *yc, int z)
> {
>   for (int i = 0; i < 100; ++i)
>     {
>       struct s *x = (struct s *) xc;
>       struct s *y = (struct s *) yc;
>       x->x += y->x;
>       xc += z;
>       yc += z;
>     }
> }
>
> on aarch64 even with -mstrict-align -fno-vect-cost-model, generating
> elementwise accesses that assume that the ints are aligned. E.g.:
>
>   _71 = (char *) ivtmp.19_21;
>   _30 = ivtmp.29_94 - _26;
>   _60 = (char *) _30;
>   _52 = __MEM <int> ((int *)_71);
>   _53 = (char *) ivtmp.25_18;
>   _54 = __MEM <int> ((int *)_53);
>   _55 = (char *) ivtmp.26_16;
>   _56 = __MEM <int> ((int *)_55);
>   _57 = (char *) ivtmp.27_88;
>   _58 = __MEM <int> ((int *)_57);
>   _59 = _Literal (int [[gnu::vector_size(16)]]) {_52, _54, _56, _58};
>
> But the vector loop is executed even for a step of 1 (byte), provided that x
> and y don't overlap.
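To make the failure mode concrete, here is a hypothetical driver (a
minimal sketch, to be compiled together with the loop above): the two
buffers do not overlap, but the +1 offsets leave xc and yc without any
4-byte alignment guarantee, so the elementwise int loads shown above
can fault under -mstrict-align; and with z == 1 successive x accesses
also overlap each other, which is presumably what the missing
"elsewhere" check was supposed to reject.

void f (char *xc, char *yc, int z);

/* 100 iterations with step 1 plus the 3-byte tail of the last packed
   int, starting at offset 1.  */
static char bx[104], by[104];

int
main (void)
{
  f (bx + 1, by + 1, 1);  /* step of one byte, misaligned start */
  return 0;
}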
I think this is due to a similar issue to the one you noticed wrt
dr_aligned, where we emit aligned loads instead of checking the byte
alignment against the access size we emit - I think we don't consider
misaligned elements at all when code-generating element accesses.  We do see
t.c:5:21: note: vect_compute_data_ref_alignment:
t.c:5:21: missed: step doesn't divide the vector alignment.
t.c:5:21: missed: Unknown alignment for access: MEM[(struct s *)xc_21].x
t.c:5:21: note: vect_compute_data_ref_alignment:
t.c:5:21: missed: step doesn't divide the vector alignment.
t.c:5:21: missed: Unknown alignment for access: MEM[(struct s *)yc_22].x
t.c:5:21: note: vect_compute_data_ref_alignment:
t.c:5:21: missed: step doesn't divide the vector alignment.
t.c:5:21: missed: Unknown alignment for access: MEM[(struct s *)xc_21].x
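At the source level a load of a packed int must not assume 4-byte
alignment; a byte-wise lowering along the following lines (a sketch,
not what the vectoriser currently emits) is what the elementwise code
generation would have to preserve instead of the plain __MEM <int>
accesses above:

static inline int
load_packed_int (const char *p)
{
  int v;
  /* memcpy makes no alignment assumption on P.  */
  __builtin_memcpy (&v, p, sizeof v);
  return v;
}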
I'm splitting this out to another PR.