https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116125
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Sandiford from comment #5)
> (In reply to Richard Biener from comment #3)
> > We document
> >
> > class dr_with_seg_len
> > {
> > ...
> >   /* The minimum common alignment of DR's start address, SEG_LEN and
> >      ACCESS_SIZE.  */
> >   unsigned int align;
> >
> > but here we have access_size == 1 and align == 4. It's also said
> >
> >   /* All addresses involved are known to have a common alignment ALIGN.
> >      We can therefore subtract ALIGN from an exclusive endpoint to get
> >      an inclusive endpoint.  In the best (and common) case, ALIGN is the
> >      same as the access sizes of both DRs, and so subtracting ALIGN
> >      cancels out the addition of an access size.  */
> >   unsigned int align = MIN (dr_a.align, dr_b.align);
> >   poly_uint64 last_chunk_a = dr_a.access_size - align;
> >   poly_uint64 last_chunk_b = dr_b.access_size - align;
> >
> > and
> >
> > We also know
> > that last_chunk_b <= |step|; this is checked elsewhere if it isn't
> > guaranteed at compile time.
> >
> > step == 4, but last_chunk_a/b are -3U. I couldn't find the "elsewhere"
> > to check what we validate there.
> The assumption that access_size is a multiple of align is crucial, so like
> you say, it all falls apart if that doesn't hold. In this case, that means
> that last_chunk_* should never have been negative.
>
> But I agree that the “elsewhere” doesn't seem to exist after all. That is,
> the step can be arbitrarily smaller than the access size. Somewhat
> relatedly, we seem to vectorise:
>
> struct s { int x; } __attribute__((packed));
>
> void f (char *xc, char *yc, int z)
> {
>   for (int i = 0; i < 100; ++i)
>     {
>       struct s *x = (struct s *) xc;
>       struct s *y = (struct s *) yc;
>       x->x += y->x;
>       xc += z;
>       yc += z;
>     }
> }
>
> on aarch64 even with -mstrict-align -fno-vect-cost-model, generating
> elementwise accesses that assume that the ints are aligned. E.g.:
>
>   _71 = (char *) ivtmp.19_21;
>   _30 = ivtmp.29_94 - _26;
>   _60 = (char *) _30;
>   _52 = __MEM <int> ((int *)_71);
>   _53 = (char *) ivtmp.25_18;
>   _54 = __MEM <int> ((int *)_53);
>   _55 = (char *) ivtmp.26_16;
>   _56 = __MEM <int> ((int *)_55);
>   _57 = (char *) ivtmp.27_88;
>   _58 = __MEM <int> ((int *)_57);
>   _59 = _Literal (int [[gnu::vector_size(16)]]) {_52, _54, _56, _58};
>
> But the vector loop is executed even for a step of 1 (byte), provided that x
> and y don't overlap.
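To make the failure mode concrete, here is a hypothetical driver (a
minimal sketch, to be compiled together with the loop above): the two
buffers do not overlap, but the +1 offsets leave xc and yc without any
4-byte alignment guarantee, so the elementwise int loads shown above
can fault under -mstrict-align; and with z == 1 successive x accesses
also overlap each other, which is presumably what the missing
"elsewhere" check was supposed to reject.

void f (char *xc, char *yc, int z);

/* 100 iterations with step 1 plus the 3-byte tail of the last packed
   int, starting at offset 1.  */
static char bx[104], by[104];

int
main (void)
{
  f (bx + 1, by + 1, 1);  /* step of one byte, misaligned start */
  return 0;
}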
I think this is due to a similar issue to the one you noticed wrt
dr_aligned, where we emit aligned loads instead of checking the byte
alignment against the access size we emit - I think we don't consider
misaligned elements at all when code-generating element accesses.  We do see
t.c:5:21: note: vect_compute_data_ref_alignment:
t.c:5:21: missed: step doesn't divide the vector alignment.
t.c:5:21: missed: Unknown alignment for access: MEM[(struct s *)xc_21].x
t.c:5:21: note: vect_compute_data_ref_alignment:
t.c:5:21: missed: step doesn't divide the vector alignment.
t.c:5:21: missed: Unknown alignment for access: MEM[(struct s *)yc_22].x
t.c:5:21: note: vect_compute_data_ref_alignment:
t.c:5:21: missed: step doesn't divide the vector alignment.
t.c:5:21: missed: Unknown alignment for access: MEM[(struct s *)xc_21].x
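At the source level a load of a packed int must not assume 4-byte
alignment; a byte-wise lowering along the following lines (a sketch,
not what the vectoriser currently emits) is what the elementwise code
generation would have to preserve instead of the plain __MEM <int>
accesses above:

static inline int
load_packed_int (const char *p)
{
  int v;
  /* memcpy makes no alignment assumption on P.  */
  __builtin_memcpy (&v, p, sizeof v);
  return v;
}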
I'm splitting this out to another PR.