https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116125

--- Comment #5 from Richard Sandiford <rsandifo at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #3)
> We document
> 
> class dr_with_seg_len
> {  
> ...
>   /* The minimum common alignment of DR's start address, SEG_LEN and
>      ACCESS_SIZE.  */
>   unsigned int align;
> 
> but here we have access_size == 1 and align == 4.  It's also said
> 
>   /* All addresses involved are known to have a common alignment ALIGN.
>      We can therefore subtract ALIGN from an exclusive endpoint to get
>      an inclusive endpoint.  In the best (and common) case, ALIGN is the
>      same as the access sizes of both DRs, and so subtracting ALIGN
>      cancels out the addition of an access size.  */
>   unsigned int align = MIN (dr_a.align, dr_b.align);
>   poly_uint64 last_chunk_a = dr_a.access_size - align;
>   poly_uint64 last_chunk_b = dr_b.access_size - align;
> 
> and
> 
>                                                          We also know
>      that last_chunk_b <= |step|; this is checked elsewhere if it isn't
>      guaranteed at compile time.
> 
> step == 4, but last_chunk_a/b are -3U.  I couldn't find the "elsewhere"
> to check what we validate there.
The assumption that access_size is a multiple of align is crucial, so, as you
say, it all falls apart if that doesn't hold.  In this case, that means that
last_chunk_* should never have been negative.
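
For concreteness, here's a minimal standalone sketch of the underflow, with
the values from this PR plugged in and plain unsigned int standing in for
poly_uint64:

#include <stdio.h>

int main (void)
{
  unsigned int access_size = 1;  /* dr_a.access_size in this PR */
  unsigned int align = 4;        /* align in this PR */
  /* access_size is not a multiple of align, so the subtraction wraps
     instead of producing a small chunk size.  */
  unsigned int last_chunk = access_size - align;
  printf ("%u\n", last_chunk);   /* prints 4294967293, i.e. (unsigned) -3 */
  return 0;
}

Any later check that assumes last_chunk_* is a small nonnegative value is
then reasoning from garbage.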

But I agree that the “elsewhere” doesn't seem to exist after all.  That is, the
step can be arbitrarily smaller than the access size.  Somewhat relatedly, we
seem to vectorise:

struct s { int x; } __attribute__((packed));

void f (char *xc, char *yc, int z)
{
  for (int i = 0; i < 100; ++i)
    {
      struct s *x = (struct s *) xc;
      struct s *y = (struct s *) yc;
      x->x += y->x;
      xc += z;
      yc += z;
    }
}

on aarch64 even with -mstrict-align -fno-vect-cost-model, generating
elementwise accesses that assume that the ints are aligned.  E.g.:

  _71 = (char *) ivtmp.19_21;
  _30 = ivtmp.29_94 - _26;
  _60 = (char *) _30;
  _52 = __MEM <int> ((int *)_71);
  _53 = (char *) ivtmp.25_18;
  _54 = __MEM <int> ((int *)_53);
  _55 = (char *) ivtmp.26_16;
  _56 = __MEM <int> ((int *)_55);
  _57 = (char *) ivtmp.27_88;
  _58 = __MEM <int> ((int *)_57);
  _59 = _Literal (int [[gnu::vector_size(16)]]) {_52, _54, _56, _58};

But the vector loop is executed even for a step of 1 (byte), provided that x
and y don't overlap.
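
For illustration, a hypothetical driver (the buffer names, sizes and offsets
are mine) that should reach the vector loop in that situation: both pointers
are misaligned, the buffers don't overlap, and the step is 1 byte:

void f (char *xc, char *yc, int z);  /* f from above */

char xbuf[104], ybuf[104];

int main (void)
{
  /* Misaligned, non-overlapping, step 1: the runtime check between
     x and y passes, so the vector loop runs even though the ints are
     unaligned and consecutive x accesses overlap each other.  */
  f (xbuf + 1, ybuf + 1, 1);
  return 0;
}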

> I think the case of align > access_size can easily happen with grouped
> accesses with a gap at the end (see vect_vfa_access_size), so simply
> failing the address-based check for this case is too pessimistic.
Yeah.  Something similar happened in PR115192, and I think this PR shows that
the fix for PR115192 was misplaced.  We should fix the overly large alignment
at source, to meet the dr_with_seg_len precondition you quoted above.
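
For reference, the kind of loop I'd expect to hit that case (my sketch, not
from this PR) is a grouped load whose last element leaves a gap, so that the
access size used for the runtime check can be smaller than the group's
natural alignment:

struct pair { int a, b; };

void g (int *restrict d, struct pair *restrict s, int n)
{
  /* Grouped load of s[i].a with a stride of two ints: .b is never
     read, leaving a one-int gap at the end of each group.  */
  for (int i = 0; i < n; ++i)
    d[i] = s[i].a;
}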
