https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116855

--- Comment #7 from rguenther at suse dot de <rguenther at suse dot de> ---
On Sun, 6 Oct 2024, fxue at os dot amperecomputing.com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116855
> 
> --- Comment #5 from Feng Xue <fxue at os dot amperecomputing.com> ---
> (In reply to Tamar Christina from comment #4)
> > (In reply to Richard Biener from comment #3)
> > > I would suggest to add a STMT_VINFO_ENSURE_NOTRAP or so and delay actual
> > > verification to vectorizable_load when both vector type and VF are fixed.
> > > We'd then likely need a LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P set
> > > conservatively the other way around from
> > > LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P.
> > > Alignment peeling could then peel if STMT_VINFO_ENSURE_NOTRAP and the target
> > > cannot do full loop masking.
> > 
> > 
> > Yeah the original reported testcase is fine as the alignment makes it safe.
> > For the manually misaligned case that Andrew posted it makes sense to delay
> > and re-evaluate later on.
> > 
> > I don't think we should bother peeling though, I don't think they're that
> > common and alignment peeling breaks some dominators and exposes some
> > existing vectorizer bugs, which is being fixed in Alex's patch.
> > 
> > So at least alignment peeling I'll defer to a later point and instead just
> > reject loops that are loading from structures the user misaligned wrt the
> > vector mode.
> > 
> > 
> > So mine..
> 
> Actually, what I wish is that we could allow vectorization of the early-break
> case for an arbitrary address pointer (knowing nothing about alignment or
> bounds), based on some sort of assumption specified via a command-line option
> under -Ofast, as in the mentioned example:

I'd rather not have more command-line options gating "unsafe" transforms
but instead have source-level control per loop via a pragma.  It should
probably specify a length like simdlen, stating that accessing
[start + n * accesslen, start + (n+1)*accesslen - 1] is OK whenever
the scalar loop accesses [start + n * accesslen] or so.
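
To make the intended guarantee concrete, here is a rough sketch in plain C
of what a vectorized early-break loop would be allowed to do under such an
assumption.  The names ACCESSLEN and find_blockwise are made up for
illustration, and the pragma is hypothetical and appears only in a comment;
none of this is an existing GCC interface.  Note that the "n" in the range
above is a block index, whereas "n" in the code is the element count.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical block length in bytes, playing the role of "accesslen". */
#define ACCESSLEN 16

/* Sketch only: a hypothetical per-loop pragma would promise that whenever
   the scalar loop touches string[k * ACCESSLEN], the whole block
   string[k * ACCESSLEN .. (k + 1) * ACCESSLEN - 1] is safe to read.  */
char *find_blockwise(char *string, size_t n, char c)
{
    for (size_t i = 0; i < n; i += ACCESSLEN) {
        char block[ACCESSLEN];

        /* Stands in for one vector load: reads the full block even when
           fewer than ACCESSLEN elements remain before n.  This is only
           valid under the assumed guarantee.  */
        memcpy(block, string + i, ACCESSLEN);

        /* Stands in for the vector compare; lanes at or beyond n are
           ignored, so the result matches the scalar loop.  */
        for (size_t j = 0; j < ACCESSLEN && i + j < n; j++) {
            if (block[j] == c)
                return &string[i + j];
        }
    }
    return 0;
}
```

The over-read in the last block is exactly what the compiler cannot prove
safe on its own for an arbitrary pointer, which is why the guarantee would
have to come from the source.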

> char * find(char *string, size_t n, char c)
> {
>     for (size_t i = 0; i < n; i++) {
>         if (string[i] == c)
>             return &string[i];
>     }
>     return 0;
> }
> 
> and an example for which there is no way to do peeling to align more than one
> address pointer:
> 
> int compare(char *string1, char *string2, size_t n)
> {
>     for (size_t i = 0; i < n; i++) {
>         if (string1[i] != string2[i])
>             return string1[i] - string2[i];
>     }
>     return 0;
> }
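
The reason peeling cannot help in the two-pointer case can be sketched with
a little address arithmetic.  VEC_BYTES, peel_count and peeling_aligns_both
are illustrative names rather than vectorizer internals, and the sketch
assumes that converting a pointer to uintptr_t yields a linear address.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vector width in bytes; must be a power of two. */
#define VEC_BYTES 16

/* Number of one-byte scalar iterations to peel before p becomes
   VEC_BYTES-aligned.  */
static size_t peel_count(const void *p)
{
    return (size_t)(-(uintptr_t)p & (VEC_BYTES - 1));
}

/* Both pointers advance in lockstep, so a single peel count has to serve
   both.  That works only when they share the same misalignment.  */
static int peeling_aligns_both(const void *a, const void *b)
{
    return peel_count(a) == peel_count(b);
}
```

For string1 and string2 with different misalignments, whatever peel count
aligns one pointer leaves the other unaligned.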
