https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120164

--- Comment #5 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #4)
> Note with "vectorizing" prefetches I meant adjusting the prefetched address,
> "vectorizing" it as an induction but only prefetching on the first (or
> last?) address of the vector induction vector.  Aka simply advancing the
> prefetch
> address IV by VF * step and keeping the "scalar" prefetch as-is.  The
> other alternative is to handle it like we could other not vectorizable scalar
> code, duplicate it according to the unroll factor (the VF), but that's likely
> worse in practice.  For the conditional case we'd ideally do
> 
>  if (any(vector_'i' % 1024 == 0))
>    __builtin_prefetch (&(b[first_of(vector_'i')+1024]));
> 

For SVE-based prefetching you'd have to use the predicated (masked) version, i.e.
https://developer.arm.com/documentation/ddi0602/2025-03/SVE-Instructions/PRFW--scalar-plus-immediate---Contiguous-prefetch-words--immediate-index--
as we wouldn't want the branch in the codegen.

It's a bit harder than just adjusting the scalar IV, as prefetching is also
supported for gathers and scatters, where you have a vector of addresses in the
prefetch, e.g.
https://developer.arm.com/documentation/ddi0602/2025-03/SVE-Instructions/PRFW--vector-plus-immediate---Gather-prefetch-words--vector-plus-immediate--

So this would actually need codegen support and be part of the SLP tree so it
can use the invariants created by the vectorizer for addresses and deal with
complexities such as what happens if we scalarize the gather later on.

> For x86 it's low priority, who writes prefetches usually writes vector 
> intrinsics as well.

Agreed.

The given example is an easy one to drop, but I wonder what would happen if the
block had other instructions too:

void foo(double * restrict a, double * restrict b, int n){
  int i;
  for(i=0; i<n; ++i){
    if (i % 1024 == 0)
      {
        __builtin_prefetch(&(b[i+1024]));
        a[i] = a[i] + b[i];
      }
  }
}

Here the prefetch would block if-conversion if we don't drop it early enough.
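
To make the concern concrete, here is a rough scalar sketch of what the loop
could look like once the prefetch has been dropped and the conditional store
turned into a select. The names foo_branchy and foo_ifcvt are made up for
illustration, and this is hand-written C, not actual GCC output; the real
vectorizer would more likely emit a masked store than the unconditional store
shown here.

```c
/* The loop from above: the prefetch and the store share one
   conditional block.  */
void foo_branchy (double * restrict a, double * restrict b, int n)
{
  for (int i = 0; i < n; ++i)
    if (i % 1024 == 0)
      {
        __builtin_prefetch (&(b[i + 1024]));
        a[i] = a[i] + b[i];
      }
}

/* Hypothetical hand if-converted form, assuming the prefetch was
   dropped first.  The loop body is now straight-line code: the sum
   is computed unconditionally and a select picks the value stored.  */
void foo_ifcvt (double * restrict a, double * restrict b, int n)
{
  for (int i = 0; i < n; ++i)
    {
      double sum = a[i] + b[i];
      a[i] = (i % 1024 == 0) ? sum : a[i];
    }
}
```

Both functions leave the same values in a[]; the second form is only reachable
if the prefetch call is no longer in the block when if-conversion runs.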
