> -----Original Message-----
> From: Richard Sandiford <richard.sandif...@arm.com>
> Sent: Thursday, March 6, 2025 10:40 AM
> To: Tamar Christina <tamar.christ...@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Earnshaw
> <richard.earns...@arm.com>; ktkac...@gcc.gnu.org
> Subject: Re: [1/3 PATCH]AArch64: add support for partial modes to last
> extractions [PR118464]
> 
> Tamar Christina <tamar.christ...@arm.com> writes:
> >> -----Original Message-----
> >> From: Richard Sandiford <richard.sandif...@arm.com>
> >> Sent: Wednesday, March 5, 2025 11:27 AM
> >> To: Tamar Christina <tamar.christ...@arm.com>
> >> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Earnshaw
> >> <richard.earns...@arm.com>; ktkac...@gcc.gnu.org
> >> Subject: Re: [1/3 PATCH]AArch64: add support for partial modes to last
> >> extractions [PR118464]
> >>
> >> Tamar Christina <tamar.christ...@arm.com> writes:
> >> >> > diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
> >> >> > index e975286a01904bec0b283b7ba4afde6f0fd60bf1..6c0be3c1a51449274720175b5e6e7d7535928de6 100644
> >> >> > --- a/gcc/config/aarch64/aarch64-sve.md
> >> >> > +++ b/gcc/config/aarch64/aarch64-sve.md
> >> >> > @@ -3107,7 +3107,7 @@ (define_insn "@extract_<last_op>_<mode>"
> >> >> >    [(set (match_operand:<VEL> 0 "register_operand")
> >> >> >       (unspec:<VEL>
> >> >> >         [(match_operand:<VPRED> 1 "register_operand")
> >> >> > -        (match_operand:SVE_FULL 2 "register_operand")]
> >> >> > +        (match_operand:SVE_ALL 2 "register_operand")]
> >> >> >         LAST))]
> >> >> >    "TARGET_SVE"
> >> >> >    {@ [ cons: =0 , 1   , 2  ]
> >> >>
> >> >> It looks like this will use (say):
> >> >>
> >> >>   lasta b<n>, pg, z<m>.b
> >> >>
> >> >> for VNx4QI, is that right?  I don't think that's safe, since the .b form
> >> >> treats all bits of the pg input as significant, whereas only one in every
> >> >> four bits of pg is defined for VNx4BI (the predicate associated with
> >> >> VNx4QI).
> >> >>
> >> >> I think converting these patterns to partial vectors means operating
> >> >> on containers rather than elements.  E.g. the VNx4QI case should use
> >> >> .s rather than .b.  That should just be a case of changing vwcore to
> >> >> vccore and Vetype to Vctype, but I haven't looked too deeply.
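> >> >>
> >> >> Concretely (register numbers made up for illustration), for VNx4QI that
> >> >> would be the difference between:
> >> >>
> >> >>   lasta b0, p0, z1.b   // unsafe: treats every bit of p0 as significant
> >> >>   lasta s0, p0, z1.s   // container form: only one bit in four matters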
> >> >
> >> > Ah I see, so for partial types, the values are not expected to be packed
> >> > in the lower part of the vector, but instead are "padded"?
> >>
> >> Right.
> >>
> >> > That explains some of the other patterns
> >> > I was confused about.
> >> >
> >> > Any ideas how to test these? It's hard to control what modes the
> >> > vectorizer picks..
> >>
> >> Yeah, agreed.  I think it'd be difficult to trigger it reliably from the
> >> vectoriser given its current limited use of the ifns.
> >>
> >> A gimple frontend test might work though, with a predicate/mask
> >> generated from (say) 16-bit elements, then bitcast to a predicate/mask
> >> for 32-bit elements and used as an input to an explicit ifn on 32-bit
> >> elements.  If the 16-bit predicate contains 0, 1 for some even-aligned
> >> pair, after the last 1, 0 aligned pair, then the code would likely have
> >> picked the wrong element.
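> >>
> >> To make that concrete (made-up mask values, elements listed from 0 up):
> >>
> >>   16-bit mask:   1 0 0 1    <- last (1, 0) pair, then a (0, 1) pair
> >>   32-bit view:   1 ? 0 ?    <- only one bit per container is defined
> >>
> >> The last active container is container 0, but an instruction that treats
> >> all the bits as significant would see the stray 1 from element 3 and
> >> pick container 1 instead.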
> >
> > *insert chaos emoji* I realized the early-break vect tests in the testsuite
> > already need this; that's how I noticed the problem in the first place.  So
> > it's enough to run those with SVE enabled (as there are tests depending on
> > partial vectors anyway).
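> >
> > (Something along these lines, modulo the exact board flags for the setup:
> >
> >   make check-gcc RUNTESTFLAGS="vect.exp --target_board=unix/-march=armv8.2-a+sve"
> >
> > should exercise those paths.)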
> >
> > So...
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> >
> >     PR tree-optimization/118464
> >     PR tree-optimization/116855
> >     * config/aarch64/aarch64-sve.md (@extract_<last_op>_<mode>,
> >     @fold_extract_<last_op>_<mode>,
> >     @aarch64_fold_extract_vector_<last_op>_<mode>): Change SVE_FULL to
> >     SVE_ALL.
> >
> > -- inline copy of patch --
> >
> > diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
> > index a93bc463a909ea28460cc7877275fce16e05f7e6..205eeec2e35544de848e0dbb48e3f5ae59391a88 100644
> > --- a/gcc/config/aarch64/aarch64-sve.md
> > +++ b/gcc/config/aarch64/aarch64-sve.md
> > @@ -3107,12 +3107,12 @@ (define_insn "@extract_<last_op>_<mode>"
> >    [(set (match_operand:<VEL> 0 "register_operand")
> >     (unspec:<VEL>
> >       [(match_operand:<VPRED> 1 "register_operand")
> > -      (match_operand:SVE_FULL 2 "register_operand")]
> > +      (match_operand:SVE_ALL 2 "register_operand")]
> >       LAST))]
> >    "TARGET_SVE"
> >    {@ [ cons: =0 , 1   , 2  ]
> > -     [ ?r       , Upl , w  ] last<ab>\t%<vwcore>0, %1, %2.<Vetype>
> > -     [ w        , Upl , w  ] last<ab>\t%<Vetype>0, %1, %2.<Vetype>
> > +     [ ?r       , Upl , w  ] last<ab>\t%<vwcore>0, %1, %2.<Vctype>
> 
> This looks good for the Vetype->Vctype part, but it still needs the
> vwcore->vccore change too.  If we don't do that, we could end up pairing
> a w destination with a .d source, which would trigger an assembler error.
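> 
> E.g. for a partial mode like VNx2SI (32-bit elements in 64-bit containers),
> with made-up register numbers:
> 
>   lasta w0, p0, z1.d   // what vwcore would emit: rejected by the assembler
>   lasta x0, p0, z1.d   // what vccore emits: x register to match the .d source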
> 

The sad part is, I went and checked the instruction before posting this and
missed that it wasn't allowed.. arggg..

Tamar

> Thanks,
> Richard
> 
> 
> > +     [ w        , Upl , w  ] last<ab>\t%<Vctype>0, %1, %2.<Vctype>
> >    }
> >  )
> >
> > @@ -8899,26 +8899,26 @@ (define_insn "@fold_extract_<last_op>_<mode>"
> >     (unspec:<VEL>
> >       [(match_operand:<VEL> 1 "register_operand")
> >        (match_operand:<VPRED> 2 "register_operand")
> > -      (match_operand:SVE_FULL 3 "register_operand")]
> > +      (match_operand:SVE_ALL 3 "register_operand")]
> >       CLAST))]
> >    "TARGET_SVE"
> >    {@ [ cons: =0 , 1 , 2   , 3  ]
> > -     [ ?r       , 0 , Upl , w  ] clast<ab>\t%<vwcore>0, %2, %<vwcore>0, %3.<Vetype>
> > -     [ w        , 0 , Upl , w  ] clast<ab>\t%<Vetype>0, %2, %<Vetype>0, %3.<Vetype>
> > +     [ ?r       , 0 , Upl , w  ] clast<ab>\t%<vwcore>0, %2, %<vwcore>0, %3.<Vctype>
> > +     [ w        , 0 , Upl , w  ] clast<ab>\t%<Vctype>0, %2, %<Vctype>0, %3.<Vctype>
> >    }
> >  )
> >
> >  (define_insn "@aarch64_fold_extract_vector_<last_op>_<mode>"
> > -  [(set (match_operand:SVE_FULL 0 "register_operand")
> > -   (unspec:SVE_FULL
> > -     [(match_operand:SVE_FULL 1 "register_operand")
> > +  [(set (match_operand:SVE_ALL 0 "register_operand")
> > +   (unspec:SVE_ALL
> > +     [(match_operand:SVE_ALL 1 "register_operand")
> >        (match_operand:<VPRED> 2 "register_operand")
> > -      (match_operand:SVE_FULL 3 "register_operand")]
> > +      (match_operand:SVE_ALL 3 "register_operand")]
> >       CLAST))]
> >    "TARGET_SVE"
> >    {@ [ cons: =0 , 1 , 2   , 3  ]
> > -     [ w        , 0 , Upl , w  ] clast<ab>\t%0.<Vetype>, %2, %0.<Vetype>, %3.<Vetype>
> > -     [ ?&w      , w , Upl , w  ] movprfx\t%0, %1\;clast<ab>\t%0.<Vetype>, %2, %0.<Vetype>, %3.<Vetype>
> > +     [ w        , 0 , Upl , w  ] clast<ab>\t%0.<Vctype>, %2, %0.<Vctype>, %3.<Vctype>
> > +     [ ?&w      , w , Upl , w  ] movprfx\t%0, %1\;clast<ab>\t%0.<Vctype>, %2, %0.<Vctype>, %3.<Vctype>
> >    }
> >  )
