>
> I think for a 512-bit vector, vgf2p8affineqb is better than the
> original codegen, but for a 128/256-bit vector, shouldn't vpcmpgtb be
> better than vgf2p8affineqb?
Yes, it's better, but I don't see it in the loop bodies for
any of my test cases, only in prologues/epilogues.
Okay probably t
On Fri, Aug 22, 2025 at 11:26 PM Andi Kleen wrote:
>
> > > + else if (TARGET_GFNI && TARGET_AVX512F && CONST_INT_P (operands[2]))
> > I don't think we need AVX512F here, and let's exclude the >>7 case,
> > so this would be better as:
> > else if (TARGET_GFNI
> > && CONST_INT_P (operands[2])
> >
> > + else if (TARGET_GFNI && TARGET_AVX512F && CONST_INT_P (operands[2]))
> I don't think we need AVX512F here, and let's exclude the >>7 case,
> so this would be better as:
> else if (TARGET_GFNI
> && CONST_INT_P (operands[2])
> /* It's just vpcmpgtb against 0. */
> && !
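For readers following the ">>7" remark above: assuming the pattern under review is an arithmetic right shift on signed bytes, shifting by 7 leaves each byte as either 0x00 or 0xFF depending only on its sign bit, which is the same per-lane result as a signed "0 > x" compare (what vpcmpgtb against zero produces). The scalar sketch below checks that equivalence for all 256 byte values. The function names are hypothetical and are not taken from the patch; this models the per-byte semantics only, not GCC's expander code.

```c
#include <stdint.h>

/* Arithmetic right shift of one byte by 7, written without relying on
   implementation-defined signed shifts: broadcast the sign bit. */
static uint8_t ashr7_byte(uint8_t x)
{
    uint8_t sign = x >> 7;           /* 0 or 1 */
    return (uint8_t)(0u - sign);     /* 0x00 or 0xFF */
}

/* Per-lane result of a signed byte compare "0 > x": all ones when true,
   all zeros otherwise. A byte is negative as int8_t exactly when its
   value as uint8_t is 0x80 or above. */
static uint8_t cmpgt_zero_byte(uint8_t x)
{
    return (x >= 0x80) ? 0xFF : 0x00;
}

/* Returns 1 if both formulations agree on every possible byte value. */
int all_bytes_agree(void)
{
    for (unsigned v = 0; v < 256; v++) {
        if (ashr7_byte((uint8_t)v) != cmpgt_zero_byte((uint8_t)v))
            return 0;
    }
    return 1;
}
```

Since the two agree on every input, a shift count of 7 needs no affine transform at all, which is why the review suggests excluding it from the GFNI path.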
On Wed, Aug 20, 2025 at 11:08 PM Andi Kleen wrote:
>
> From: Andi Kleen
>
> [v2 version: Split rotate patterns in V16QI and V32/64QI.
> Add various AVX512F checks. Remove some unnecessary
> masks. Add untested cond_ pattern (untested, couldn't trigger it)
> Clean up some control flow. Use narrowe
From: Andi Kleen
[v2 version: Split rotate patterns in V16QI and V32/64QI.
Add various AVX512F checks. Remove some unnecessary
masks. Add untested cond_ pattern (untested, couldn't trigger it)
Clean up some control flow. Use narrower modes.
Avoid need for weakening predicate check in expand.
Use