Re: [PATCH v2] Use x86 GFNI for vectorized constant byte shifts/rotates

2025-08-23 Thread Andi Kleen
>
> I think for a 512-bit vector, vgf2p8affineqb is better than the
> original codegen, but for a 128/256-bit vector, shouldn't vpcmpgtb be
> better than vgf2p8affineqb?

Yes it's better, but I don't see it in the loop bodies for any of my test cases, only in prologues/epilogues. Okay probably t

Re: [PATCH v2] Use x86 GFNI for vectorized constant byte shifts/rotates

2025-08-23 Thread Hongtao Liu
On Fri, Aug 22, 2025 at 11:26 PM Andi Kleen wrote:
>
> > > + else if (TARGET_GFNI && TARGET_AVX512F && CONST_INT_P (operands[2]))
> > I don't think we need AVX512F here, and let's exclude >>7 cases here,
> > so better be.
> > else if (TARGET_GFNI
> > && CONST_INT_P (operands[2])
> >

Re: [PATCH v2] Use x86 GFNI for vectorized constant byte shifts/rotates

2025-08-22 Thread Andi Kleen
> > + else if (TARGET_GFNI && TARGET_AVX512F && CONST_INT_P (operands[2]))
> I don't think we need AVX512F here, and let's exclude >>7 cases here,
> so better be.
> else if (TARGET_GFNI
> && CONST_INT_P (operands[2])
> /* It's just vpcmpgtb against 0. */
> && !
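The remark that the >>7 case is "just vpcmpgtb against 0" can be checked with a scalar model. An arithmetic right shift of a signed byte by 7 gives all ones for negative inputs and zero otherwise, which is the same mask a signed "0 > x" byte compare produces. The following is a minimal sketch, assuming the shift under discussion is the arithmetic one; the function names are hypothetical and do not come from the patch.

```c
#include <stdint.h>

/* Arithmetic shift right by 7 on a signed byte. Right-shifting a negative
   value is implementation-defined in ISO C; GCC documents it as an
   arithmetic (sign-extending) shift, which this sketch relies on. */
int8_t ashr7_byte(int8_t x)
{
    return (int8_t)(x >> 7);
}

/* Scalar model of one lane of a signed byte compare "0 > x":
   all ones when true, zero when false. */
int8_t cmpgt_zero_byte(int8_t x)
{
    return (int8_t)((0 > x) ? -1 : 0);
}

/* Returns 1 if both functions agree for all 256 byte values, else 0. */
int ashr7_matches_cmpgt(void)
{
    for (int v = -128; v <= 127; v++)
        if (ashr7_byte((int8_t)v) != cmpgt_zero_byte((int8_t)v))
            return 0;
    return 1;
}
```

Since a single compare already covers this shift count, routing it through the GFNI path would gain nothing, which is consistent with the suggestion to exclude it.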

Re: [PATCH v2] Use x86 GFNI for vectorized constant byte shifts/rotates

2025-08-21 Thread Hongtao Liu
On Wed, Aug 20, 2025 at 11:08 PM Andi Kleen wrote:
>
> From: Andi Kleen
>
> [v2 version: Split rotate patterns in V16QI and V32/64QI.
> Add various AVX512F checks. Remove some unnecessary
> masks. Add untested cond_ pattern (untested, couldn't trigger it)
> Clean up some control flow. Use narrowe

[PATCH v2] Use x86 GFNI for vectorized constant byte shifts/rotates

2025-08-20 Thread Andi Kleen
From: Andi Kleen

[v2 version: Split rotate patterns in V16QI and V32/64QI.
Add various AVX512F checks. Remove some unnecessary
masks. Add untested cond_ pattern (untested, couldn't trigger it)
Clean up some control flow. Use narrower modes. Avoid need for weakening predicate check in expand. Use
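For readers unfamiliar with why GFNI helps here: a constant byte shift or rotate is a linear map over the bits of each byte, so it can be written as an 8x8 bit matrix and applied with vgf2p8affineqb. The sketch below is a scalar model only, not code from the patch. It assumes the row ordering in Intel's documented pseudocode for GF2P8AFFINEQB (result bit i is the parity of matrix byte 7 - i ANDed with the source byte, XORed with bit i of the immediate), and all function names are hypothetical.

```c
#include <stdint.h>

/* Scalar model of one byte lane of GF2P8AFFINEQB (assumed semantics:
   result bit i = parity(matrix.byte[7 - i] & x) ^ imm8.bit[i]). */
uint8_t gf2p8affine_byte(uint64_t matrix, uint8_t x, uint8_t imm8)
{
    uint8_t out = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t v = (uint8_t)(matrix >> (8 * (7 - i))) & x;
        /* Fold v so that bit 0 holds the parity of all eight bits. */
        v ^= v >> 4;
        v ^= v >> 2;
        v ^= v >> 1;
        out |= (uint8_t)(((v ^ (imm8 >> i)) & 1u) << i);
    }
    return out;
}

/* Matrix for a logical left shift by n: result bit i takes source bit i - n. */
uint64_t shl_matrix(unsigned n)
{
    uint64_t m = 0;
    for (unsigned i = n; i < 8; i++)
        m |= (uint64_t)(1u << (i - n)) << (8 * (7 - i));
    return m;
}

/* Matrix for a logical right shift by n: result bit i takes source bit i + n. */
uint64_t lshr_matrix(unsigned n)
{
    uint64_t m = 0;
    for (unsigned i = 0; i + n < 8; i++)
        m |= (uint64_t)(1u << (i + n)) << (8 * (7 - i));
    return m;
}

/* Matrix for a left rotate by n: result bit i takes source bit (i - n) mod 8. */
uint64_t rotl_matrix(unsigned n)
{
    uint64_t m = 0;
    for (unsigned i = 0; i < 8; i++)
        m |= (uint64_t)(1u << ((i + 8 - n) & 7)) << (8 * (7 - i));
    return m;
}

/* Returns 1 if the matrices reproduce plain C shifts and rotates for
   every byte value and every count 0..7, else 0. */
int matrices_match_plain_shifts(void)
{
    for (unsigned n = 0; n < 8; n++) {
        for (unsigned v = 0; v < 256; v++) {
            uint8_t x = (uint8_t)v;
            uint8_t rot = (uint8_t)((x << n) | (x >> (8 - n)));
            if (gf2p8affine_byte(shl_matrix(n), x, 0) != (uint8_t)(x << n))
                return 0;
            if (gf2p8affine_byte(lshr_matrix(n), x, 0) != (uint8_t)(x >> n))
                return 0;
            if (gf2p8affine_byte(rotl_matrix(n), x, 0) != rot)
                return 0;
        }
    }
    return 1;
}
```

Because the count is a compile-time constant, the matrix is a single 64-bit constant per operation, which is why the approach applies to constant shifts and rotates rather than variable ones.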