> -----Original Message----- > From: Richard Biener <[email protected]> > Sent: 22 October 2025 09:53 > To: Kugan Vivekanandarajah <[email protected]> > Cc: [email protected]; Tamar Christina <[email protected]> > Subject: Re: [PATCH][tree-optimization/61338] - Optimize redundant reverse > permutations in vectorized stores > > On Tue, Oct 21, 2025 at 11:57 PM Kugan Vivekanandarajah > <[email protected]> wrote: > > > > Hi Richard, > > > > Thanks for the review. > > > > > On 15 Oct 2025, at 10:39 pm, Richard Biener > <[email protected]> wrote: > > > > > > External email: Use caution opening links or attachments > > > > > > > > > On Wed, Oct 15, 2025 at 12:08 AM Kugan Vivekanandarajah > > > <[email protected]> wrote: > > >> > > >> Hi, > > >> > > >> This patch eliminates redundant reverse permutations in vectorized > reverse > > >> loops by detecting and optimizing patterns during store vectorization. > > >> > > >> The reverse load (b[i]) generates PERM, operations are applied, then the > > >> reverse store adds another PERM. This creates redundant permute pairs > that > > >> we now detect and eliminate. > > >> > > >> With the patch, for the example loop > > >> for (int i = N - 1; i >= 0; i--) > > >> { > > >> a[i] = b[i] + 1.0f; > > >> } > > >> Changes to the following > > >> - ldr q29, [x0, x2] > > >> - tbl v29.16b, {v29.16b}, v31.16b > > >> - fadd v29.4s, v29.4s, v30.4s > > >> - tbl v29.16b, {v29.16b}, v31.16b > > >> - str q29, [x3, x2] > > >> + ldr q30, [x0, x2] > > >> + fadd v30.4s, v30.4s, v31.4s > > >> + str q30, [x3, x2] > > > > > > So this works basically as a post-processing optimization at the time > > > we generate the > > > vector store. While that's in principle an OK optimization I'd rather > > > have such post-processing > > > implemented outside of the vectorizer because then also permutes not > > > originating from > > > vectorizer permuted store generation would benefit. > > > > > > As for implementing this in the vectorizer itself the more appropriate > > > thing would be > > > to expose these permutes to the permute optimization phase, because > then it can > > > be also taken into account during costing and a reverse load permute > > > could be elided > > > if it feeds an associatable reduction. > > > > > > There is, unfortunately, currently no good way to represent how we > implement > > > negative strided contiguous accesses with load permutations as the > peculiarity > > > only exposes itself after applying the VF and load/lane permutations are > > > represented on the VF == 1 SLP graph. One of my ideas what that once we > > > settle on VF (and possibly vector types) we want to expand the SLP graph > > > to cover all lanes of the vector loop so we can expose actual permutes and > > > vector granularity. This is a bit far off though. > > > > > > So in line with your patch but more appropriate for in-vectorizer > > > operation would > > > be an analysis on the SLP graph that simply marks reverse permutes that > can > > > be elided (for the back-to-back case). This way both costing and code > > > generation > > > can take this into account and you wouldn't have to adjust any stmts. > > > > I have now changed it to account for the costing. Bootstrapped and > regression tested on aarch64-linux-gnu. > > > > Is this OK? > > Same here? >
Did you send the right version of the patch Kugan? It's identical to the one you sent before and also has some changes in gcc/fortran/resolve.cc not specified and your changelog seems to have an incorrect format, the files containing what you changed aren't mentioned. Thanks, Tamar > > Thanks, > > Kugan > > > > > > > > > > Thanks, > > > Richard. > > > > > >> PR tree-optimization/61338 > > >> > > >> gcc/ChangeLog: > > >> (get_vector_perm_operand): New. > > >> (vect_find_reverse_permute_operand): New helper function > > >> to find reverse permutations through element-wise operation > > >> chains. > > >> Returns true only if ALL operands have reverse permutations. > > >> (vectorizable_store): Use recursive helper to eliminate redundant > > >> reverse permutations with configurable search depth. > > >> > > >> gcc/testsuite/ChangeLog: > > >> > > >> * gcc.dg/vect/slp-permute-reverse-1.c: New test for basic > > >> reverse permute optimization (simple copy). > > >> * gcc.dg/vect/slp-permute-reverse-2.c: New runtime test for > > >> basic pattern. > > >> Signed-off-by: Kugan Vivekanandarajah <[email protected]> > > >> > > >> Bootstrapped and regression tested on aarch64-linux-gcc. Is this OK? > > >> > > >> Thanks, > > >> Kugan > > > >
