On Thu, 23 Oct 2025, Richard Sandiford wrote: > Richard Biener <[email protected]> writes: > > On Wed, 22 Oct 2025, Richard Sandiford wrote: > > > >> Richard Biener <[email protected]> writes: > >> > The following tries to addess one shortcoming of the SLP permute > >> > optimization phase which is that it tries to refuse layouts that > >> > cannot revert to layout 0 on an edge even if layout 0 isn't actually > >> > valid. This hurts in particular on VLA vector targets where many > >> > permutes cannot be code generated. While for fixed-length targets > >> > costs eventually sort things out in a way to chose a "no permutation" > >> > setting the above check prevents VLA targets from arriving there. > >> > >> I'm not sure this is the right way to go. The subpass is supposed > >> to be an optimisation pass rather than a legitimisation/"make something > >> vectorisable" pass. The cost function is purely aimed at reducing > >> permutations rather than at turning unvectorisable graphs into > >> vectorisation ones. If the pass happens to make something vectorisable, > >> that happens by chance rather than by design. > > > > Note while the change turns something not vectorizable into something > > vectorizable it does so by enabling permute optimization to see that > > a load permutation can be fully elided. This currently does not work > > because of the imposed "the reverse permute has to be code generatable" > > check(s). > > Yeah. But the purpose of the pass is to optimise permutations in > cases where the original form is vectorisable, rather than to search > for a vectorisable form. > > > The existing handling of [in_layout_i == 0 &&] out_layout_i == 0 for > > a not code generatable permute made me think that the fact that > > layout 0 isn't code generatable shouldn't prevent us from arriving > > at the optimal (just from a cost perspective) layout configuration. > > > > To this extent, do you agree with the proposed change to > > internal_node_cost to anticipate redundant permute eliding? > > As far as that change goes, I think it would more convincing to have > a motivating case in which the pass wasn't being used to turn > non-vectorisable code into vectorisable code. > > > One could argue that vect_transform_slp_perm_load_1 should > > not fail for those - in this case it fails for > > group_size == dr_group_size == 16 but nunits [4,4], so > > a !repeating_p permute. > > > >> I can see that a similar type of pass might be useful for pushing > >> permutations around until the graph is vectorisable. But I think > >> that should be a separate subpass that runs first, with a different goal > >> function. Perhaps it would share enough code with the optimisation pass > >> that they could both be subclasses of a common base class (not sure). > > > > I'll note that at least for permutes on loads we're in the difficult > > situation that permute lowering only handles > > interleaving patterns right now and other permutes are > > not lowered but left to vectorizable_load. Trying to legitimize > > those earlier (anticipating what vectorizable_load would do) would > > be nice but also necessarily dependent on VF (permute lowering also > > looks at VF and at a point where it's not yet final ...) > > > > But as you say, legitimizing should be separate from optimizing but > > the issue at hand is that optimizing on a not legitimized graph > > ties its hands because of the reverse permute requirement ... > > Yeah, but like I say, I think the legitimising should happen first. > The hand-tying wouldn't matter then: the optimisation pass would start > with a vectorisable graph (as far as permutes go) and would have an > opportunity to improve it. > > I think the general legitimisation problem isn't just about eliding. > It's about pushing permutations around until a new (possibly non-empty) > set of permutations work. And since it's a question of legitimisation, > we should do that even if it involves "pessimising" hotter blocks to make > colder blocks work, which is something that the optimisation pass tries > to avoid (when optimising for speed). > > >> > The patch fixes up internal_node_cost to anticipate redundant permute > >> > eliding and short-cuts the above requirement on edges to partitions > >> > that do not have a prefered layout with the idea we shouldn't have > >> > the need to revert to layout zero for those. > >> > > >> > The downside is that we now can arrive at invalid layout configurations > >> > which we deal with simply giving up. > >> > >> Right. I think that for other testcases that are currently successfully > >> optimised, and that are vectorisable with and without the optimisation, > >> we might lose optimisations by doing this, because the pass could get > >> stuck in a dead end. The risk is probably higher for VLA testcases, > >> due to the same property that motivates this patch. > >> > >> The change therefore feels like a bit of a hack to me. > > > > It probably is, esp. singling out that -1 layout. The question is > > what the alternative is? > > Running a new legitimisation pass just before the optimisation pass > seems like the way to go to me, but I realise it's not a quick fix.
OK. I think it should be possible to do "limited" legitimisation for specific cases like this pretty easily as a pre-pass. We could do this within the same framework, re-using graph building, initial layout computation and materialization / redundant permutation eliding. Adding just a legitimise_pass () into the current pipeline looks appealing to me, but I'm not sure how much wreckage we'd eventually cause (for this case we're just after eliding a redundant permutation). So better do a full separate pass and not try sharing graph/partition and layout info? Richard. > > > > What's the purpose of rejecting the layout at > > > > /* Reject the layout if it would make layout 0 > > impossible > > for later partitions. This amounts to testing that > > the > > target supports reversing the layout change on > > edges > > to later partitions. > > > > In principle, it might be possible to push a layout > > change all the way down a graph, so that it never > > needs to be reversed and so that the target doesn't > > need to support the reverse operation. But it > > would > > be awkward to bail out if we hit a partition that > > does not support the new layout, especially since > > we are not dealing with a lattice. */ > > is_possible &= edge_layout_cost (ud, other_node_i, 0, > > layout_i).is_possible > > (); > > > > ? It seems we want to enable the backward pass to undo everything the > > forward pass did, because that might have gone "too far"? But we > > do not check that we can "undo", aka go back to the initial chosen > > layout (which is not always 0 but matches the reverse load permutation?) > > My alternative idea was to not require the ability to go back if > > layout 0 is "impossible" in terms of internal_node_cost? > > The assumption is that layout 0 is the conservative choice. Of course, > that isn't true if the original layout isn't vectorisable, but like > I say, it's not IMO the pass's job to handle that case. > > Thanks, > Richard > -- Richard Biener <[email protected]> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
