Kyrylo Tkachov <ktkac...@nvidia.com> writes: > Hi Richard, > >> On 5 Feb 2025, at 09:57, Richard Sandiford <richard.sandif...@arm.com> wrote: >> >> gcc.target/aarch64/sve/acle/general/ldff1_8.c and >> gcc.target/aarch64/sve/ptest_1.c were failing because the >> aarch64 port was giving a zero (unknown) cost to instructions >> that compute two results in parallel. This was latent until >> r15-1575-gea8061f46a30, which fixed rtl-ssa to treat zero costs >> as unknown. >> >> A long-standing todo here is to make insn_cost derive costs from md >> information, rather than having to write a lot of matching code in >> aarch64_rtx_costs. But that's not something we can do for GCC 15. >> >> This patch instead treats the cost of a PARALLEL as being the maximum >> cost of its constituent sets. I don't like this very much, since it >> isn't really target-specific behaviour. If it were stage 1, I'd be >> trying to change pattern_cost instead. >> >> Tested on aarch64-linux-gnu. I'll push this tomorrow if there are no >> comments before then. In particular, please let me know if you'd be >> happy with experimenting with a pattern_cost change even at this stage. >> (The current behaviour is to bail out if more than one set in a parallel >> is live and neither of them is a comparison.) > > IMO having this hunk in the backend and moving it into pattern_cost early in > GCC 16 is the optimal risk/benefit approach. > You could add a TODO or a PR marker to the comment to make it more explicit > if you’d like (I don’t insist).
OK. I realised after sending that there's another problem: pattern_cost uses set_src_cost, whereas we want set_rtx_cost. At that point, perhaps it's better to leave pattern_cost after all and just concentrate on the "use attributes" TODO. >> >> Richard >> >> >> gcc/ >> * config/aarch64/aarch64.cc (aarch64_insn_cost): Give PARALLELs >> the same cost as the costliest SET. >> --- >> gcc/config/aarch64/aarch64.cc | 19 ++++++++++++++++++- >> 1 file changed, 18 insertions(+), 1 deletion(-) >> >> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc >> index 16754fa9e7b..c1e40200806 100644 >> --- a/gcc/config/aarch64/aarch64.cc >> +++ b/gcc/config/aarch64/aarch64.cc >> @@ -15889,7 +15889,24 @@ aarch64_insn_cost (rtx_insn *insn, bool speed) >> { >> if (rtx set = single_set (insn)) >> return set_rtx_cost (set, speed); >> - return pattern_cost (PATTERN (insn), speed); >> + >> + /* If the instruction does multiple sets in parallel, use the cost >> + of the most expensive set. This copes with instructions that set >> + the flags to a useful value as a side effect. */ >> + rtx pat = PATTERN (insn); >> + if (GET_CODE (pat) == PARALLEL) >> + { >> + int max_cost = 0; >> + for (int i = 0; i < XVECLEN (pat, 0); ++i) >> + { >> + rtx x = XVECEXP (pat, 0, i); > > Not something for this patch, but would it make sense to add begin () and end > () to RTL vectors so that we can iterate over them with range-based for? > I haven’t thought this through, so it may well be not practical given the > specific structure of RTL fields. Ah yeah, good point. We should probably make it more array_slice like in general, e.g. with size (), indexing, etc. Might give that a go. > The patch looks ok to me. Thanks, now pushed. Richard