Kyrylo Tkachov <ktkac...@nvidia.com> writes:
> Hi Richard,
>
>> On 5 Feb 2025, at 09:57, Richard Sandiford <richard.sandif...@arm.com> wrote:
>> 
>> gcc.target/aarch64/sve/acle/general/ldff1_8.c and
>> gcc.target/aarch64/sve/ptest_1.c were failing because the
>> aarch64 port was giving a zero (unknown) cost to instructions
>> that compute two results in parallel.  This was latent until
>> r15-1575-gea8061f46a30, which fixed rtl-ssa to treat zero costs
>> as unknown.
>> 
>> A long-standing todo here is to make insn_cost derive costs from md
>> information, rather than having to write a lot of matching code in
>> aarch64_rtx_costs.  But that's not something we can do for GCC 15.
>> 
>> This patch instead treats the cost of a PARALLEL as being the maximum
>> cost of its constituent sets.  I don't like this very much, since it
>> isn't really target-specific behaviour.  If it were stage 1, I'd be
>> trying to change pattern_cost instead.
>> 
>> Tested on aarch64-linux-gnu.  I'll push this tomorrow if there are no
>> comments before then.  In particular, please let me know if you'd be
>> happy with experimenting with a pattern_cost change even at this stage.
>> (The current behaviour is to bail out if more than one set in a parallel
>> is live and neither of them is a comparison.)
>
> IMO having this hunk in the backend and moving it into pattern_cost early in 
> GCC 16 is the optimal risk/benefit approach.
> You could add a TODO or a PR marker to the comment to make it more explicit 
> if you’d like (I don’t insist).

OK.  I realised after sending that there's another problem: pattern_cost
uses set_src_cost, whereas we want set_rtx_cost.  At that point, perhaps
it's better to leave pattern_cost after all and just concentrate on the
"use attributes" TODO.

>> 
>> Richard
>> 
>> 
>> gcc/
>> * config/aarch64/aarch64.cc (aarch64_insn_cost): Give PARALLELs
>> the same cost as the costliest SET.
>> ---
>> gcc/config/aarch64/aarch64.cc | 19 ++++++++++++++++++-
>> 1 file changed, 18 insertions(+), 1 deletion(-)
>> 
>> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
>> index 16754fa9e7b..c1e40200806 100644
>> --- a/gcc/config/aarch64/aarch64.cc
>> +++ b/gcc/config/aarch64/aarch64.cc
>> @@ -15889,7 +15889,24 @@ aarch64_insn_cost (rtx_insn *insn, bool speed)
>> {
>>   if (rtx set = single_set (insn))
>>     return set_rtx_cost (set, speed);
>> -  return pattern_cost (PATTERN (insn), speed);
>> +
>> +  /* If the instruction does multiple sets in parallel, use the cost
>> +     of the most expensive set.  This copes with instructions that set
>> +     the flags to a useful value as a side effect.  */
>> +  rtx pat = PATTERN (insn);
>> +  if (GET_CODE (pat) == PARALLEL)
>> +    {
>> +      int max_cost = 0;
>> +      for (int i = 0; i < XVECLEN (pat, 0); ++i)
>> + {
>> +  rtx x = XVECEXP (pat, 0, i);
>
> Not something for this patch, but would it make sense to add begin () and end 
> () to RTL vectors so that we can iterate over them with range-based for?
> I haven’t thought this through, so it may well be not practical given the 
> specific structure of RTL fields.

Ah yeah, good point.  We should probably make it more array_slice like
in general, e.g. with size (), indexing, etc.  Might give that a go.

> The patch looks ok to me.

Thanks, now pushed.

Richard

Reply via email to