Hi Vladimir,
On 14/06/24 10:56 pm, Vladimir Makarov wrote:
>
> On 6/13/24 00:34, Surya Kumari Jangala wrote:
>> Hi Vladimir,
>> With my patch for PR111673 (scale the spill/restore cost of callee-save
>> register with the frequency of the entry bb in the routine assign_hard_reg()
>> :
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631849.html), the
>> following Linaro aarch64 test failed due to an extra 'mov' instruction:
>>
>> __SVBool_t __attribute__((noipa))
>> callee_pred (__SVBool_t p0, __SVBool_t p1, __SVBool_t p2, __SVBool_t p3,
>> __SVBool_t mem0, __SVBool_t mem1)
>> {
>> p0 = svbrkpa_z (p0, p1, p2);
>> p0 = svbrkpb_z (p0, p3, mem0);
>> return svbrka_z (p0, mem1);
>> }
>>
>> With trunk:
>> addvl sp, sp, #-1
>> str p14, [sp]
>> str p15, [sp, #1, mul vl]
>> ldr p14, [x0]
>> ldr p15, [x1]
>> brkpa p0.b, p0/z, p1.b, p2.b
>> brkpb p0.b, p0/z, p3.b, p14.b
>> brka p0.b, p0/z, p15.b
>> ldr p14, [sp]
>> ldr p15, [sp, #1, mul vl]
>> addvl sp, sp, #1
>> ret
>>
>> With patch:
>> addvl sp, sp, #-1
>> str p14, [sp]
>> str p15, [sp, #1, mul vl]
>> mov p14.b, p3.b // extra mov insn
>> ldr p15, [x0]
>> ldr p3, [x1]
>> brkpa p0.b, p0/z, p1.b, p2.b
>> brkpb p0.b, p0/z, p14.b, p15.b
>> brka p0.b, p0/z, p3.b
>> ldr p14, [sp]
>> ldr p15, [sp, #1, mul vl]
>> addvl sp, sp, #1
>> ret
>>
>> p0-p15 are predicate registers on aarch64 where p0-p3 are caller-save while
>> p4-p15 are callee-save.
>>
>> The input RTL for ira pass:
>>
>> 1: set r112, r68 #p0
>> 2: set r113, r69 #p1
>> 3: set r114, r70 #p2
>> 4: set r115, r71 #p3
>> 5: set r116, x0 #mem0, the 5th parameter
>> 6: set r108, mem(r116)
>> 7: set r117, x1 #mem1, the 6th parameter
>> 8: set r110, mem(r117)
>> 9: set r100, unspec_brkpa(r112, r113, r114)
>> 10: set r101, unspec_brkpb(r100, r115, r108)
>> 11: set r68, unspec_brka(r101, r110)
>> 12: ret r68
>>
>> Here, r68-r71 represent predicate hard regs p0-p3.
>> With my patch, r113 and r114 are being assigned memory by ira but with trunk
>> they are
>> assigned registers. This in turn leads to a difference in decisions taken by
>> LRA
>> ultimately leading to the extra mov instruction.
>>
>> Register assignment w/ patch:
>>
>> Popping a5(r112,l0) -- assign reg p0
>> Popping a2(r100,l0) -- assign reg p0
>> Popping a0(r101,l0) -- assign reg p0
>> Popping a1(r110,l0) -- assign reg p3
>> Popping a3(r115,l0) -- assign reg p2
>> Popping a4(r108,l0) -- assign reg p1
>> Popping a6(r113,l0) -- (memory is more profitable 8000 vs 9000)
>> spill!
>> Popping a7(r114,l0) -- (memory is more profitable 8000 vs 9000)
>> spill!
>> Popping a8(r117,l0) -- assign reg 1
>> Popping a9(r116,l0) -- assign reg 0
>>
>>
>> With patch, cost of memory is 8000 and it is lesser than the cost of
>> callee-save
>> register (9000) and hence memory is assigned to r113 and r114. It is
>> interesting
>> to see that all the callee-save registers are free but none is chosen.
>>
>> The two instructions in which r113 is referenced are:
>> 2: set r113, r69 #p1
>> 9: set r100, unspec_brkpa(r112, r113, r114)
>>
>> IRA computes the memory cost of an allocno in find_costs_and_classes(). In
>> this routine
>> IRA scans each insn and computes memory cost and cost of register classes
>> for each
>> operand in the insn.
>>
>> So for insn 2, memory cost of r113 is set to 4000 because this is the cost
>> of storing
>> r69 to memory if r113 is assigned to memory. The possible register classes
>> of r113
>> are ALL_REGS, PR_REGS, PR_HI_REGS and PR_LO_REGS. The cost of moving r69
>> to r113 if r113 is assigned a register from each of the possible register
>> classes is
>> computed. If r113 is assigned a reg in ALL_REGS, then the cost of the
>> move is 18000, while if r113 is assigned a register from any of the
>> predicate register
>> classes, then the cost of the move is 2000. This cost is obtained from the
>> array
>> “ira_register_move_cost”. After scanning insn 9, memory cost of r113
>> is increased to 8000 because if r113 is assigned memory, we need a load to
>> read the
>> value before using it in the unspec_brkpa. But the register class cost is
>> unchanged.
>>
>> Later in setup_allocno_class_and_costs(), the ALLOCNO_CLASS of r113 is set
>> to PR_REGS.
>> The ALLOCNO_MEMORY_COST of r113 is set to 8000.
>> The ALLOCNO_HARD_REG_COSTS of each register in PR_REGS is set to 2000.
>>
>> During coloring, when r113 has to be assigned a register, the cost of
>> callee-save
>> registers in PR_REGS is increased by the spill/restore cost. So the cost
>> of callee-save registers increases from 2000 to 9000. All the caller-save
>> registers
>> have been assigned to other allocnos, so for r113 memory is assigned
>> as memory is cheaper than callee-save registers.
>>
>> However, for r108, the cost is 0 for register classes PR_REGS, PR_HI_REGS
>> and PR_LO_REGS.
>>
>> References of r108:
>> 6: set r108, mem(r116)
>> 10: set r101, unspec_brkpb(r100, r115, r108)
>>
>> It was surprising that while for r113, the cost of the predicate register
>> classes
>> was 2000, for r108 the predicate register classes had a cost of 0.
>>
>> After processing insn 6, the costs for r108 are:
>> op 0(r=108) MEM:4000(+4000) ALL_REGS:8000(+8000) PR_REGS:0(+0)
>> PR_HI_REGS:0(+0) PR_LO_REGS:0(+0)
>>
>> After processing insn 10:
>> op 3(r=108) MEM:8000(+4000) ALL_REGS:20000(+12000) PR_REGS:0(+0)
>> PR_HI_REGS:0(+0) PR_LO_REGS:0(+0)
>>
>>
>> For insn 6, record_reg_classes() goes through each constraint for each
>> operand.
>> For each alternative constraint, this routine computes costs of register
>> classes
>> and memory cost for each pseudo register present in the insn. IRA computes
>> register
>> class cost as the cost required to move value from/to a location of type
>> specified
>> in the constraint to/from a register in the register class. IRA also
>> calculates
>> the cost of using an alternative. After computing all the register class
>> costs
>> for all the alternatives, IRA takes the minimum cost for each register class,
>> adds the alternative cost and assigns this to the pseudo r108.
>>
>> For insn 6, the constraints are:
>> operand 1: Upa,m,Upa,
>> operand 2: Upa,Upa,m,
>>
>> Here, Upa stands for predicate register and ‘m’ denotes memory.
>>
>> For insn 6, the best alternative is Upa for operand 1 and m for operand 2.
>> For this
>> alternative, the alternative cost is 0.
>> The costs for the register classes PR_REGS, PR_HI_REGS, PR_LO_REGS is 0 for
>> operand 1 (r108).
>> IRA uses the array “ira_may_move_out_cost” to compute the costs of the
>> register
>> classes for r108 and this array has value 0 because the first index is a
>> superset
>> of the second index. Here, first index is PR_REGS since the constraint is
>> ‘Upa’
>> while second index is the predicate register class.
>>
>> Why are different arrays being used? As we can see above, for r113 the cost
>> of
>> the predicate register classes is 2000 because we are getting the value from
>> “ira_register_move_cost”. While for r108, the cost is obtained from
>> “ira_may_move_out_cost”. Why should r113 have a higher cost for predicate
>> register
>> classes as compared to r108?
>
> An interesting question. The code originally was taken from old RA
> (particular regclass.c). You can find it in any version of gcc before 4.4
> version. Since then it changed a lot for different reasons (mostly for
> solving different PRs).
>
> I can not find where p113 can get cost from ira_register_move_cost. The only
> possibility I see is
>
> ...
>
> else if (ira_reg_class_intersect
> [pref_class][op_class] == NO_REGS)
> alt_cost
> +=
> ira_register_move_cost[mode][pref_class][op_class];
>
> It is when preferred class from the 1st iteration does not intersect with the
> operand class which looks ok for me. May be I missed something. Could you
> point me the place where p113 gets cost from ira_register_move_class. Even
> better if you could me provide ira-dump (-fira-verbose=16 for the test case).
>
p113 gets the cost from 'ira_register_move_cost' in the routine
record_operand_costs(). In this routine, if we have a 'set' insn where both
destination and source are registers, and one of them is a hard register and
the other is a pseudo register, then the costs of the register classes of the
pseudo register is obtained from the array 'ira_register_move_cost'.
The following insn meets the criteria (r69 is a hard register)
2: set r113, r69 #p1
Here is the code snippet:
move_costs = ira_register_move_cost[cost_mode];
...
for (k = cost_classes_ptr->num - 1; k >= 0; k--)
{
rclass = cost_classes[k];
cost = (i == 0
? move_costs[hard_reg_class][rclass]
: move_costs[rclass][hard_reg_class]);
cost *= cost_factor;
op_costs[i]->cost[k] = cost * frequency;
...
}
Regards,
Surya