Hi Vladimir,
On 14/06/24 10:56 pm, Vladimir Makarov wrote:
>
> On 6/13/24 00:34, Surya Kumari Jangala wrote:
>> Hi Vladimir,
>> With my patch for PR111673 (scale the spill/restore cost of callee-save
>> register with the frequency of the entry bb in the routine assign_hard_reg():
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631849.html), the
>> following Linaro aarch64 test failed due to an extra 'mov' instruction:
>>
>> __SVBool_t __attribute__((noipa))
>> callee_pred (__SVBool_t p0, __SVBool_t p1, __SVBool_t p2, __SVBool_t p3,
>> __SVBool_t mem0, __SVBool_t mem1)
>> {
>> p0 = svbrkpa_z (p0, p1, p2);
>> p0 = svbrkpb_z (p0, p3, mem0);
>> return svbrka_z (p0, mem1);
>> }
>>
>> With trunk:
>> addvl sp, sp, #-1
>> str p14, [sp]
>> str p15, [sp, #1, mul vl]
>> ldr p14, [x0]
>> ldr p15, [x1]
>> brkpa p0.b, p0/z, p1.b, p2.b
>> brkpb p0.b, p0/z, p3.b, p14.b
>> brka p0.b, p0/z, p15.b
>> ldr p14, [sp]
>> ldr p15, [sp, #1, mul vl]
>> addvl sp, sp, #1
>> ret
>>
>> With patch:
>> addvl sp, sp, #-1
>> str p14, [sp]
>> str p15, [sp, #1, mul vl]
>> mov p14.b, p3.b // extra mov insn
>> ldr p15, [x0]
>> ldr p3, [x1]
>> brkpa p0.b, p0/z, p1.b, p2.b
>> brkpb p0.b, p0/z, p14.b, p15.b
>> brka p0.b, p0/z, p3.b
>> ldr p14, [sp]
>> ldr p15, [sp, #1, mul vl]
>> addvl sp, sp, #1
>> ret
>>
>> p0-p15 are predicate registers on aarch64 where p0-p3 are caller-save while
>> p4-p15 are callee-save.
>>
>> The input RTL for ira pass:
>>
>> 1: set r112, r68 #p0
>> 2: set r113, r69 #p1
>> 3: set r114, r70 #p2
>> 4: set r115, r71 #p3
>> 5: set r116, x0 #mem0, the 5th parameter
>> 6: set r108, mem(r116)
>> 7: set r117, x1 #mem1, the 6th parameter
>> 8: set r110, mem(r117)
>> 9: set r100, unspec_brkpa(r112, r113, r114)
>> 10: set r101, unspec_brkpb(r100, r115, r108)
>> 11: set r68, unspec_brka(r101, r110)
>> 12: ret r68
>>
>> Here, r68-r71 represent predicate hard regs p0-p3.
>> With my patch, r113 and r114 are assigned memory by IRA, whereas with trunk
>> they are assigned registers. This in turn leads to different decisions being
>> taken by LRA, ultimately leading to the extra mov instruction.
>>
>> Register assignment w/ patch:
>>
>> Popping a5(r112,l0) -- assign reg p0
>> Popping a2(r100,l0) -- assign reg p0
>> Popping a0(r101,l0) -- assign reg p0
>> Popping a1(r110,l0) -- assign reg p3
>> Popping a3(r115,l0) -- assign reg p2
>> Popping a4(r108,l0) -- assign reg p1
>> Popping a6(r113,l0) -- (memory is more profitable 8000 vs 9000) spill!
>> Popping a7(r114,l0) -- (memory is more profitable 8000 vs 9000) spill!
>> Popping a8(r117,l0) -- assign reg 1
>> Popping a9(r116,l0) -- assign reg 0
>>
>>
>> With the patch, the cost of memory is 8000, which is less than the cost of a
>> callee-save register (9000), and hence memory is assigned to r113 and r114.
>> It is interesting to see that all the callee-save registers are free but
>> none is chosen.
>>
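
To make the "8000 vs 9000" decision concrete, here is a rough sketch of the comparison. It is my own illustration and not IRA code: the names choose_location and min_hard_reg_cost are hypothetical, and I am assuming that memory is picked only when it is strictly cheaper than the cheapest candidate hard register.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are not IRA's types.  */
enum choice { CHOOSE_MEMORY, CHOOSE_HARD_REG };

/* Return the cheapest cost among the N candidate hard registers (N >= 1).  */
int
min_hard_reg_cost (const int *costs, size_t n)
{
  int best = costs[0];
  for (size_t i = 1; i < n; i++)
    if (costs[i] < best)
      best = costs[i];
  return best;
}

/* Assumption: memory wins only when it is strictly cheaper than the
   best register candidate.  */
enum choice
choose_location (int mem_cost, const int *costs, size_t n)
{
  return mem_cost < min_hard_reg_cost (costs, n)
	 ? CHOOSE_MEMORY : CHOOSE_HARD_REG;
}
```

With the dumped numbers (memory 8000, every free callee-save candidate at 9000) this picks memory, however many callee-save registers are free, because they all carry the same cost.
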
>> The two instructions in which r113 is referenced are:
>> 2: set r113, r69 #p1
>> 9: set r100, unspec_brkpa(r112, r113, r114)
>>
>> IRA computes the memory cost of an allocno in find_costs_and_classes(). In
>> this routine IRA scans each insn and computes the memory cost and the cost
>> of the register classes for each operand in the insn.
>>
>> So for insn 2, the memory cost of r113 is set to 4000 because this is the
>> cost of storing r69 to memory if r113 is assigned to memory. The possible
>> register classes of r113 are ALL_REGS, PR_REGS, PR_HI_REGS and PR_LO_REGS.
>> The cost of moving r69 to r113 is computed for each of these classes. If
>> r113 is assigned a reg in ALL_REGS, then the cost of the move is 18000,
>> while if r113 is assigned a register from any of the predicate register
>> classes, then the cost of the move is 2000. This cost is obtained from the
>> array "ira_register_move_cost". After scanning insn 9, the memory cost of
>> r113 is increased to 8000 because if r113 is assigned memory, we need a load
>> to read the value before using it in the unspec_brkpa. But the register
>> class cost is unchanged.
>>
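
The accumulation described above can be sketched as follows. This is only a simplified model of the bookkeeping, not the code in find_costs_and_classes(): the struct and function names are hypothetical, and the 4000 added for insn 9 is inferred as the difference between the two memory costs quoted above (8000 - 4000).

```c
/* Hypothetical model of the running costs kept for one pseudo.  */
struct pseudo_costs
{
  int memory;    /* cost if the pseudo lives in memory */
  int pr_regs;   /* cost if it gets a predicate register */
  int all_regs;  /* cost if it gets a register from ALL_REGS */
};

/* insn 2: set r113, r69 (copy from hard reg p1).  */
void
scan_copy_from_p1 (struct pseudo_costs *c)
{
  c->memory += 4000;     /* store r69 to memory */
  c->pr_regs += 2000;    /* predicate-to-predicate move */
  c->all_regs += 18000;  /* move cost quoted for ALL_REGS */
}

/* insn 9: r113 is read by unspec_brkpa.  */
void
scan_use_in_brkpa (struct pseudo_costs *c)
{
  c->memory += 4000;  /* load needed before the use (inferred) */
  /* The register class costs are left unchanged.  */
}

/* Replay both insns that reference r113.  */
struct pseudo_costs
r113_costs (void)
{
  struct pseudo_costs c = { 0, 0, 0 };
  scan_copy_from_p1 (&c);
  scan_use_in_brkpa (&c);
  return c;
}
```

Replaying the two insns gives a memory cost of 8000 and a predicate register cost of 2000, the same values that end up in ALLOCNO_MEMORY_COST and ALLOCNO_HARD_REG_COSTS.
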
>> Later in setup_allocno_class_and_costs(), the ALLOCNO_CLASS of r113 is set
>> to PR_REGS. The ALLOCNO_MEMORY_COST of r113 is set to 8000, and the
>> ALLOCNO_HARD_REG_COSTS of each register in PR_REGS is set to 2000.
>>
>> During coloring, when r113 has to be assigned a register, the cost of
>> callee-save registers in PR_REGS is increased by the spill/restore cost. S