Hello Richard:
On 10/06/24 3:20 pm, Richard Sandiford wrote:
> Ajit Agarwal <[email protected]> writes:
>> Hello Richard:
>>
>> On 10/06/24 2:52 pm, Richard Sandiford wrote:
>>> Ajit Agarwal <[email protected]> writes:
>>>> On 10/06/24 2:12 pm, Richard Sandiford wrote:
>>>>> Ajit Agarwal <[email protected]> writes:
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> + rtx set = single_set (insn);
>>>>>>>>>>>>>> + if (set == NULL_RTX)
>>>>>>>>>>>>>> + return false;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> + rtx op0 = SET_SRC (set);
>>>>>>>>>>>>>> + rtx_code code = GET_CODE (op0);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> + // This check is added as register pairs are not
>>>>>>>>>>>>>> generated
>>>>>>>>>>>>>> + // by RA for neg:V2DF (fma: V2DF (reg1)
>>>>>>>>>>>>>> + // (reg2)
>>>>>>>>>>>>>> + // (neg:V2DF (reg3)))
>>>>>>>>>>>>>> + if (GET_RTX_CLASS (code) == RTX_UNARY)
>>>>>>>>>>>>>> + return false;
>>>>>>>>>>>>>
>>>>>>>>>>>>> What's special about (neg (fma ...))?
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I am not sure why register allocator fails allocating register
>>>>>>>>>>>> pairs with
>>>>>>>>>>>> NEG Unary operation with fma operand. I have not debugged register
>>>>>>>>>>>> allocator why the NEG
>>>>>>>>>>>> Unary operation with fma operand.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> For neg (fma ...) cases because of subreg 128 bits from OOmode 256
>>>>>>>>>> bits are
>>>>>>>>>> set correctly.
>>>>>>>>>> IRA marked them spill candidates as spill priority is zero.
>>>>>>>>>>
>>>>>>>>>> Due to this LRA reload pass couldn't allocate register pairs.
>>>>>>>>>
>>>>>>>>> I think this is just restating the symptom though. I suppose the same
>>>>>>>>> kind of questions apply here too: what was the instruction before the
>>>>>>>>> pass runs, what was the instruction after the pass runs, and why is
>>>>>>>>> the rtl change incorrect (by the meaning above)?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Original case where we dont do load fusion, spill happens, in that
>>>>>>>> case we dont require sequential register pairs to be generated for 2
>>>>>>>> loads
>>>>>>>> for. Hence it worked.
>>>>>>>>
>>>>>>>> rtl change is correct and there is no error.
>>>>>>>>
>>>>>>>> for load fusion spill happens and we dont generate sequential register
>>>>>>>> pairs
>>>>>>>> because pf spill candidate and lxvp gives incorrect results as
>>>>>>>> sequential register
>>>>>>>> pairs are required for lxvp.
>>>>>>>
>>>>>>> Can you go into more detail? How is the lxvp represented? And how do
>>>>>>> we end up not getting a sequential register pair? What does the rtl
>>>>>>> look like (before and after things have gone wrong)?
>>>>>>>
>>>>>>> It seems like either the rtl is not describing the result of the fusion
>>>>>>> correctly or there is some problem in the .md description of lxvp.
>>>>>>>
>>>>>>
>>>>>> After fusion pass:
>>>>>>
>>>>>> (insn 9299 2472 2412 187 (set (reg:V2DF 51 19 [orig:240 vect__302.545 ]
>>>>>> [240])
>>>>>> (mem:V2DF (plus:DI (reg:DI 8 8 [orig:1285 ivtmp.886 ] [1285])
>>>>>> (const_int 16 [0x10])) [1 MEM <vector(2) real(kind=8)>
>>>>>> [(real(kind=8) *)_4188]+16 S16 A64])) "shell_lam.fppized.f":238:72 1190
>>>>>> {vsx_movv2df_64bit}
>>>>>> (nil))
>>>>>> (insn 2412 9299 2477 187 (set (reg:V2DF 51 19 [orig:240 vect__302.545 ]
>>>>>> [240])
>>>>>> (neg:V2DF (fma:V2DF (reg:V2DF 39 7 [ MEM <vector(2)
>>>>>> real(kind=8)> [(real(kind=8) *)_4050]+16 ])
>>>>>> (reg:V2DF 44 12 [3119])
>>>>>> (neg:V2DF (reg:V2DF 51 19 [orig:240 vect__302.545 ]
>>>>>> [240]))))) {*vsx_nfmsv2df4}
>>>>>> (nil))
>>>>>>
>>>>>> In LRA reload.
>>>>>>
>>>>>> (insn 2472 2461 2412 161 (set (reg:OO 2572 [ vect__300.543_236 ])
>>>>>> (mem:OO (reg:DI 4260 [orig:1285 ivtmp.886 ] [1285]) [1 MEM
>>>>>> <vector(2) real(kind=8)> [(real(kind=8) *)_4188]+0 S16 A64]))
>>>>>> "shell_lam.fppized.f":238:72 2187 {*movoo}
>>>>>> (expr_list:REG_EQUIV (mem:OO (reg:DI 4260 [orig:1285 ivtmp.886 ]
>>>>>> [1285]) [1 MEM <vector(2) real(kind=8)> [(real(kind=8) *)_4188]+0 S16
>>>>>> A64])
>>>>>> (nil)))
>>>>>> (insn 2412 2472 2477 161 (set (reg:V2DF 240 [ vect__302.545 ])
>>>>>> (neg:V2DF (fma:V2DF (subreg:V2DF (reg:OO 2561 [ MEM <vector(2)
>>>>>> real(kind=8)> [(real(kind=8) *)_4050] ]) 16)
>>>>>> (reg:V2DF 4283 [3119])
>>>>>> (neg:V2DF (subreg:V2DF (reg:OO 2572 [ vect__300.543_236
>>>>>> ]) 16))))) {*vsx_nfmsv2df4}
>>>>>> (nil))
>>>>>>
>>>>>>
>>>>>> In LRA reload sequential registers are not generated as r2572 is splled
>>>>>> and move to spill location
>>>>>> in stack and subsequent uses loads from stack. Hence sequential
>>>>>> registers pairs are not generated.
>>>>>>
>>>>>> lxvp vsx0, 0(r1).
>>>>>>
>>>>>> It loads from from r1+0 into vsx0 and vsx1 and appropriate uses use
>>>>>> sequential register pairs.
>>>>>>
>>>>>> Without load fusion since 2 loads exists and 2 loads need not require
>>>>>> sequential registers
>>>>>> hence it worked but with load fusion and using lxvp it requires
>>>>>> sequential register pairs.
>>>>>
>>>>> Do you mean that this is a performance regression? I.e. the fact that
>>>>> lxvp requires sequential registers causes extra spilling, due to having
>>>>> less allocation freedom?
>>>>>
>>>>> Or is it a correctness problem? If so, what is it? Nothing in the rtl
>>>>> above looks wrong in principle (although I've no idea if the REG_EQUIV
>>>>> is correct in this context). What does the allocated code look like,
>>>>> and why is it wrong?
>>>>>
>>>>> If (reg:OO 2561) is spilled and then one half of it used, only that half
>>>>> needs to be loaded from the spill slot. E.g. if (reg:OO 2561) is reloaded
>>>>> for insn 2412 on its own, only the second half of the register needs to be
>>>>> loaded from memory.
>>>>>
>>>>
>>>> This is bwaves spec 2017 benchmark. Spill happening in register allocator
>>>> could be because of less registers available in order to generate
>>>> sequential registers for lxvp.
>>>>
>>>> Because of spill sequential registers are not generated and breaks the
>>>> correctness. REG_EQUIV is generated by IRA.
>>>>
>>>> Allocated code because of spill doesn't generate sequential registers.
>>>> In LRA reload because of spill marked for IRA adjust to another
>>>> register causing not to generate sequential registers.
>>>>
>>>> reg:OO 2572 is spilled not reg:OO 2561. Because of spilled its
>>>> loaded from memory instead of generating sequential registers.
>>>>
>>>> Other reasons for spilling because of long live range for reg:OO 2572.
>>>>
>>>> We can add heuristics in fusion code not to fuse for longer live
>>>> ranges that should solve the problem.
>>>
>>> This doesn't describe the real problem though. It's natural for
>>> registers to be spilled sometimes. That would happen for OOmode
>>> registers even if we didn't form them in the fusion pass. And it's
>>> normal GCC semantics that the two hard registers allocated to an OOmode
>>> pseudo are consecutive.
>>>
>>> Like I asked above, please show the allocated code (including spills
>>> and reloads) and explain why it's wrong.
>>>
>>
>> After LRA reload:
>>
>> (insn 9299 2472 2412 187 (set (reg:V2DF 51 19 [orig:240 vect__302.545 ]
>> [240])
>> (mem:V2DF (plus:DI (reg:DI 8 8 [orig:1285 ivtmp.886 ] [1285])
>> (const_int 16 [0x10])) [1 MEM <vector(2) real(kind=8)>
>> [(real(kind=8) *)_4188]+16 S16 A64])) "shell_lam.fppized.f":238:72 1190
>> {vsx_movv2df_64bit}
>> (nil))
>> (insn 2412 9299 2477 187 (set (reg:V2DF 51 19 [orig:240 vect__302.545 ]
>> [240])
>> (neg:V2DF (fma:V2DF (reg:V2DF 39 7 [ MEM <vector(2) real(kind=8)>
>> [(real(kind=8) *)_4050]+16 ])
>> (reg:V2DF 44 12 [3119])
>> (neg:V2DF (reg:V2DF 51 19 [orig:240 vect__302.545 ]
>> [240]))))) {*vsx_nfmsv2df4}
>> (nil))
>>
>> (insn 2473 9311 9312 187 (set (reg:V2DF 38 6 [orig:905 vect__302.545 ] [905])
>> (neg:V2DF (fma:V2DF (reg:V2DF 44 12 [3119])
>> (reg:V2DF 38 6 [orig:2561 MEM <vector(2) real(kind=8)>
>> [(real(kind=8) *)_4050] ] [2561])
>> (neg:V2DF (reg:V2DF 47 15 [5266]))))) {*vsx_nfmsv2df4}
>> (nil))
>>
>> In the above allocated code it assign registers 51 and 47 and they are not
>> sequential.
>
> The reload for 2412 looks valid. What was the original pre-reload
> version of insn 2473? Also, what happened to insn 2472? Was it deleted?
>
This is preload version of 2473:
(insn 2473 2396 2478 161 (set (reg:V2DF 905 [ vect__302.545 ])
(neg:V2DF (fma:V2DF (reg:V2DF 4283 [3119])
(subreg:V2DF (reg:OO 2561 [ MEM <vector(2) real(kind=8)>
[(real(kind=8) *)_4050] ]) 0)
(neg:V2DF (subreg:V2DF (reg:OO 2572 [ vect__300.543_236 ])
0))))) {*vsx_nfmsv2df4}
(expr_list:REG_DEAD (reg:OO 2572 [ vect__300.543_236 ])
(expr_list:REG_DEAD (reg:OO 2561 [ MEM <vector(2) real(kind=8)>
[(real(kind=8) *)_4050] ])
(nil))))
insn 2472 is replaced with 9299 after reload.
> Richard
Thanks & Regards
Ajit