Re: [PATCH V1] Common infrastructure for load-store fusion for aarch64 and rs6000 target

Ajit Agarwal Wed, 14 Feb 2024 06:15:40 -0800

On 14/02/24 7:22 pm, Ajit Agarwal wrote:
> Hello Richard:
> 
> 
> On 14/02/24 4:03 pm, Richard Sandiford wrote:
>> Hi,
>>
>> Thanks for working on this.
>>
>> You posted a version of this patch on Sunday too.  If you need to repost
>> to fix bugs or make other improvements, could you describe the changes
>> that you've made since the previous version?  It makes things easier
>> to follow.
> 
> Sure. Sorry for that I forgot to add that.

There were certain asserts that I have removed it in the earlier
patch that I have sent on Sunday and forgot to keep them.
I have addressed them in this patch.
I have done rtl_dce changes and they were not deleting some of
the unwanted moves and hence I changed the code to address this in this patch.

Thanks & Regards
Ajit
> 
>>
>> Also, sorry for starting with a meta discussion about reviews, but
>> there are multiple types of review comment, including:
>>
>> (1) Suggestions for changes that are worded as suggestions.
>>
>> (2) Suggestions for changes that are worded as questions ("Wouldn't it be
>>     better to do X?", etc).
>>
>> (3) Questions asking for an explanation or for more information.
>>
>> Just sending a new patch makes sense when the previous review comments
>> were all like (1), and arguably also (1)+(2).  But Alex's previous review
>> included (3) as well.  Could you go back and respond to his questions there?
>> It would help understand some of the design choices.
>>
> 
> I have responded to Alex comments for the previous patches.
> I have incorporated all of his comments in this patch.
> 
>  
>> A natural starting point when reviewing a patch like this is to diff
>> the current aarch64-ldp-fusion.cc with the new pair-fusion.cc.  This shows
>> many of the kind of changes that I'd expect.  But it also seems to include
>> some code reordering, such as putting fuse_pair after try_fuse_pair.
>> If some reordering is necessary, could you try to organise the patch as
>> a series in which the reordering is a separate step?  It's a bit hard
>> to review at the moment.  (Reordering for cosmetic reasons is also OK,
>> but again please separate it out for ease of review.)
>>
>> Maybe one way of making the review easier would be to split the aarch64
>> pass into the "target-dependent" and "target-independent" pieces
>> in-place, i.e. keeping everything within aarch64-ldp-fusion.cc, and then
>> (as separate patches) move the target-independent pieces outside
>> config/aarch64.
>>
> Sure I will do that.
> 
>> The patch includes:
>>
>>>     * emit-rtl.cc: Modify ge with gt on PolyINT data structure.
>>>     * dce.cc: Add changes not to delete the load store pair.
>>>     * rtl-ssa/changes.cc: Modified assert code.
>>>     * var-tracking.cc: Modified assert code.
>>>     * df-problems.cc: Not to generate REG_UNUSED for multi
>>>     word registers that is requied for rs6000 target.
>>
>> Please submit these separately, as independent preparatory patches,
>> with an explanation for why they're needed & correct.  But:
>>
> Sure I will do that.
> 
>>> diff --git a/gcc/df-problems.cc b/gcc/df-problems.cc
>>> index 88ee0dd67fc..a8d0ee7c4db 100644
>>> --- a/gcc/df-problems.cc
>>> +++ b/gcc/df-problems.cc
>>> @@ -3360,7 +3360,7 @@ df_set_unused_notes_for_mw (rtx_insn *insn, struct 
>>> df_mw_hardreg *mws,
>>>    if (df_whole_mw_reg_unused_p (mws, live, artificial_uses))
>>>      {
>>>        unsigned int regno = mws->start_regno;
>>> -      df_set_note (REG_UNUSED, insn, mws->mw_reg);
>>> +      //df_set_note (REG_UNUSED, insn, mws->mw_reg);
>>>        dead_debug_insert_temp (debug, regno, insn, 
>>> DEBUG_TEMP_AFTER_WITH_REG);
>>>  
>>>        if (REG_DEAD_DEBUGGING)
>>> @@ -3375,7 +3375,7 @@ df_set_unused_notes_for_mw (rtx_insn *insn, struct 
>>> df_mw_hardreg *mws,
>>>     if (!bitmap_bit_p (live, r)
>>>         && !bitmap_bit_p (artificial_uses, r))
>>>       {
>>> -       df_set_note (REG_UNUSED, insn, regno_reg_rtx[r]);
>>> +      // df_set_note (REG_UNUSED, insn, regno_reg_rtx[r]);
>>>         dead_debug_insert_temp (debug, r, insn, DEBUG_TEMP_AFTER_WITH_REG);
>>>         if (REG_DEAD_DEBUGGING)
>>>           df_print_note ("adding 2: ", insn, REG_NOTES (insn));
>>> @@ -3493,9 +3493,9 @@ df_create_unused_note (rtx_insn *insn, df_ref def,
>>>     || bitmap_bit_p (artificial_uses, dregno)
>>>     || df_ignore_stack_reg (dregno)))
>>>      {
>>> -      rtx reg = (DF_REF_LOC (def))
>>> -                ? *DF_REF_REAL_LOC (def): DF_REF_REG (def);
>>> -      df_set_note (REG_UNUSED, insn, reg);
>>> +      //rtx reg = (DF_REF_LOC (def))
>>> +      //          ? *DF_REF_REAL_LOC (def): DF_REF_REG (def);
>>> +      //df_set_note (REG_UNUSED, insn, reg);
>>>        dead_debug_insert_temp (debug, dregno, insn, 
>>> DEBUG_TEMP_AFTER_WITH_REG);
>>>        if (REG_DEAD_DEBUGGING)
>>>     df_print_note ("adding 3: ", insn, REG_NOTES (insn));
>>
>> I don't think this can be right.  The last hunk of the var-tracking.cc
>> patch also seems to be reverting a correct change.
>>
> 
> We generate sequential registers using (subreg V16QI (reg 00mode) 16)
> and (reg OOmode 0)
> where OOmode is 256 bit and V16QI is 128 bits in order to generate
> sequential register pair. If I keep the above REG_UNUSED notes ira generates
> REG_UNUSED and in cprop_harreg pass and dce pass deletes store pairs and
> we get incorrect code.
> 
> By commenting REG_UNUSED notes it is not generated and we get the correct 
> store
> pair fusion and cprop_hardreg and dce doesn't deletes them.
> 
> Please let me know is there are better ways to address this instead of 
> commenting
> above generation of REG_UNUSED notes.
> 
> Thanks & Regards
> Ajit
>> Thanks,
>> Richard
Re: [PATCH V1] Common infrastructure for load-store fusion for aarch64 and rs6000 target

Reply via email to