On 14/02/24 7:22 pm, Ajit Agarwal wrote:
> Hello Richard:
>
>
> On 14/02/24 4:03 pm, Richard Sandiford wrote:
>> Hi,
>>
>> Thanks for working on this.
>>
>> You posted a version of this patch on Sunday too. If you need to repost
>> to fix bugs or make other improvements, could you describe the changes
>> that you've made since the previous version? It makes things easier
>> to follow.
>
> Sure. Sorry for that I forgot to add that.
There were certain asserts that I have removed it in the earlier
patch that I have sent on Sunday and forgot to keep them.
I have addressed them in this patch.
I have done rtl_dce changes and they were not deleting some of
the unwanted moves and hence I changed the code to address this in this patch.
Thanks & Regards
Ajit
>
>>
>> Also, sorry for starting with a meta discussion about reviews, but
>> there are multiple types of review comment, including:
>>
>> (1) Suggestions for changes that are worded as suggestions.
>>
>> (2) Suggestions for changes that are worded as questions ("Wouldn't it be
>> better to do X?", etc).
>>
>> (3) Questions asking for an explanation or for more information.
>>
>> Just sending a new patch makes sense when the previous review comments
>> were all like (1), and arguably also (1)+(2). But Alex's previous review
>> included (3) as well. Could you go back and respond to his questions there?
>> It would help understand some of the design choices.
>>
>
> I have responded to Alex comments for the previous patches.
> I have incorporated all of his comments in this patch.
>
>
>> A natural starting point when reviewing a patch like this is to diff
>> the current aarch64-ldp-fusion.cc with the new pair-fusion.cc. This shows
>> many of the kind of changes that I'd expect. But it also seems to include
>> some code reordering, such as putting fuse_pair after try_fuse_pair.
>> If some reordering is necessary, could you try to organise the patch as
>> a series in which the reordering is a separate step? It's a bit hard
>> to review at the moment. (Reordering for cosmetic reasons is also OK,
>> but again please separate it out for ease of review.)
>>
>> Maybe one way of making the review easier would be to split the aarch64
>> pass into the "target-dependent" and "target-independent" pieces
>> in-place, i.e. keeping everything within aarch64-ldp-fusion.cc, and then
>> (as separate patches) move the target-independent pieces outside
>> config/aarch64.
>>
> Sure I will do that.
>
>> The patch includes:
>>
>>> * emit-rtl.cc: Modify ge with gt on PolyINT data structure.
>>> * dce.cc: Add changes not to delete the load store pair.
>>> * rtl-ssa/changes.cc: Modified assert code.
>>> * var-tracking.cc: Modified assert code.
>>> * df-problems.cc: Not to generate REG_UNUSED for multi
>>> word registers that is requied for rs6000 target.
>>
>> Please submit these separately, as independent preparatory patches,
>> with an explanation for why they're needed & correct. But:
>>
> Sure I will do that.
>
>>> diff --git a/gcc/df-problems.cc b/gcc/df-problems.cc
>>> index 88ee0dd67fc..a8d0ee7c4db 100644
>>> --- a/gcc/df-problems.cc
>>> +++ b/gcc/df-problems.cc
>>> @@ -3360,7 +3360,7 @@ df_set_unused_notes_for_mw (rtx_insn *insn, struct
>>> df_mw_hardreg *mws,
>>> if (df_whole_mw_reg_unused_p (mws, live, artificial_uses))
>>> {
>>> unsigned int regno = mws->start_regno;
>>> - df_set_note (REG_UNUSED, insn, mws->mw_reg);
>>> + //df_set_note (REG_UNUSED, insn, mws->mw_reg);
>>> dead_debug_insert_temp (debug, regno, insn,
>>> DEBUG_TEMP_AFTER_WITH_REG);
>>>
>>> if (REG_DEAD_DEBUGGING)
>>> @@ -3375,7 +3375,7 @@ df_set_unused_notes_for_mw (rtx_insn *insn, struct
>>> df_mw_hardreg *mws,
>>> if (!bitmap_bit_p (live, r)
>>> && !bitmap_bit_p (artificial_uses, r))
>>> {
>>> - df_set_note (REG_UNUSED, insn, regno_reg_rtx[r]);
>>> + // df_set_note (REG_UNUSED, insn, regno_reg_rtx[r]);
>>> dead_debug_insert_temp (debug, r, insn, DEBUG_TEMP_AFTER_WITH_REG);
>>> if (REG_DEAD_DEBUGGING)
>>> df_print_note ("adding 2: ", insn, REG_NOTES (insn));
>>> @@ -3493,9 +3493,9 @@ df_create_unused_note (rtx_insn *insn, df_ref def,
>>> || bitmap_bit_p (artificial_uses, dregno)
>>> || df_ignore_stack_reg (dregno)))
>>> {
>>> - rtx reg = (DF_REF_LOC (def))
>>> - ? *DF_REF_REAL_LOC (def): DF_REF_REG (def);
>>> - df_set_note (REG_UNUSED, insn, reg);
>>> + //rtx reg = (DF_REF_LOC (def))
>>> + // ? *DF_REF_REAL_LOC (def): DF_REF_REG (def);
>>> + //df_set_note (REG_UNUSED, insn, reg);
>>> dead_debug_insert_temp (debug, dregno, insn,
>>> DEBUG_TEMP_AFTER_WITH_REG);
>>> if (REG_DEAD_DEBUGGING)
>>> df_print_note ("adding 3: ", insn, REG_NOTES (insn));
>>
>> I don't think this can be right. The last hunk of the var-tracking.cc
>> patch also seems to be reverting a correct change.
>>
>
> We generate sequential registers using (subreg V16QI (reg 00mode) 16)
> and (reg OOmode 0)
> where OOmode is 256 bit and V16QI is 128 bits in order to generate
> sequential register pair. If I keep the above REG_UNUSED notes ira generates
> REG_UNUSED and in cprop_harreg pass and dce pass deletes store pairs and
> we get incorrect code.
>
> By commenting REG_UNUSED notes it is not generated and we get the correct
> store
> pair fusion and cprop_hardreg and dce doesn't deletes them.
>
> Please let me know is there are better ways to address this instead of
> commenting
> above generation of REG_UNUSED notes.
>
> Thanks & Regards
> Ajit
>> Thanks,
>> Richard