Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

Richard Sandiford Mon, 14 Sep 2020 12:20:43 -0700

Qing Zhao <qing.z...@oracle.com> writes:
>> On Sep 14, 2020, at 11:33 AM, Richard Sandiford <richard.sandif...@arm.com> wrote:
>> 
>> Qing Zhao <qing.z...@oracle.com> writes:
>>>> Like I mentioned earlier though, passes that run after
>>>> pass_thread_prologue_and_epilogue can use call-clobbered registers that
>>>> weren't previously used.  For example, on x86_64, the function might
>>>> not use %r8 when the prologue, epilogue and returns are generated,
>>>> but pass_regrename might later introduce a new use of %r8.  AIUI,
>>>> the “used” version of the new command-line option is supposed to clear
>>>> %r8 in these circumstances, but it wouldn't do so if the data was
>>>> collected at the point that the return is generated.
>>> 
>>> Thanks for the information.
>>> 
>>>> 
>>>> That's why I think it's more robust to do this later (at the beginning
>>>> of pass_late_compilation) and insert the zeroing before returns that
>>>> already exist.
>>> 
>>> Yes, it looks like it’s not correct to insert the zeroing at the time when
>>> the prologue, epilogue and return are generated.
>>> As I also checked, “return” might also be generated as late as
>>> “pass_delay_slots”.  So, shall we move the new pass as late as possible?
>> 
>> If we insert the zeroing before pass_delay_slots and describe the
>> result correctly, pass_delay_slots should do the right thing.
>> 
>> Describing the result correctly includes ensuring that the cleared
>> registers are treated as live on exit from the function, so that the
>> zeroing doesn't get deleted again, or skipped by pass_delay_slots.
>
> In the current implementation for x86, we generate a zeroing insn as
> follows:
>
> (insn 18 16 19 2 (set (reg:SI 1 dx)
>         (const_int 0 [0])) "t10.c":11:1 -1
>      (nil))
> (insn 19 18 20 2 (unspec_volatile [
>             (reg:SI 1 dx)
>         ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>      (nil))
>
> i.e., after each zeroing insn, the register that is zeroed is marked with
> “UNSPECV_PRO_EPILOGUE_USE”.
> By doing this, we can prevent the zeroing insn from being deleted or skipped.
>
> Is doing this enough to describe the result correctly?
> Is there anything else we need to do in addition to this?


I guess that works, but I think it would be better to abstract
EPILOGUE_USES into a new target-independent wrapper function that
(a) returns true if EPILOGUE_USES itself returns true and (b) returns
true for registers that need to be zero on return, if the zeroing
instructions have already been inserted.  The places that currently
test EPILOGUE_USES should then test this new wrapper function instead.

After inserting the zeroing instructions, the pass should recompute the
live-out sets based on this new wrapper, so that the zeroed registers
are treated as live on exit.
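
To make the idea concrete, here is a rough sketch of the wrapper as a
self-contained toy model.  It is not GCC code: every name below is made up
for illustration, the register numbers are arbitrary, and model_epilogue_uses
merely stands in for the target's EPILOGUE_USES.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in GCC.  */
#define MODEL_NUM_HARD_REGS 16

/* Stand-in for the target's EPILOGUE_USES macro.  Register 7 is assumed
   to be the only register that the epilogue uses.  */
static bool
model_epilogue_uses (unsigned int regno)
{
  return regno == 7;
}

/* Registers that must be zero on return, as chosen by the zeroing pass.  */
static bool model_zero_on_return[MODEL_NUM_HARD_REGS];

/* True once the zeroing instructions have been inserted.  */
static bool model_zeroing_inserted_p;

/* The wrapper: true if (a) the epilogue uses REGNO or (b) REGNO must be
   zero on return and the zeroing instructions already exist.  */
bool
model_epilogue_uses_or_zeroed_p (unsigned int regno)
{
  assert (regno < MODEL_NUM_HARD_REGS);
  return (model_epilogue_uses (regno)
          || (model_zeroing_inserted_p && model_zero_on_return[regno]));
}

/* Record that the pass has inserted zeroing for the registers in MASK.  */
void
model_record_zeroing (unsigned int mask)
{
  for (unsigned int regno = 0; regno < MODEL_NUM_HARD_REGS; regno++)
    if (mask & (1u << regno))
      model_zero_on_return[regno] = true;
  model_zeroing_inserted_p = true;
}

/* Recompute the live-out set, as a bitmask, by querying the wrapper.  */
unsigned int
model_live_out_mask (void)
{
  unsigned int mask = 0;
  for (unsigned int regno = 0; regno < MODEL_NUM_HARD_REGS; regno++)
    if (model_epilogue_uses_or_zeroed_p (regno))
      mask |= 1u << regno;
  return mask;
}
```

In this model, the live-out mask contains only register 7 until
model_record_zeroing is called; afterwards it also contains every zeroed
register, which is what stops later passes from treating the zeroing as dead.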

>>>> But the dataflow information has to be correct between
>>>> pass_thread_prologue_and_epilogue and pass_free_cfg, otherwise
>>>> any pass in that region could clobber the registers in the same way.
>>> 
>>> You mean, the data flow information will not be correct after
>>> pass_free_cfg?
>>> “pass_delay_slots” is after “pass_free_cfg”, and there might be new
>>> “return”s generated in “pass_delay_slots”.
>>> If we want to generate zeroing for a new “return” that was generated in
>>> “pass_delay_slots”, can we do so correctly?
>> 
>> …the zeroing has to be done before pass_free_cfg, because the information
>> isn't reliable after that point.  I think it would make sense to do it
>> before pass_compute_alignments, because inserting the zeros will affect
>> alignment.
>
> Okay. 
>
> Then there is another problem:  what about the new “return”s that are 
> generated at pass_delay_slots?
>
> Should we generate the zeroing for these new returns? Since the data flow
> information might not be correct at this pass,
> it looks like there is no correct way to add the zeroing insns for these
> new “return”s. What should we do about this?

pass_delay_slots isn't a problem.  It doesn't change *what* happens
on each return path, it just changes how the instructions to achieve
it are arranged.

So e.g. if every path through the function clears register R before
pass_delay_slots, and if that clearing is represented as being necessary,
then every path through the function will clear register R after the pass
as well.
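
A very simplified way to picture this, as a toy model rather than anything
resembling the real pass: treat a return path as a list of instructions and
a delay-slot fill as moving the instruction before the return into the slot
after it.  All names below are hypothetical, and the model only tracks which
registers a path clears.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in GCC.  */
enum model_op { MODEL_CLEAR, MODEL_OTHER, MODEL_RETURN };

struct model_insn
{
  enum model_op op;
  unsigned int regno;   /* Only meaningful for MODEL_CLEAR.  */
};

/* Return a bitmask of the registers cleared somewhere on PATH.  */
unsigned int
model_cleared_regs (const struct model_insn *path, size_t n)
{
  unsigned int mask = 0;
  for (size_t i = 0; i < n; i++)
    if (path[i].op == MODEL_CLEAR)
      {
        assert (path[i].regno < 32);
        mask |= 1u << path[i].regno;
      }
  return mask;
}

/* Toy delay-slot fill: move the instruction immediately before the final
   return into the slot after it.  Return true if something was moved.  */
bool
model_fill_delay_slot (struct model_insn *path, size_t n)
{
  if (n < 2 || path[n - 1].op != MODEL_RETURN)
    return false;
  struct model_insn tmp = path[n - 2];
  path[n - 2] = path[n - 1];
  path[n - 1] = tmp;
  return true;
}
```

The instructions on the path are rearranged, but model_cleared_regs gives
the same answer before and after the fill.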

>> For extra safety, you could/should also check targetm.hard_regno_scratch_ok
>> to see whether there's a target-specific reason why a register can't
>> be clobbered.
>
> /* Return true if is OK to use a hard register REGNO as scratch register
>    in peephole2.  */
> DEFHOOK
> (hard_regno_scratch_ok,
>
>
> Is this checking only valid for pass_peephole2?

No, that comment looks out of date.  The hook is already used in
postreload, for example.
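
For the zeroing pass, the check could look roughly like the sketch below.
It is a self-contained toy model, not a patch: model_hard_regno_scratch_ok
stands in for targetm.hard_regno_scratch_ok, and the register numbering,
the call-used mask and the choice of register 3 as "not ok" are all
assumptions made up for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in GCC.  */
#define MODEL_NUM_HARD_REGS 8

/* Assumed set of call-clobbered registers: registers 0 to 5.  */
static const unsigned int model_call_used_mask = 0x3f;

/* Stand-in for targetm.hard_regno_scratch_ok.  Register 3 is assumed to
   have a target-specific reason why it must not be clobbered.  */
static bool
model_hard_regno_scratch_ok (unsigned int regno)
{
  assert (regno < MODEL_NUM_HARD_REGS);
  return regno != 3;
}

/* Given the registers in USED_MASK, return the subset that the pass may
   zero: call-clobbered registers that the target allows as scratch.  */
unsigned int
model_regs_to_zero (unsigned int used_mask)
{
  unsigned int mask = used_mask & model_call_used_mask;
  for (unsigned int regno = 0; regno < MODEL_NUM_HARD_REGS; regno++)
    if ((mask & (1u << regno)) && !model_hard_regno_scratch_ok (regno))
      mask &= ~(1u << regno);
  return mask;
}
```

So with every register marked as used, the model zeroes registers 0, 1, 2,
4 and 5: registers 6 and 7 are not call-clobbered, and register 3 is
rejected by the hook.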

Thanks,
Richard
