On 05/09/16 17:43, Bernd Edlinger wrote:
> Hi Richard,
> 
> what do you think of this patch, is it OK (with the suggested wording)?
> 

Bernd,

Apologies, this seems to have fallen through a crack.

I'm happy with this.  Does it still apply?

If so, I suggest applying it after a 24-hour cooling off period for any
final comments.

R.

> 
> Thanks
> Bernd.
> 
> On 08/05/16 16:06, Richard Earnshaw (lists) wrote:
>> On 05/08/16 13:49, Bernd Edlinger wrote:
>>> On 08/05/16 11:29, Richard Earnshaw (lists) wrote:
>>>> On 04/08/16 22:16, Bernd Edlinger wrote:
>>>>> Hi,
>>>>>
>>>>> this patch introduces a new target hook that allows the target's
>>>>> INITIAL_ELIMINATION_OFFSET function to use cached values instead of
>>>>> re-computing the frame layout every time.
>>>>>
>>>>> I have updated the documentation a bit and hope it is clearer this time.
>>>>>
>>>>> It still needs a review by ARM port maintainers.
>>>>>
>>>>> If the ARM port maintainers find this patch useful, that would be fine.
>>>>>
>>>>
>>>> I need to look into this more, but my first thought was that the
>>>> documentation is confusing, or there is a real problem in this patch.
>>>>
>>>
>>> Thanks for your quick response.
>>>
>>> The documentation is actually the most difficult part for me.
>>>
>>>> As I understand it the frame has to be re-laid out each time the
>>>> contents of the frame changes (an extra register becomes live or another
>>>> spill slot is needed).  So saying that it is laid out once can't be
>>>> right if (as I read it initially) you mean 'once per function' since I
>>>> think it needs to be 'once for each time the frame contents changes'.
>>>>
>>>> Of course, things might be a bit different with LRA compared with
>>>> reload, but I strongly suspect this is never going to be an 'exactly
>>>> once per function' operation.
>>>>
>>>
>>> Right.  It will be done 2 or 3 times for each function.
>>> LRA and reload behave identical in that respect.
>>>
>>> But each time reload changes something in the input data the
>>> INITIAL_EMIMINATION_OFFSET is called several times, and the results
>>> have to be consistent in each iteration.
>>>
>>> The frame layout function has no way to know if the frame layout
>>> is expected to change or not.
>>>
>>> Many targets use reload_completed as an indication when the frame layout
>>> may not change at all, but that is only an approximation.
>>>
>>>> Can you clarify your meaning in the documentation please?
>>>>
>>>
>>> I meant 'once' in the sense of 'once for each time the frame contents
>>> change'.
>>>
>>> Thus I'd change that sentence to:
>>>
>>> "This target hook allows the target to compute the frame layout once for
>>> each time the frame contents change and make use of the cached frame
>>> layout in @code{INITIAL_ELIMINATION_OFFSET} instead of re-computing it
>>> on every invocation.  This is particularly useful for targets that have
>>> an expensive frame layout function.  Implementing this callback is
>>> optional."
>>>
>>
>> Thanks, that's pretty much what I expected would be the case.
>>
>> Could I suggest:
>>
>> This target hook is called once each time the frame layout needs to be
>> recalculated.  The calculations can be cached by the target and can then
>> be used by @code{INITIAL_ELIMINATION_OFFSET} instead of re-computing the
>> layout on every invocation of that hook.  This is particularly useful
>> for targets that have an expensive frame layout function.  Implementing
>> this callback is optional.
>>
>> R.
>>
>>>
>>> Thanks
>>> Bernd.
>>>
>>>
>>>> R.
>>>>
>>>>>
>>>>> Thanks
>>>>> Bernd.
>>>>>
>>>>> On 06/21/16 23:29, Jeff Law wrote:
>>>>>> On 06/16/2016 08:47 AM, Bernd Edlinger wrote:
>>>>>>> Hi!
>>>>>>>
>>>>>>>
>>>>>>> By the design of the target hook INITIAL_ELIMINATION_OFFSET
>>>>>>> it is necessary to call this function several times with
>>>>>>> different register combinations.
>>>>>>> Most targets use a cached data structure that describes the
>>>>>>> exact frame layout of the current function.
>>>>>>>
>>>>>>> It is safe to skip the computation when reload_completed = true,
>>>>>>> and most targets do that already.
>>>>>>>
>>>>>>> However while reload is doing its work, it is not clear when to
>>>>>>> do the computation and when not.  This results in unnecessary
>>>>>>> work.  Computing the frame layout can be a simple function or an
>>>>>>> arbitrarily complex one, that walks all instructions of the current
>>>>>>> function for instance, which is more or less the common case.
>>>>>>>
>>>>>>>
>>>>>>> This patch adds a new optional target hook that can be used
>>>>>>> by the target to factor the INITIAL_ELIMINATION_OFFSET-hook
>>>>>>> into a O(n) computation part, and a O(1) result function.
>>>>>>>
>>>>>>> The patch implements a compute_frame_layout target hook just
>>>>>>> for ARM in the moment, to show the principle.
>>>>>>> Other targets may also implement that hook, if it seems appropriate.
>>>>>>>
>>>>>>>
>>>>>>> Boot-strapped and reg-tested on arm-linux-gnueabihf.
>>>>>>> OK for trunk?
>>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>> Bernd.
>>>>>>>
>>>>>>>
>>>>>>> changelog-frame-layout.txt
>>>>>>>
>>>>>>>
>>>>>>> 2016-06-16  Bernd Edlinger  <bernd.edlin...@hotmail.de>
>>>>>>>
>>>>>>>       * target.def (compute_frame_layout): New optional target hook.
>>>>>>>       * doc/tm.texi.in (TARGET_COMPUTE_FRAME_LAYOUT): Add hook.
>>>>>>>       * doc/tm.texi (TARGET_COMPUTE_FRAME_LAYOUT): Add documentation.
>>>>>>>       * lra-eliminations.c (update_reg_eliminate): Call
>>>>>>> compute_frame_layout
>>>>>>>       target hook.
>>>>>>>       * reload1.c (verify_initial_elim_offsets): Likewise.
>>>>>>>       * config/arm/arm.c (TARGET_COMPUTE_FRAME_LAYOUT): Define.
>>>>>>>       (use_simple_return_p): Call arm_compute_frame_layout if needed.
>>>>>>>       (arm_get_frame_offsets): Split up into this ...
>>>>>>>       (arm_compute_frame_layout): ... and this function.
>>>>>> The ARM maintainers would need to chime in on the ARM specific changes
>>>>>> though.
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Index: gcc/target.def
>>>>>>> ===================================================================
>>>>>>> --- gcc/target.def    (Revision 233176)
>>>>>>> +++ gcc/target.def    (Arbeitskopie)
>>>>>>> @@ -5245,8 +5245,19 @@ five otherwise.  This is best for most 
>>>>>>> machines.",
>>>>>>>     unsigned int, (void),
>>>>>>>     default_case_values_threshold)
>>>>>>>
>>>>>>> -/* Retutn true if a function must have and use a frame pointer.  */
>>>>>> s/Retutn/Return
>>>>>>
>>>>>>> +/* Optional callback to advise the target to compute the frame
>>>>>>> layout.  */
>>>>>>>    DEFHOOK
>>>>>>> +(compute_frame_layout,
>>>>>>> + "This target hook is called immediately before reload wants to call\n\
>>>>>>> +@code{INITIAL_ELIMINATION_OFFSET} and allows the target to cache the
>>>>>>> frame\n\
>>>>>>> +layout instead of re-computing it on every invocation.  This is
>>>>>>> particularly\n\
>>>>>>> +useful for targets that have an O(n) frame layout function.
>>>>>>> Implementing\n\
>>>>>>> +this callback is optional.",
>>>>>>> + void, (void),
>>>>>>> + hook_void_void)
>>>>>> So the docs say "immediately before", but that's not actually reality in
>>>>>> lra-eliminations.  I think you can just say "This target hook is called
>>>>>> before reload or lra-eliminations calls
>>>>>> @code{INITIAL_ELIMINATION_OFFSET} and allows ..."
>>>>>>
>>>>>>
>>>>>> How does this macro interact with INITIAL_FRAME_POINTER_OFFSET?
>>>>>>
>>>>>> I'm OK with this conceptually.  I think you need a minor doc update and
>>>>>> OK from the ARM maintainers before it can be installed though.
>>>>>>
>>>>>> jeff
>>>>>>
>>>>>> changelog-frame-layout.txt
>>>>>>
>>>>>>
>>>>>> 2016-06-16  Bernd Edlinger  <bernd.edlin...@hotmail.de>
>>>>>>
>>>>>>  * target.def (compute_frame_layout): New optional target hook.
>>>>>>  * doc/tm.texi.in (TARGET_COMPUTE_FRAME_LAYOUT): Add hook.
>>>>>>  * doc/tm.texi (TARGET_COMPUTE_FRAME_LAYOUT): Add documentation.
>>>>>>  * lra-eliminations.c (update_reg_eliminate): Call compute_frame_layout
>>>>>>  target hook.
>>>>>>  * reload1.c (verify_initial_elim_offsets): Likewise.
>>>>>>  * config/arm/arm.c (TARGET_COMPUTE_FRAME_LAYOUT): Define.
>>>>>>  (use_simple_return_p): Call arm_compute_frame_layout if needed.
>>>>>>  (arm_get_frame_offsets): Split up into this ...
>>>>>>  (arm_compute_frame_layout): ... and this function.
>>>>>>
>>>>>> patch-frame-layout.diff
>>>>>>
>>>>>>
>>>>>> Index: gcc/config/arm/arm.c
>>>>>> ===================================================================
>>>>>> --- gcc/config/arm/arm.c (revision 239144)
>>>>>> +++ gcc/config/arm/arm.c (working copy)
>>>>>> @@ -81,6 +81,7 @@ static bool arm_const_not_ok_for_debug_p (rtx);
>>>>>>    static bool arm_needs_doubleword_align (machine_mode, const_tree);
>>>>>>    static int arm_compute_static_chain_stack_bytes (void);
>>>>>>    static arm_stack_offsets *arm_get_frame_offsets (void);
>>>>>> +static void arm_compute_frame_layout (void);
>>>>>>    static void arm_add_gc_roots (void);
>>>>>>    static int arm_gen_constant (enum rtx_code, machine_mode, rtx,
>>>>>>                               unsigned HOST_WIDE_INT, rtx, rtx, int, 
>>>>>> int);
>>>>>> @@ -663,6 +664,9 @@ static const struct attribute_spec arm_attribute_t
>>>>>>    #undef TARGET_SCALAR_MODE_SUPPORTED_P
>>>>>>    #define TARGET_SCALAR_MODE_SUPPORTED_P arm_scalar_mode_supported_p
>>>>>>
>>>>>> +#undef TARGET_COMPUTE_FRAME_LAYOUT
>>>>>> +#define TARGET_COMPUTE_FRAME_LAYOUT arm_compute_frame_layout
>>>>>> +
>>>>>>    #undef TARGET_FRAME_POINTER_REQUIRED
>>>>>>    #define TARGET_FRAME_POINTER_REQUIRED arm_frame_pointer_required
>>>>>>
>>>>>> @@ -3880,6 +3884,10 @@ use_simple_return_p (void)
>>>>>>    {
>>>>>>      arm_stack_offsets *offsets;
>>>>>>
>>>>>> +  /* Note this function can be called before or after reload.  */
>>>>>> +  if (!reload_completed)
>>>>>> +    arm_compute_frame_layout ();
>>>>>> +
>>>>>>      offsets = arm_get_frame_offsets ();
>>>>>>      return offsets->outgoing_args != 0;
>>>>>>    }
>>>>>> @@ -19370,7 +19378,7 @@ arm_compute_static_chain_stack_bytes (void)
>>>>>>
>>>>>>    /* Compute a bit mask of which registers need to be
>>>>>>       saved on the stack for the current function.
>>>>>> -   This is used by arm_get_frame_offsets, which may add extra 
>>>>>> registers.  */
>>>>>> +   This is used by arm_compute_frame_layout, which may add extra 
>>>>>> registers.  */
>>>>>>
>>>>>>    static unsigned long
>>>>>>    arm_compute_save_reg_mask (void)
>>>>>> @@ -20926,12 +20934,25 @@ any_sibcall_could_use_r3 (void)
>>>>>>      alignment.  */
>>>>>>
>>>>>>
>>>>>> +/* Return cached stack offsets.  */
>>>>>> +
>>>>>> +static arm_stack_offsets *
>>>>>> +arm_get_frame_offsets (void)
>>>>>> +{
>>>>>> +  struct arm_stack_offsets *offsets;
>>>>>> +
>>>>>> +  offsets = &cfun->machine->stack_offsets;
>>>>>> +
>>>>>> +  return offsets;
>>>>>> +}
>>>>>> +
>>>>>> +
>>>>>>    /* Calculate stack offsets.  These are used to calculate register 
>>>>>> elimination
>>>>>>       offsets and in prologue/epilogue code.  Also calculates which 
>>>>>> registers
>>>>>>       should be saved.  */
>>>>>>
>>>>>> -static arm_stack_offsets *
>>>>>> -arm_get_frame_offsets (void)
>>>>>> +static void
>>>>>> +arm_compute_frame_layout (void)
>>>>>>    {
>>>>>>      struct arm_stack_offsets *offsets;
>>>>>>      unsigned long func_type;
>>>>>> @@ -20943,19 +20964,6 @@ any_sibcall_could_use_r3 (void)
>>>>>>
>>>>>>      offsets = &cfun->machine->stack_offsets;
>>>>>>
>>>>>> -  /* We need to know if we are a leaf function.  Unfortunately, it
>>>>>> -     is possible to be called after start_sequence has been called,
>>>>>> -     which causes get_insns to return the insns for the sequence,
>>>>>> -     not the function, which will cause leaf_function_p to return
>>>>>> -     the incorrect result.
>>>>>> -
>>>>>> -     to know about leaf functions once reload has completed, and the
>>>>>> -     frame size cannot be changed after that time, so we can safely
>>>>>> -     use the cached value.  */
>>>>>> -
>>>>>> -  if (reload_completed)
>>>>>> -    return offsets;
>>>>>> -
>>>>>>      /* Initially this is the size of the local variables.  It will 
>>>>>> translated
>>>>>>         into an offset once we have determined the size of preceding 
>>>>>> data.  */
>>>>>>      frame_size = ROUND_UP_WORD (get_frame_size ());
>>>>>> @@ -21022,7 +21030,7 @@ any_sibcall_could_use_r3 (void)
>>>>>>        {
>>>>>>          offsets->outgoing_args = offsets->soft_frame;
>>>>>>          offsets->locals_base = offsets->soft_frame;
>>>>>> -      return offsets;
>>>>>> +      return;
>>>>>>        }
>>>>>>
>>>>>>      /* Ensure SFP has the correct alignment.  */
>>>>>> @@ -21098,8 +21106,6 @@ any_sibcall_could_use_r3 (void)
>>>>>>          offsets->outgoing_args += 4;
>>>>>>          gcc_assert (!(offsets->outgoing_args & 7));
>>>>>>        }
>>>>>> -
>>>>>> -  return offsets;
>>>>>>    }
>>>>>>
>>>>>>
>>>>>> Index: gcc/doc/tm.texi
>>>>>> ===================================================================
>>>>>> --- gcc/doc/tm.texi      (revision 239144)
>>>>>> +++ gcc/doc/tm.texi      (working copy)
>>>>>> @@ -3693,6 +3693,14 @@ registers.  This macro must be defined if @code{EL
>>>>>>    defined.
>>>>>>    @end defmac
>>>>>>
>>>>>> +@deftypefn {Target Hook} void TARGET_COMPUTE_FRAME_LAYOUT (void)
>>>>>> +This target hook allows the target to compute the frame layout once and
>>>>>> +make use of the cached frame layout in @code{INITIAL_ELIMINATION_OFFSET}
>>>>>> +instead of re-computing it on every invocation.  This is particularly
>>>>>> +useful for targets that have an expensive frame layout function.
>>>>>> +Implementing this callback is optional.
>>>>>> +@end deftypefn
>>>>>> +
>>>>>>    @node Stack Arguments
>>>>>>    @subsection Passing Function Arguments on the Stack
>>>>>>    @cindex arguments on stack
>>>>>> Index: gcc/doc/tm.texi.in
>>>>>> ===================================================================
>>>>>> --- gcc/doc/tm.texi.in   (revision 239144)
>>>>>> +++ gcc/doc/tm.texi.in   (working copy)
>>>>>> @@ -3227,6 +3227,8 @@ registers.  This macro must be defined if @code{EL
>>>>>>    defined.
>>>>>>    @end defmac
>>>>>>
>>>>>> +@hook TARGET_COMPUTE_FRAME_LAYOUT
>>>>>> +
>>>>>>    @node Stack Arguments
>>>>>>    @subsection Passing Function Arguments on the Stack
>>>>>>    @cindex arguments on stack
>>>>>> Index: gcc/lra-eliminations.c
>>>>>> ===================================================================
>>>>>> --- gcc/lra-eliminations.c       (revision 239144)
>>>>>> +++ gcc/lra-eliminations.c       (working copy)
>>>>>> @@ -1203,6 +1203,10 @@ update_reg_eliminate (bitmap insns_with_changed_of
>>>>>>      struct lra_elim_table *ep, *ep1;
>>>>>>      HARD_REG_SET temp_hard_reg_set;
>>>>>>
>>>>>> +#ifdef ELIMINABLE_REGS
>>>>>> +  targetm.compute_frame_layout ();
>>>>>> +#endif
>>>>>> +
>>>>>>      /* Clear self elimination offsets.  */
>>>>>>      for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; 
>>>>>> ep++)
>>>>>>        self_elim_offsets[ep->from] = 0;
>>>>>> Index: gcc/reload1.c
>>>>>> ===================================================================
>>>>>> --- gcc/reload1.c        (revision 239144)
>>>>>> +++ gcc/reload1.c        (working copy)
>>>>>> @@ -3831,6 +3831,7 @@ verify_initial_elim_offsets (void)
>>>>>>      {
>>>>>>       struct elim_table *ep;
>>>>>>
>>>>>> +   targetm.compute_frame_layout ();
>>>>>>       for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; 
>>>>>> ep++)
>>>>>>         {
>>>>>>           INITIAL_ELIMINATION_OFFSET (ep->from, ep->to, t);
>>>>>> @@ -3855,6 +3856,7 @@ set_initial_elim_offsets (void)
>>>>>>      struct elim_table *ep = reg_eliminate;
>>>>>>
>>>>>>    #ifdef ELIMINABLE_REGS
>>>>>> +  targetm.compute_frame_layout ();
>>>>>>      for (; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
>>>>>>        {
>>>>>>          INITIAL_ELIMINATION_OFFSET (ep->from, ep->to, 
>>>>>> ep->initial_offset);
>>>>>> Index: gcc/target.def
>>>>>> ===================================================================
>>>>>> --- gcc/target.def       (revision 239144)
>>>>>> +++ gcc/target.def       (working copy)
>>>>>> @@ -5269,8 +5269,19 @@ five otherwise.  This is best for most machines.",
>>>>>>     unsigned int, (void),
>>>>>>     default_case_values_threshold)
>>>>>>
>>>>>> -/* Retutn true if a function must have and use a frame pointer.  */
>>>>>> +/* Optional callback to advise the target to compute the frame layout.  
>>>>>> */
>>>>>>    DEFHOOK
>>>>>> +(compute_frame_layout,
>>>>>> + "This target hook allows the target to compute the frame layout once 
>>>>>> and\n\
>>>>>> +make use of the cached frame layout in 
>>>>>> @code{INITIAL_ELIMINATION_OFFSET}\n\
>>>>>> +instead of re-computing it on every invocation.  This is particularly\n\
>>>>>> +useful for targets that have an expensive frame layout function.\n\
>>>>>> +Implementing this callback is optional.",
>>>>>> + void, (void),
>>>>>> + hook_void_void)
>>>>>> +
>>>>>> +/* Return true if a function must have and use a frame pointer.  */
>>>>>> +DEFHOOK
>>>>>>    (frame_pointer_required,
>>>>>>     "This target hook should return @code{true} if a function must have 
>>>>>> and use\n\
>>>>>>    a frame pointer.  This target hook is called in the reload pass.  If 
>>>>>> its return\n\
>>>>
>>>
>>

Reply via email to