On Tue, May 8, 2018 at 5:56 PM, Richard Sandiford
<richard.sandif...@linaro.org> wrote:
> Richard Biener <richard.guent...@gmail.com> writes:
>> On Tue, May 8, 2018 at 3:25 PM, Richard Sandiford
>> <richard.sandif...@linaro.org> wrote:
>>> We build up the input to IFN_STORE_LANES one vector at a time.
>>> In RTL, each of these vector assignments becomes a write to
>>> subregs of the form (subreg:VEC (reg:AGGR R)), where R is the
>>> eventual input to the store lanes instruction.  The problem is
>>> that RTL isn't very good at tracking liveness when things are
>>> initialised piecemeal by subregs, so R tends to end up being
>>> live on all paths from the entry block to the store.  This in
>>> turn leads to unnecessary spilling around calls, as well as to
>>> excess register pressure in vector loops.
>>>
>>> This patch adds gimple clobbers to indicate the liveness of the
>>> IFN_STORE_LANES variable and makes sure that gimple clobbers are
>>> expanded to rtl clobbers where useful.  For consistency it also
>>> uses clobbers to mark the point at which an IFN_LOAD_LANES
>>> variable is no longer needed.
>>>
>>> Tested on aarch64-linux-gnu (with and without SVE), aaarch64_be-elf
>>> and x86_64-linux-gnu.  OK to install?
>>
>> Minor comment inline.
>
> Thanks, fixed.
>
>> Also it looks like clobbers are at the moment all thrown away during
>> RTL expansion?  Do the clobbers we generate with this patch eventually
>> get collected somehow if they turn out to be no longer necessary?
>> How many of them do we generate?  I expect not many decls get
>> expanded to registers and if they are most of them are likely
>> not constructed piecemail  - thus, maybe we should restrict ourselves
>> to non-scalar typed lhs?  So, change it to
>>
>>   if (DECL_P (lhs)
>>       && (AGGREGATE_TYPE_P (TREE_TYPE (lhs)) // but what about
>> single-element aggregates?
>>              || VECTOR_TYPE_P (TREE_TYPE (lhs))
>>              || COMPLEX_TYPE_P (TREE_TYPE (lhs))))
>
> How about instead deciding based on whether the pseudo register spans a
> single hard register or multiple hard registers, as per the patch below?
> The clobber is only useful if the pseudo register can be partially
> modified via subregs.
>
> This could potentially also help with any large single-element
> aggregrates that get broken down into word-sized subreg ops.

Yeah, that looks even better.

>> The vectorizer part is ok with the minor adjustment pointed out below.  Maybe
>> you want to split this patch while we discuss the RTL bits.
>
> OK, thanks.  I'll keep it as one patch for applying purposes,
> but snipped the approved bits below.

This version is ok.

Thanks,
Richard.

> Richard
>
>
> 2018-05-08  Richard Sandiford  <richard.sandif...@linaro.org>
>
> gcc/
>         * cfgexpand.c (expand_clobber): New function.
>         (expand_gimple_stmt_1): Use it.
>
> Index: gcc/cfgexpand.c
> ===================================================================
> --- gcc/cfgexpand.c     2018-05-08 16:50:31.815501502 +0100
> +++ gcc/cfgexpand.c     2018-05-08 16:50:31.997495804 +0100
> @@ -3582,6 +3582,26 @@ expand_return (tree retval, tree bounds)
>      }
>  }
>
> +/* Expand a clobber of LHS.  If LHS is stored it in a multi-part
> +   register, tell the rtl optimizers that its value is no longer
> +   needed.  */
> +
> +static void
> +expand_clobber (tree lhs)
> +{
> +  if (DECL_P (lhs))
> +    {
> +      rtx decl_rtl = DECL_RTL_IF_SET (lhs);
> +      if (decl_rtl && REG_P (decl_rtl))
> +       {
> +         machine_mode decl_mode = GET_MODE (decl_rtl);
> +         if (maybe_gt (GET_MODE_SIZE (decl_mode),
> +                       REGMODE_NATURAL_SIZE (decl_mode)))
> +           emit_clobber (decl_rtl);
> +       }
> +    }
> +}
> +
>  /* A subroutine of expand_gimple_stmt, expanding one gimple statement
>     STMT that doesn't require special handling for outgoing edges.  That
>     is no tailcalls and no GIMPLE_COND.  */
> @@ -3687,7 +3707,7 @@ expand_gimple_stmt_1 (gimple *stmt)
>             if (TREE_CLOBBER_P (rhs))
>               /* This is a clobber to mark the going out of scope for
>                  this LHS.  */
> -             ;
> +             expand_clobber (lhs);
>             else
>               expand_assignment (lhs, rhs,
>                                  gimple_assign_nontemporal_move_p (

Reply via email to