On Tue, May 8, 2018 at 5:56 PM, Richard Sandiford <richard.sandif...@linaro.org> wrote: > Richard Biener <richard.guent...@gmail.com> writes: >> On Tue, May 8, 2018 at 3:25 PM, Richard Sandiford >> <richard.sandif...@linaro.org> wrote: >>> We build up the input to IFN_STORE_LANES one vector at a time. >>> In RTL, each of these vector assignments becomes a write to >>> subregs of the form (subreg:VEC (reg:AGGR R)), where R is the >>> eventual input to the store lanes instruction. The problem is >>> that RTL isn't very good at tracking liveness when things are >>> initialised piecemeal by subregs, so R tends to end up being >>> live on all paths from the entry block to the store. This in >>> turn leads to unnecessary spilling around calls, as well as to >>> excess register pressure in vector loops. >>> >>> This patch adds gimple clobbers to indicate the liveness of the >>> IFN_STORE_LANES variable and makes sure that gimple clobbers are >>> expanded to rtl clobbers where useful. For consistency it also >>> uses clobbers to mark the point at which an IFN_LOAD_LANES >>> variable is no longer needed. >>> >>> Tested on aarch64-linux-gnu (with and without SVE), aaarch64_be-elf >>> and x86_64-linux-gnu. OK to install? >> >> Minor comment inline. > > Thanks, fixed. > >> Also it looks like clobbers are at the moment all thrown away during >> RTL expansion? Do the clobbers we generate with this patch eventually >> get collected somehow if they turn out to be no longer necessary? >> How many of them do we generate? I expect not many decls get >> expanded to registers and if they are most of them are likely >> not constructed piecemail - thus, maybe we should restrict ourselves >> to non-scalar typed lhs? So, change it to >> >> if (DECL_P (lhs) >> && (AGGREGATE_TYPE_P (TREE_TYPE (lhs)) // but what about >> single-element aggregates? >> || VECTOR_TYPE_P (TREE_TYPE (lhs)) >> || COMPLEX_TYPE_P (TREE_TYPE (lhs)))) > > How about instead deciding based on whether the pseudo register spans a > single hard register or multiple hard registers, as per the patch below? > The clobber is only useful if the pseudo register can be partially > modified via subregs. > > This could potentially also help with any large single-element > aggregrates that get broken down into word-sized subreg ops.
Yeah, that looks even better. >> The vectorizer part is ok with the minor adjustment pointed out below. Maybe >> you want to split this patch while we discuss the RTL bits. > > OK, thanks. I'll keep it as one patch for applying purposes, > but snipped the approved bits below. This version is ok. Thanks, Richard. > Richard > > > 2018-05-08 Richard Sandiford <richard.sandif...@linaro.org> > > gcc/ > * cfgexpand.c (expand_clobber): New function. > (expand_gimple_stmt_1): Use it. > > Index: gcc/cfgexpand.c > =================================================================== > --- gcc/cfgexpand.c 2018-05-08 16:50:31.815501502 +0100 > +++ gcc/cfgexpand.c 2018-05-08 16:50:31.997495804 +0100 > @@ -3582,6 +3582,26 @@ expand_return (tree retval, tree bounds) > } > } > > +/* Expand a clobber of LHS. If LHS is stored it in a multi-part > + register, tell the rtl optimizers that its value is no longer > + needed. */ > + > +static void > +expand_clobber (tree lhs) > +{ > + if (DECL_P (lhs)) > + { > + rtx decl_rtl = DECL_RTL_IF_SET (lhs); > + if (decl_rtl && REG_P (decl_rtl)) > + { > + machine_mode decl_mode = GET_MODE (decl_rtl); > + if (maybe_gt (GET_MODE_SIZE (decl_mode), > + REGMODE_NATURAL_SIZE (decl_mode))) > + emit_clobber (decl_rtl); > + } > + } > +} > + > /* A subroutine of expand_gimple_stmt, expanding one gimple statement > STMT that doesn't require special handling for outgoing edges. That > is no tailcalls and no GIMPLE_COND. */ > @@ -3687,7 +3707,7 @@ expand_gimple_stmt_1 (gimple *stmt) > if (TREE_CLOBBER_P (rhs)) > /* This is a clobber to mark the going out of scope for > this LHS. */ > - ; > + expand_clobber (lhs); > else > expand_assignment (lhs, rhs, > gimple_assign_nontemporal_move_p (