Hi,

I would like to ping this patch.

There are some aspects (TODOs) of this patch that can be improved.
I gathered some statistics to see how often these aspects occur,
by checking the GCC source code (including the test suite) and
SPEC 2017.
- Reverse storage order.  This occurs only in the few tests that
  use the attribute.
- Writing to parameters.  Only ~12 hits in the GCC source code
  (plus some hits in one Go file); see the sketch after this list.
- Overlapping accesses to parameters.  Overlapping reads are
  already supported.  Writes that overlap reads are not very
  common, because writing to a parameter is rare.
- Extracting bitfields from a parameter passed in multiple
  registers.  This occurs only twice in the GCC code (plus one
  hit in a Go file, h2_bundle.go), in 6 files of the test suite,
  and once in SPEC 2017.
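
For instance, hypothetical code (not taken from the statistics
above) hitting the 'writing to parameters' and bitfield cases
could look like:

  struct A { double d[2]; long l; };

  long
  write_parm (struct A a)
  {
    a.l = 3;            /* write to the aggregate parameter */
    return a.l;
  }

  struct B { unsigned int x : 5; unsigned int y : 20; double d; };

  unsigned int
  read_bitfields (struct B b)
  {
    return b.x + b.y;   /* bitfield extraction from the parameter */
  }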

I'm thinking of enhancing the patch incrementally.
Thanks in advance for any comments.

BR,
Jeff (Jiufu Guo)

Jiufu Guo <guoji...@linux.ibm.com> writes:

> Hi,
>
> Compared with the previous version:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632399.html
> this version supports TImode/vector-mode accesses.
>
> There are a few PRs (meta-bug PR101926) on various targets.
> Their root causes are similar: the aggregate params/returns
> are passed in multiple registers, but they are first stored
> from the registers to the stack, and then the parameter is
> accessed through the stack slot.
>
> A general idea to enhance this: access the aggregate
> parameters/returns directly through the incoming/outgoing
> scalar registers.  This idea is a kind of SRA.
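>
> For illustration (this is essentially the pr65421-2.c test
> below): on ELFv2, 'a' arrives in three floating-point
> registers, and with this patch the function body can use them
> directly instead of going through a stack slot:
>
>   typedef struct FLOATS { double a[3]; } FLOATS;
>   FLOATS ret_arg (FLOATS a) { return a; }  /* can be just 'blr' */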
>
> This experimental patch for light-expander-sra contains
> the following parts:
>
> a. Check if the parameters/returns are OK/profitable to
>    scalarize, and set the incoming/outgoing registers
>    (pseudos) for the parameter/return.
>   - This is done in "expand_function_start", after the
>     incoming/outgoing hard registers are determined for the
>     parameter/return.
>     The scalarized registers are recorded in the DECL_RTL of
>     the parameter/return in parallel form (sketched below).
>   - At the time DECL_RTL is set, "scalarizable_aggregate"
>     is called to check whether the accesses are OK/profitable
>     to scalarize.
>     We can continue to enhance this function to support
>     more cases, for example:
>     - 'reverse storage order'.
>     - 'writing to parameters'/'overlapping accesses'.
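>
> For a parameter like 'FLOATS a' above, the recorded DECL_RTL
> is a parallel of roughly this shape (a sketch; the pseudo
> register numbers are made up):
>
>   (parallel:BLK [(expr_list (reg:DF 117) (const_int 0))
>                  (expr_list (reg:DF 118) (const_int 8))
>                  (expr_list (reg:DF 119) (const_int 16))])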
>
> b. When expanding the accesses of the parameters/returns,
>    the scalar rtxs (pseudos) to expand the access can be
>    figured out from the access info (e.g. bitpos, bitsize,
>    mode).  This may happen when expanding the accesses below:
>   - A component access of a parameter: "_1 = arg.f1",
>     or a whole-parameter access: the rhs of "_2 = arg".
>   - An assignment to a return value:
>     "D.xx = yy;" or "D.xx.f = zz;" where D.xx occurs on a
>     return stmt.
>   - This is mainly done in expr.cc (expand_expr_real_1 and
>     expand_assignment).  The function "extract_sub_member" is
>     used to figure out the scalar rtxs (pseudos).
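>
> Continuing the sketch above: an access "_1 = arg.a[1]" has
> bitpos 64 and bitsize 64; query_position_in_parallel maps it
> to the entry at byte offset 8, so the access simply expands
> to the pseudo (reg:DF 118), with no stack traffic.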
>
> Besides the above two parts, some work is done on the GIMPLE
> tree: collecting the SRA candidates for parameters/returns,
> and collecting the SRA access info.
> This is mainly done at the beginning of the expander pass.
> Below are the two major items of this part.
>  - Collect light-expander-sra candidates.
>   Each parameter is checked for a suitable aggregate type.
>   The return value (VAR_P) of each return stmt is collected
>   if the function returns via registers.
>   This is implemented in expand_sra::collect_sra_candidates.
>
>  - Build/collect/manage all the accesses on the candidates.
>   The function "scan_function" is used to do this work: it
>   goes through all basic blocks, and all interesting stmts
>   (phi, return, assign, call, asm) are checked.
>   If there is an interesting expression (e.g. a COMPONENT_REF
>   or PARM_DECL), the required info for the access (e.g. pos,
>   size, type, base) is recorded.
>   If doing SRA would be risky, candidates may be removed,
>   e.g. when the address is taken and accessed via memory:
>   "foo (struct S arg) { bar (&arg); }"
>
> This patch is tested on ppc64{,le} and x86_64.
> Is this ok for trunk?
>
> BR,
> Jeff (Jiufu Guo)
>
>       PR target/65421
>
> gcc/ChangeLog:
>
>       * cfgexpand.cc (struct access): New class.
>       (struct expand_sra): New class.
>       (expand_sra::collect_sra_candidates): New member function.
>       (expand_sra::add_sra_candidate): Likewise.
>       (expand_sra::build_access): Likewise.
>       (expand_sra::analyze_phi): Likewise.
>       (expand_sra::analyze_assign): Likewise.
>       (expand_sra::visit_base): Likewise.
>       (expand_sra::protect_mem_access_in_stmt): Likewise.
>       (expand_sra::expand_sra): Class constructor.
>       (expand_sra::~expand_sra): Class destructor.
>       (expand_sra::scalarizable_access): New member function.
>       (expand_sra::scalarizable_accesses): Likewise.
>       (scalarizable_aggregate): New function.
>       (set_scalar_rtx_for_returns): New function.
>       (expand_value_return): Updated.
>       (expand_debug_expr): Updated.
>       (pass_expand::execute): Updated to use expand_sra.
>       * cfgexpand.h (scalarizable_aggregate): New declaration.
>       (set_scalar_rtx_for_returns): New declaration.
>       * expr.cc (expand_assignment): Updated.
>       (expand_constructor): Updated.
>       (query_position_in_parallel): New function.
>       (extract_sub_member): New function.
>       (expand_expr_real_1): Updated.
>       * expr.h (query_position_in_parallel): New declaration.
>       * function.cc (assign_parm_setup_block): Updated.
>       (assign_parms): Updated.
>       (expand_function_start): Updated.
>       * tree-sra.h (struct sra_base_access): New class.
>       (struct sra_default_analyzer): New class.
>       (scan_function): New template function.
>       * var-tracking.cc (track_loc_p): Updated.
>
> gcc/testsuite/ChangeLog:
>
>       * g++.target/powerpc/pr102024.C: Updated.
>       * gcc.target/powerpc/pr108073.c: New test.
>       * gcc.target/powerpc/pr65421-1.c: New test.
>       * gcc.target/powerpc/pr65421-2.c: New test.
>
> ---
>  gcc/cfgexpand.cc                             | 352 ++++++++++++++++++-
>  gcc/cfgexpand.h                              |   2 +
>  gcc/expr.cc                                  | 179 +++++++++-
>  gcc/expr.h                                   |   3 +
>  gcc/function.cc                              |  36 +-
>  gcc/tree-sra.h                               |  76 ++++
>  gcc/var-tracking.cc                          |   3 +-
>  gcc/testsuite/g++.target/powerpc/pr102024.C  |   2 +-
>  gcc/testsuite/gcc.target/i386/pr20020-2.c    |   5 +
>  gcc/testsuite/gcc.target/powerpc/pr108073.c  |  29 ++
>  gcc/testsuite/gcc.target/powerpc/pr65421-1.c |   6 +
>  gcc/testsuite/gcc.target/powerpc/pr65421-2.c |  32 ++
>  12 files changed, 718 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-1.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-2.c
>
> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
> index 4262703a138..ef99ca8ac13 100644
> --- a/gcc/cfgexpand.cc
> +++ b/gcc/cfgexpand.cc
> @@ -74,6 +74,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "output.h"
>  #include "builtins.h"
>  #include "opts.h"
> +#include "tree-sra.h"
>  
>  /* Some systems use __main in a way incompatible with its use in gcc, in these
>     cases use the macros NAME__MAIN to give a quoted symbol and SYMBOL__MAIN to
> @@ -97,6 +98,343 @@ static bool defer_stack_allocation (tree, bool);
>  
>  static void record_alignment_for_reg_var (unsigned int);
>  
> +/* For light SRA in the expander, for parameters and returns.  */
> +struct access : public sra_base_access
> +{
> +};
> +
> +typedef struct access *access_p;
> +
> +struct expand_sra : public sra_default_analyzer
> +{
> +  /* Construct/destruct resources, e.g. sra candidates.  */
> +  expand_sra ();
> +  ~expand_sra ();
> +
> +  /* No actions for pre_analyze_stmt, analyze_return.  */
> +
> +  /* Override the phi/call/asm analyses.  */
> +  void analyze_phi (gphi *s);
> +
> +  /* TODO: Check accesses on call/asm.  */
> +  void analyze_call (gcall *s) { protect_mem_access_in_stmt (s); };
> +  void analyze_asm (gasm *s) { protect_mem_access_in_stmt (s); };
> +
> +  /* Check SRA accesses on an assignment.  */
> +  void analyze_assign (gassign *);
> +
> +  /* Check if the accesses of BASE (a parameter or return) are
> +     scalarizable, according to the incoming/outgoing REGS.  */
> +  bool scalarizable_accesses (tree base, rtx regs);
> +
> +private:
> +  /* Collect the parameters and returns to check whether they are
> +     suitable for scalarization.  */
> +  bool collect_sra_candidates (void);
> +
> +  /* Return true if VAR is added as a candidate for SRA.  */
> +  bool add_sra_candidate (tree var);
> +
> +  /* Return the created access if EXPR is an interesting SRA access,
> +     NULL otherwise.  */
> +  access_p build_access (tree expr, bool write);
> +
> +  /* Check if the access ACC is scalarizable.  REGS is the
> +     incoming/outgoing registers on which the access is based.  */
> +  bool scalarizable_access (access_p acc, rtx regs, bool is_parm);
> +
> +  /* If there is a risk (stored/loaded or address taken), disqualify
> +     the SRA candidates in the uninteresting STMT.  */
> +  void protect_mem_access_in_stmt (gimple *stmt);
> +
> +  /* Callback of walk_stmt_load_store_addr_ops, used to remove
> +     unscalarizable accesses.  */
> +  static bool visit_base (gimple *, tree op, tree, void *data);
> +
> +  /* Base (tree) -> Vector (vec<access_p> *) map.  */
> +  hash_map<tree, auto_vec<access_p> > *base_access_vec;
> +};
> +
> +bool
> +expand_sra::collect_sra_candidates (void)
> +{
> +  bool ret = false;
> +
> +  /* Collect parameters.  */
> +  for (tree parm = DECL_ARGUMENTS (current_function_decl); parm;
> +       parm = DECL_CHAIN (parm))
> +    ret |= add_sra_candidate (parm);
> +
> +  /* Collect VARs on returns.  */
> +  if (DECL_RESULT (current_function_decl))
> +    {
> +      edge_iterator ei;
> +      edge e;
> +      FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
> +     if (greturn *r = safe_dyn_cast<greturn *> (*gsi_last_bb (e->src)))
> +       {
> +         tree val = gimple_return_retval (r);
> +         /* To scalarize the return, the return value should only be
> +            written (except by this return stmt).  Use 'true (write)'
> +            to pretend the access is only written.  */
> +         if (val && VAR_P (val))
> +           ret |= add_sra_candidate (val) && build_access (val, true);
> +       }
> +    }
> +
> +  return ret;
> +}
> +
> +bool
> +expand_sra::add_sra_candidate (tree var)
> +{
> +  tree type = TREE_TYPE (var);
> +
> +  if (!AGGREGATE_TYPE_P (type) || !tree_fits_shwi_p (TYPE_SIZE (type))
> +      || tree_to_shwi (TYPE_SIZE (type)) == 0 || TREE_THIS_VOLATILE (var)
> +      || is_va_list_type (type))
> +    return false;
> +
> +  base_access_vec->get_or_insert (var);
> +
> +  return true;
> +}
> +
> +access_p
> +expand_sra::build_access (tree expr, bool write)
> +{
> +  enum tree_code code = TREE_CODE (expr);
> +  if (code != VAR_DECL && code != PARM_DECL && code != COMPONENT_REF
> +      && code != ARRAY_REF && code != ARRAY_RANGE_REF)
> +    return NULL;
> +
> +  HOST_WIDE_INT offset, size;
> +  bool reverse;
> +  tree base = get_ref_base_and_extent_hwi (expr, &offset, &size, &reverse);
> +  if (!base || !DECL_P (base))
> +    return NULL;
> +
> +  vec<access_p> *access_vec = base_access_vec->get (base);
> +  if (!access_vec)
> +    return NULL;
> +
> +  /* TODO: support reverse storage order.  */
> +  if (reverse || size <= 0 || offset + size > tree_to_shwi (DECL_SIZE (base)))
> +    {
> +      base_access_vec->remove (base);
> +      return NULL;
> +    }
> +
> +  struct access *access = XNEWVEC (struct access, 1);
> +
> +  memset (access, 0, sizeof (struct access));
> +  access->offset = offset;
> +  access->size = size;
> +  access->expr = expr;
> +  access->write = write;
> +  access->reverse = reverse;
> +
> +  access_vec->safe_push (access);
> +
> +  return access;
> +}
> +
> +/* Remove the SRA candidates if their address is taken
> +   in the PHI stmt STMT.  */
> +
> +void
> +expand_sra::analyze_phi (gphi *stmt)
> +{
> +  if (base_access_vec && !base_access_vec->is_empty ())
> +    walk_stmt_load_store_addr_ops (stmt, this, NULL, NULL, visit_base);
> +}
> +
> +void
> +expand_sra::analyze_assign (gassign *stmt)
> +{
> +  if (!base_access_vec || base_access_vec->is_empty ())
> +    return;
> +
> +  if (gimple_assign_single_p (stmt) && !gimple_clobber_p (stmt))
> +    {
> +      tree rhs = gimple_assign_rhs1 (stmt);
> +      tree lhs = gimple_assign_lhs (stmt);
> +      bool res_r = build_access (rhs, false);
> +      bool res_l = build_access (lhs, true);
> +
> +      if (res_l || res_r)
> +     return;
> +    }
> +
> +  protect_mem_access_in_stmt (stmt);
> +}
> +
> +/* Callback of walk_stmt_load_store_addr_ops, used to remove
> +   unscalarizable accesses.  Called by analyze_phi and
> +   protect_mem_access_in_stmt.  */
> +
> +bool
> +expand_sra::visit_base (gimple *, tree op, tree, void *data)
> +{
> +  op = get_base_address (op);
> +  if (op && DECL_P (op))
> +    {
> +      expand_sra *p = (expand_sra *) data;
> +      p->base_access_vec->remove (op);
> +    }
> +  return false;
> +}
> +
> +/* Function protect_mem_access_in_stmt removes the SRA candidates if
> +   there is a store/load/address-taken on a candidate in the STMT.
> +
> +   For statements that SRA does not care about, if there are possible
> +   memory operations on the SRA candidates, it would be risky to
> +   scalarize them.  */
> +
> +void
> +expand_sra::protect_mem_access_in_stmt (gimple *stmt)
> +{
> +  if (base_access_vec && !base_access_vec->is_empty ())
> +    walk_stmt_load_store_addr_ops (stmt, this, visit_base, visit_base,
> +                                visit_base);
> +}
> +
> +expand_sra::expand_sra () : base_access_vec (NULL)
> +{
> +  if (optimize <= 0)
> +    return;
> +
> +  base_access_vec = new hash_map<tree, auto_vec<access_p> >;
> +  collect_sra_candidates ();
> +}
> +
> +expand_sra::~expand_sra ()
> +{
> +  if (optimize <= 0)
> +    return;
> +
> +  delete base_access_vec;
> +}
> +
> +bool
> +expand_sra::scalarizable_access (access_p acc, rtx regs, bool is_parm)
> +{
> +  /* For now, only reading from parms
> +     and writing to returns are supported.  */
> +  if (is_parm && acc->write)
> +    return false;
> +  if (!is_parm && !acc->write)
> +    return false;
> +
> +  /* Compute the position of the access in the parallel regs.  */
> +  int start_index = -1;
> +  int end_index = -1;
> +  HOST_WIDE_INT left_bits = 0;
> +  HOST_WIDE_INT right_bits = 0;
> +  query_position_in_parallel (acc->offset, acc->size, regs, start_index,
> +                           end_index, left_bits, right_bits);
> +
> +  /* Invalid access position: padding or out of bounds.  */
> +  if (start_index < 0 || end_index < 0)
> +    return false;
> +
> +  machine_mode expr_mode = TYPE_MODE (TREE_TYPE (acc->expr));
> +  /* Need multiple registers in the parallel for the access.  */
> +  if (expr_mode == BLKmode || end_index > start_index)
> +    {
> +      if (left_bits || right_bits)
> +     return false;
> +      if (expr_mode == BLKmode)
> +     return true;
> +
> +      /* For large modes, only TImode/vector modes are supported in
> +      multiple registers.  */
> +      if (known_gt (acc->size, GET_MODE_BITSIZE (word_mode)))
> +     return expr_mode == TImode || VECTOR_MODE_P (expr_mode);
> +      return true;
> +    }
> +
> +  gcc_assert (end_index == start_index);
> +
> +  /* Just need one reg for the access.  */
> +  if (left_bits == 0 && right_bits == 0)
> +    return true;
> +
> +  scalar_int_mode imode;
> +  /* Need to extract bits from the reg for the access.  */
> +  return !acc->write && int_mode_for_mode (expr_mode).exists (&imode);
> +}
> +
> +/* For now, the base (parm/return) is scalarizable only if all
> +   accesses of the BASE are scalarizable.
> +
> +   This function needs to be updated to support more complicated
> +   cases, like:
> +   - Some accesses are scalarizable, but some are not.
> +   - An access writes to a parameter.
> +   - Writing accesses overlap with multiple accesses.   */
> +
> +bool
> +expand_sra::scalarizable_accesses (tree base, rtx regs)
> +{
> +  if (!base_access_vec)
> +    return false;
> +  vec<access_p> *access_vec = base_access_vec->get (base);
> +  if (!access_vec)
> +    return false;
> +  if (access_vec->is_empty ())
> +    return false;
> +
> +  bool is_parm = TREE_CODE (base) == PARM_DECL;
> +  int n = access_vec->length ();
> +  int cur_access_index = 0;
> +  for (; cur_access_index < n; cur_access_index++)
> +    if (!scalarizable_access ((*access_vec)[cur_access_index], regs, is_parm))
> +      break;
> +
> +  /* It is OK if all accesses are scalarizable.  */
> +  if (cur_access_index == n)
> +    return true;
> +
> +  base_access_vec->remove (base);
> +  return false;
> +}
> +
> +static expand_sra *current_sra = NULL;
> +
> +/* Check if PARM (a parameter or return) is scalarizable.
> +
> +   This interface is used in expand_function_start
> +   to check the SRA possibility for parameters.  */
> +
> +bool
> +scalarizable_aggregate (tree parm, rtx regs)
> +{
> +  if (!current_sra)
> +    return false;
> +  return current_sra->scalarizable_accesses (parm, regs);
> +}
> +
> +/* Check for interesting returns, and if they are scalarizable,
> +   set their DECL_RTL to the scalar registers.
> +
> +   This interface is used in expand_function_start
> +   when the outgoing registers are determined for DECL_RESULT.  */
> +
> +void
> +set_scalar_rtx_for_returns ()
> +{
> +  rtx res = DECL_RTL (DECL_RESULT (current_function_decl));
> +  edge_iterator ei;
> +  edge e;
> +  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
> +    if (greturn *r = safe_dyn_cast<greturn *> (*gsi_last_bb (e->src)))
> +      {
> +     tree val = gimple_return_retval (r);
> +     if (val && VAR_P (val) && scalarizable_aggregate (val, res))
> +       SET_DECL_RTL (val, res);
> +      }
> +}
> +
>  /* Return an expression tree corresponding to the RHS of GIMPLE
>     statement STMT.  */
>  
> @@ -3707,7 +4045,7 @@ expand_value_return (rtx val)
>  
>    tree decl = DECL_RESULT (current_function_decl);
>    rtx return_reg = DECL_RTL (decl);
> -  if (return_reg != val)
> +  if (!rtx_equal_p (return_reg, val))
>      {
>        tree funtype = TREE_TYPE (current_function_decl);
>        tree type = TREE_TYPE (decl);
> @@ -4423,6 +4761,12 @@ expand_debug_expr (tree exp)
>    addr_space_t as;
>    scalar_int_mode op0_mode, op1_mode, addr_mode;
>  
> +  /* TODO: Enable debug expansion for expand-sra optimized parms/returns.  */
> +  tree base = get_base_address (exp);
> +  if ((TREE_CODE (base) == PARM_DECL || (VAR_P (base) && DECL_RTL_SET_P (base)))
> +      && GET_CODE (DECL_RTL (base)) == PARALLEL)
> +    return NULL_RTX;
> +
>    switch (TREE_CODE_CLASS (TREE_CODE (exp)))
>      {
>      case tcc_expression:
> @@ -6628,6 +6972,10 @@ pass_expand::execute (function *fun)
>    auto_bitmap forced_stack_vars;
>    discover_nonconstant_array_refs (forced_stack_vars);
>  
> +  /* Enable light-expander-sra.  */
> +  current_sra = new expand_sra;
> +  scan_function (cfun, *current_sra);
> +
>    /* Make sure all values used by the optimization passes have sane
>       defaults.  */
>    reg_renumber = 0;
> @@ -7056,6 +7404,8 @@ pass_expand::execute (function *fun)
>        loop_optimizer_finalize ();
>      }
>  
> +  delete current_sra;
> +  current_sra = NULL;
>    timevar_pop (TV_POST_EXPAND);
>  
>    return 0;
> diff --git a/gcc/cfgexpand.h b/gcc/cfgexpand.h
> index 0e551f6cfd3..3415c217708 100644
> --- a/gcc/cfgexpand.h
> +++ b/gcc/cfgexpand.h
> @@ -24,5 +24,7 @@ extern tree gimple_assign_rhs_to_tree (gimple *);
>  extern HOST_WIDE_INT estimated_stack_frame_size (struct cgraph_node *);
>  extern void set_parm_rtl (tree, rtx);
>  
> +extern bool scalarizable_aggregate (tree, rtx);
> +extern void set_scalar_rtx_for_returns ();
>  
>  #endif /* GCC_CFGEXPAND_H */
> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index 763bd82c59f..5ba26e0ef52 100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -5618,7 +5618,10 @@ expand_assignment (tree to, tree from, bool nontemporal)
>       Assignment of an array element at a constant index, and assignment of
>       an array element in an unaligned packed structure field, has the same
>       problem.  Same for (partially) storing into a non-memory object.  */
> -  if (handled_component_p (to)
> +  if ((handled_component_p (to)
> +       && !(VAR_P (get_base_address (to))
> +         && DECL_RTL_SET_P (get_base_address (to))
> +         && GET_CODE (DECL_RTL (get_base_address (to))) == PARALLEL))
>        || (TREE_CODE (to) == MEM_REF
>         && (REF_REVERSE_STORAGE_ORDER (to)
>             || mem_ref_refers_to_non_mem_p (to)))
> @@ -8909,6 +8912,19 @@ expand_constructor (tree exp, rtx target, enum expand_modifier modifier,
>        && ! mostly_zeros_p (exp))
>      return NULL_RTX;
>  
> +  if (target && GET_CODE (target) == PARALLEL && all_zeros_p (exp))
> +    {
> +      int length = XVECLEN (target, 0);
> +      int start = XEXP (XVECEXP (target, 0, 0), 0) ? 0 : 1;
> +      for (int i = start; i < length; i++)
> +     {
> +       rtx dst = XEXP (XVECEXP (target, 0, i), 0);
> +       rtx zero = CONST0_RTX (GET_MODE (dst));
> +       emit_move_insn (dst, zero);
> +     }
> +      return target;
> +    }
> +
>    /* Handle calls that pass values in multiple non-contiguous
>       locations.  The Irix 6 ABI has examples of this.  */
>    if (target == 0 || ! safe_from_p (target, exp, 1)
> @@ -10621,6 +10637,157 @@ stmt_is_replaceable_p (gimple *stmt)
>    return false;
>  }
>  
> +/* In the parallel rtx register series REGS, compute the position of the
> +   given {BITPOS, BITSIZE}.
> +   START_INDEX, END_INDEX, LEFT_BITS and RIGHT_BITS are the computed outputs.  */
> +
> +void
> +query_position_in_parallel (HOST_WIDE_INT bitpos, HOST_WIDE_INT bitsize,
> +                         rtx regs, int &start_index, int &end_index,
> +                         HOST_WIDE_INT &left_bits, HOST_WIDE_INT &right_bits)
> +{
> +  int cur_index = XEXP (XVECEXP (regs, 0, 0), 0) ? 0 : 1;
> +  for (; cur_index < XVECLEN (regs, 0); cur_index++)
> +    {
> +      rtx slot = XVECEXP (regs, 0, cur_index);
> +      HOST_WIDE_INT off = UINTVAL (XEXP (slot, 1)) * BITS_PER_UNIT;
> +      machine_mode mode = GET_MODE (XEXP (slot, 0));
> +      HOST_WIDE_INT size = GET_MODE_BITSIZE (mode).to_constant ();
> +      if (off <= bitpos && off + size > bitpos)
> +     {
> +       start_index = cur_index;
> +       left_bits = bitpos - off;
> +     }
> +      if (off + size >= bitpos + bitsize)
> +     {
> +       end_index = cur_index;
> +       right_bits = off + size - (bitpos + bitsize);
> +       break;
> +     }
> +    }
> +}
> +
> +/* Extract the scalar rtx for the access {BITPOS, BITSIZE} of EXPR from
> +   the parallel REGS.  */
> +
> +static rtx
> +extract_sub_member (rtx regs, HOST_WIDE_INT bitpos, HOST_WIDE_INT bitsize,
> +                 tree expr)
> +{
> +  int start_index = -1;
> +  int end_index = -1;
> +  HOST_WIDE_INT left_bits = 0;
> +  HOST_WIDE_INT right_bits = 0;
> +  query_position_in_parallel (bitpos, bitsize, regs, start_index, end_index,
> +                           left_bits, right_bits);
> +
> +  machine_mode expr_mode = TYPE_MODE (TREE_TYPE (expr));
> +  if (end_index > start_index || expr_mode == BLKmode)
> +    {
> +      /* TImode spanning multiple registers.  */
> +      if (expr_mode == TImode)
> +     {
> +       rtx res = gen_reg_rtx (expr_mode);
> +       HOST_WIDE_INT start;
> +       start = UINTVAL (XEXP (XVECEXP (regs, 0, start_index), 1));
> +       for (int index = start_index; index <= end_index; index++)
> +         {
> +           rtx reg = XEXP (XVECEXP (regs, 0, index), 0);
> +           machine_mode mode = GET_MODE (reg);
> +           HOST_WIDE_INT off;
> +           off = UINTVAL (XEXP (XVECEXP (regs, 0, index), 1)) - start;
> +           rtx sub = simplify_gen_subreg (mode, res, expr_mode, off);
> +           emit_move_insn (sub, reg);
> +         }
> +       return res;
> +     }
> +
> +      /* A vector mode spanning multiple registers.  */
> +      if (VECTOR_MODE_P (expr_mode))
> +     {
> +       rtvec vector = rtvec_alloc (end_index - start_index + 1);
> +       machine_mode emode;
> +       emode = GET_MODE (XEXP (XVECEXP (regs, 0, start_index), 0));
> +       for (int index = start_index; index <= end_index; index++)
> +         {
> +           rtx reg = XEXP (XVECEXP (regs, 0, index), 0);
> +           gcc_assert (emode == GET_MODE (reg));
> +           RTVEC_ELT (vector, index - start_index) = reg;
> +         }
> +       scalar_int_mode imode;
> +       machine_mode vmode;
> +       int nunits = end_index - start_index + 1;
> +       if (!(int_mode_for_mode (emode).exists (&imode)
> +             && mode_for_vector (imode, nunits).exists (&vmode)))
> +         gcc_unreachable ();
> +
> +       insn_code icode;
> +       icode = convert_optab_handler (vec_init_optab, vmode, imode);
> +       rtx res = gen_reg_rtx (vmode);
> +       emit_insn (GEN_FCN (icode) (res, gen_rtx_PARALLEL (vmode, vector)));
> +       if (expr_mode == vmode)
> +         return res;
> +       return simplify_gen_subreg (expr_mode, res, vmode, 0);
> +     }
> +
> +      /* Need multiple registers in a parallel for the access.  */
> +      int num_words = end_index - start_index + 1;
> +      rtx *tmps = XALLOCAVEC (rtx, num_words);
> +
> +      int pos = 0;
> +      HOST_WIDE_INT start;
> +      start = UINTVAL (XEXP (XVECEXP (regs, 0, start_index), 1));
> +      /* Extract whole registers.  */
> +      for (; pos < num_words; pos++)
> +     {
> +       int index = start_index + pos;
> +       rtx reg = XEXP (XVECEXP (regs, 0, index), 0);
> +       machine_mode mode = GET_MODE (reg);
> +       HOST_WIDE_INT off;
> +       off = UINTVAL (XEXP (XVECEXP (regs, 0, index), 1)) - start;
> +       tmps[pos] = gen_rtx_EXPR_LIST (mode, reg, GEN_INT (off));
> +     }
> +
> +      rtx res = gen_rtx_PARALLEL (expr_mode, gen_rtvec_v (pos, tmps));
> +      return res;
> +    }
> +
> +  gcc_assert (end_index == start_index);
> +
> +  /* Just need one reg for the access.  */
> +  if (left_bits == 0 && right_bits == 0)
> +    {
> +      rtx reg = XEXP (XVECEXP (regs, 0, start_index), 0);
> +      if (GET_MODE (reg) != expr_mode)
> +     reg = gen_lowpart (expr_mode, reg);
> +      return reg;
> +    }
> +
> +  /* Need to extract a bitfield from the reg for the access
> +     (left_bits != 0 or right_bits != 0).  */
> +  rtx reg = XEXP (XVECEXP (regs, 0, start_index), 0);
> +  bool sgn = TYPE_UNSIGNED (TREE_TYPE (expr));
> +  scalar_int_mode imode;
> +  if (!int_mode_for_mode (expr_mode).exists (&imode))
> +    {
> +      gcc_assert (false);
> +      return NULL_RTX;
> +    }
> +
> +  machine_mode mode = GET_MODE (reg);
> +  bool reverse = false;
> +  rtx bfld = extract_bit_field (reg, bitsize, left_bits, sgn, NULL_RTX, mode,
> +                             imode, reverse, NULL);
> +
> +  if (GET_MODE (bfld) != imode)
> +    bfld = gen_lowpart (imode, bfld);
> +
> +  if (expr_mode == imode)
> +    return bfld;
> +
> +  /* expr_mode != imode, e.g. SF != SI.  */
> +  rtx result = gen_reg_rtx (imode);
> +  emit_move_insn (result, bfld);
> +  return gen_lowpart (expr_mode, result);
> +}
> +
>  rtx
>  expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
>                   enum expand_modifier modifier, rtx *alt_rtl,
> @@ -11498,6 +11665,16 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
>         = expand_expr_real (tem, tem_target, VOIDmode, tem_modifier, NULL,
>                             true);
>  
> +     /* It is a scalarizable access to a parameter passed in registers.  */
> +     if (GET_CODE (op0) == PARALLEL
> +         && (TREE_CODE (tem) == PARM_DECL || VAR_P (tem)))
> +       {
> +         HOST_WIDE_INT pos, size;
> +         size = bitsize.to_constant ();
> +         pos = bitpos.to_constant ();
> +         return extract_sub_member (op0, pos, size, exp);
> +       }
> +
>       /* If the field has a mode, we want to access it in the
>          field's mode, not the computed mode.
>          If a MEM has VOIDmode (external with incomplete type),
> diff --git a/gcc/expr.h b/gcc/expr.h
> index 2a172867fdb..8a9332aaad6 100644
> --- a/gcc/expr.h
> +++ b/gcc/expr.h
> @@ -362,5 +362,8 @@ extern rtx expr_size (tree);
>  
>  extern bool mem_ref_refers_to_non_mem_p (tree);
>  extern bool non_mem_decl_p (tree);
> +extern void query_position_in_parallel (HOST_WIDE_INT, HOST_WIDE_INT, rtx,
> +                                     int &, int &, HOST_WIDE_INT &,
> +                                     HOST_WIDE_INT &);
>  
>  #endif /* GCC_EXPR_H */
> diff --git a/gcc/function.cc b/gcc/function.cc
> index afb0b33da9e..518250b2728 100644
> --- a/gcc/function.cc
> +++ b/gcc/function.cc
> @@ -3107,8 +3107,29 @@ assign_parm_setup_block (struct assign_parm_data_all *all,
>         emit_move_insn (mem, entry_parm);
>       }
>        else
> -     move_block_from_reg (REGNO (entry_parm), mem,
> -                          size_stored / UNITS_PER_WORD);
> +     {
> +       int regno = REGNO (entry_parm);
> +       int nregs = size_stored / UNITS_PER_WORD;
> +       rtx *tmps = XALLOCAVEC (rtx, nregs);
> +       machine_mode mode = word_mode;
> +       HOST_WIDE_INT word_size = GET_MODE_SIZE (mode).to_constant ();
> +       for (int i = 0; i < nregs; i++)
> +         {
> +           rtx reg = gen_rtx_REG (mode, regno + i);
> +           rtx off = GEN_INT (word_size * i);
> +           tmps[i] = gen_rtx_EXPR_LIST (VOIDmode, reg, off);
> +         }
> +
> +       rtx regs = gen_rtx_PARALLEL (BLKmode, gen_rtvec_v (nregs, tmps));
> +       if (scalarizable_aggregate (parm, regs))
> +         {
> +           rtx pseudos = gen_group_rtx (regs);
> +           emit_group_move (pseudos, regs);
> +           stack_parm = pseudos;
> +         }
> +       else
> +         move_block_from_reg (regno, mem, nregs);
> +     }
>      }
>    else if (data->stack_parm == 0 && !TYPE_EMPTY_P (data->arg.type))
>      {
> @@ -3710,7 +3731,15 @@ assign_parms (tree fndecl)
>  
>        assign_parm_adjust_stack_rtl (&data);
>  
> -      if (assign_parm_setup_block_p (&data))
> +      rtx incoming = DECL_INCOMING_RTL (parm);
> +      if (GET_CODE (incoming) == PARALLEL
> +       && scalarizable_aggregate (parm, incoming))
> +     {
> +       rtx pseudos = gen_group_rtx (incoming);
> +       emit_group_move (pseudos, incoming);
> +       set_parm_rtl (parm, pseudos);
> +     }
> +      else if (assign_parm_setup_block_p (&data))
>       assign_parm_setup_block (&all, parm, &data);
>        else if (data.arg.pass_by_reference || use_register_for_decl (parm))
>       assign_parm_setup_reg (&all, parm, &data);
> @@ -5128,6 +5157,7 @@ expand_function_start (tree subr)
>           {
>             gcc_assert (GET_CODE (hard_reg) == PARALLEL);
>             set_parm_rtl (res, gen_group_rtx (hard_reg));
> +           set_scalar_rtx_for_returns ();
>           }
>       }
>  
> diff --git a/gcc/tree-sra.h b/gcc/tree-sra.h
> index f20266c4622..df3071ccf6e 100644
> --- a/gcc/tree-sra.h
> +++ b/gcc/tree-sra.h
> @@ -19,6 +19,82 @@ You should have received a copy of the GNU General Public License
>  along with GCC; see the file COPYING3.  If not see
>  <http://www.gnu.org/licenses/>.  */
>  
> +struct sra_base_access
> +{
> +  /* Values returned by get_ref_base_and_extent, indicating the
> +     OFFSET and SIZE of the access.  */
> +  HOST_WIDE_INT offset;
> +  HOST_WIDE_INT size;
> +
> +  /* The context expression of this access.  */
> +  tree expr;
> +
> +  /* Indicates this is a write access.  */
> +  bool write : 1;
> +
> +  /* Indicates if this access is made in reverse storage order.  */
> +  bool reverse : 1;
> +};
> +
> +/* Default analyzer template for scan_function.  */
> +
> +struct sra_default_analyzer
> +{
> +  /* Template analyze functions.  */
> +  void analyze_phi (gphi *){};
> +  void pre_analyze_stmt (gimple *){};
> +  void analyze_return (greturn *){};
> +  void analyze_assign (gassign *){};
> +  void analyze_call (gcall *){};
> +  void analyze_asm (gasm *){};
> +  void analyze_default_stmt (gimple *){};
> +};
> +
> +/* Scan function and look for interesting expressions.  */
> +
> +template <typename analyzer>
> +void
> +scan_function (struct function *fun, analyzer &a)
> +{
> +  basic_block bb;
> +  FOR_EACH_BB_FN (bb, fun)
> +    {
> +      for (gphi_iterator gsi = gsi_start_phis (bb); !gsi_end_p (gsi);
> +        gsi_next (&gsi))
> +     a.analyze_phi (gsi.phi ());
> +
> +      for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
> +        gsi_next (&gsi))
> +     {
> +       gimple *stmt = gsi_stmt (gsi);
> +       a.pre_analyze_stmt (stmt);
> +
> +       switch (gimple_code (stmt))
> +         {
> +         case GIMPLE_RETURN:
> +           a.analyze_return (as_a<greturn *> (stmt));
> +           break;
> +
> +         case GIMPLE_ASSIGN:
> +           a.analyze_assign (as_a<gassign *> (stmt));
> +           break;
> +
> +         case GIMPLE_CALL:
> +           a.analyze_call (as_a<gcall *> (stmt));
> +           break;
> +
> +         case GIMPLE_ASM:
> +           a.analyze_asm (as_a<gasm *> (stmt));
> +           break;
> +
> +         default:
> +           a.analyze_default_stmt (stmt);
> +           break;
> +         }
> +     }
> +    }
> +}
> +
>  bool type_internals_preclude_sra_p (tree type, const char **msg);
>  
>  /* Return true iff TYPE is stdarg va_list type (which early SRA and IPA-SRA
> diff --git a/gcc/var-tracking.cc b/gcc/var-tracking.cc
> index d8dafa5481a..7fc801f2612 100644
> --- a/gcc/var-tracking.cc
> +++ b/gcc/var-tracking.cc
> @@ -5352,7 +5352,8 @@ track_loc_p (rtx loc, tree expr, poly_int64 offset, bool store_reg_p,
>       because the real and imaginary parts are represented as separate
>       pseudo registers, even if the whole complex value fits into one
>       hard register.  */
> -  if ((paradoxical_subreg_p (mode, DECL_MODE (expr))
> +  if (((DECL_MODE (expr) != BLKmode
> +     && paradoxical_subreg_p (mode, DECL_MODE (expr)))
>         || (store_reg_p
>          && !COMPLEX_MODE_P (DECL_MODE (expr))
>          && hard_regno_nregs (REGNO (loc), DECL_MODE (expr)) == 1))
> diff --git a/gcc/testsuite/g++.target/powerpc/pr102024.C b/gcc/testsuite/g++.target/powerpc/pr102024.C
> index 769585052b5..c8995cae707 100644
> --- a/gcc/testsuite/g++.target/powerpc/pr102024.C
> +++ b/gcc/testsuite/g++.target/powerpc/pr102024.C
> @@ -5,7 +5,7 @@
>  // Test that a zero-width bit field in an otherwise homogeneous aggregate
>  // generates a psabi warning and passes arguments in GPRs.
>  
> -// { dg-final { scan-assembler-times {\mstd\M} 4 } }
> +// { dg-final { scan-assembler-times {\mmtvsrd\M} 4 } }
>  
>  struct a_thing
>  {
> diff --git a/gcc/testsuite/gcc.target/i386/pr20020-2.c b/gcc/testsuite/gcc.target/i386/pr20020-2.c
> index fa8cb2528c5..723f1826630 100644
> --- a/gcc/testsuite/gcc.target/i386/pr20020-2.c
> +++ b/gcc/testsuite/gcc.target/i386/pr20020-2.c
> @@ -15,10 +15,15 @@ struct shared_ptr_struct
>  };
>  typedef struct shared_ptr_struct sptr_t;
>  
> +void foo (sptr_t *);
> +
>  void
>  copy_sptr (sptr_t *dest, sptr_t src)
>  {
>    *dest = src;
> +
> +  /* Prevent 'src' from being scalarized into registers.  */
> +  foo (&src);
>  }
>  
>  /* { dg-final { scan-rtl-dump "\\\(set \\\(reg:TI \[0-9\]*" "expand" } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr108073.c b/gcc/testsuite/gcc.target/powerpc/pr108073.c
> new file mode 100644
> index 00000000000..293bf93fb9a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr108073.c
> @@ -0,0 +1,29 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -save-temps" } */
> +
> +typedef struct DF {double a[4]; short s1; short s2; short s3; short s4; } DF;
> +typedef struct SF {float a[4]; int i1; int i2; } SF;
> +
> +/* { dg-final { scan-assembler-times {\mmtvsrd|mtvsrws\M} 3 {target { has_arch_ppc64 && has_arch_pwr8 } } } } */
> +/* { dg-final { scan-assembler-not {\mlwz\M} {target { has_arch_ppc64 && has_arch_pwr8 } } } } */
> +/* { dg-final { scan-assembler-not {\mlhz\M} {target { has_arch_ppc64 && has_arch_pwr8 } } } } */
> +short  __attribute__ ((noipa)) foo_hi (DF a, int flag){if (flag == 2)return a.s2+a.s3;return 0;}
> +int  __attribute__ ((noipa)) foo_si (SF a, int flag){if (flag == 2)return a.i2+a.i1;return 0;}
> +double __attribute__ ((noipa)) foo_df (DF arg, int flag){if (flag == 2)return arg.a[3];else return 0.0;}
> +float  __attribute__ ((noipa)) foo_sf (SF arg, int flag){if (flag == 2)return arg.a[2]; return 0;}
> +float  __attribute__ ((noipa)) foo_sf1 (SF arg, int flag){if (flag == 2)return arg.a[1];return 0;}
> +
> +DF gdf = {{1.0,2.0,3.0,4.0}, 1, 2, 3, 4};
> +SF gsf = {{1.0f,2.0f,3.0f,4.0f}, 1, 2};
> +
> +int main()
> +{
> +  if (!(foo_hi (gdf, 2) == 5 && foo_si (gsf, 2) == 3 && foo_df (gdf, 2) == 4.0
> +     && foo_sf (gsf, 2) == 3.0 && foo_sf1 (gsf, 2) == 2.0))
> +    __builtin_abort ();
> +  if (!(foo_hi (gdf, 1) == 0 && foo_si (gsf, 1) == 0 && foo_df (gdf, 1) == 0
> +     && foo_sf (gsf, 1) == 0 && foo_sf1 (gsf, 1) == 0))
> +    __builtin_abort ();
> +  return 0;
> +}
> +
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421-1.c b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c
> new file mode 100644
> index 00000000000..4e1f87f7939
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c
> @@ -0,0 +1,6 @@
> +/* PR target/65421 */
> +/* { dg-options "-O2" } */
> +
> +typedef struct LARGE {double a[4]; int arr[32];} LARGE;
> +LARGE foo (LARGE a){return a;}
> +/* { dg-final { scan-assembler-times {\mmemcpy\M} 1 } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421-2.c b/gcc/testsuite/gcc.target/powerpc/pr65421-2.c
> new file mode 100644
> index 00000000000..8a8e1a0e996
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr65421-2.c
> @@ -0,0 +1,32 @@
> +/* PR target/65421 */
> +/* { dg-options "-O2" } */
> +/* { dg-require-effective-target powerpc_elfv2 } */
> +/* { dg-require-effective-target has_arch_ppc64 } */
> +
> +typedef struct FLOATS
> +{
> +  double a[3];
> +} FLOATS;
> +
> +/* 3 lfd after returns also optimized */
> +/* FLOATS ret_arg_pt (FLOATS *a){return *a;} */
> +
> +/* 3 stfd */
> +void st_arg (FLOATS a, FLOATS *p) {*p = a;}
> +/* { dg-final { scan-assembler-times {\mstfd\M} 3 } } */
> +
> +/* blr */
> +FLOATS ret_arg (FLOATS a) {return a;}
> +
> +typedef struct MIX
> +{
> +  double a[2];
> +  long l;
> +} MIX;
> +
> +/* std 3 param regs to return slot */
> +MIX ret_arg1 (MIX a) {return a;}
> +/* { dg-final { scan-assembler-times {\mstd\M} 3 } } */
> +
> +/* count insns */
> +/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 9 } } */
