Hi,

As mentioned in the previous version of this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604646.html
suboptimal code is generated for "assigning from a parameter" and for
"assigning to the return value".

This patch improves assignments from parameters, as in the cases below:

////case1.c
typedef struct SA {double a[3]; long l; } A;
A ret_arg (A a) {return a;}
void st_arg (A a, A *p) {*p = a;}
////case2.c
typedef struct SA {double a[3];} A;
A ret_arg (A a) {return a;}
void st_arg (A a, A *p) {*p = a;}

With this patch, bootstrap and regtest pass on ppc64{,le} and x86_64.

* Besides asking for a review of this patch, I would also like comments
on improving "assigning to the return value".

On some targets (ppc64), for the case below:
////case3.c
typedef struct SA {double a[3]; long l; } A;
A ret_arg_pt (A *a) {return *a;}

the optimized GIMPLE code looks like:
  <retval> = *a_2(D);
  return <retval>;
Here, <retval> (i.e. the RESULT_DECL) is a MEM, and "aggregate_value_p"
returns true for <retval>.

* For the case below, however, the generated code is still suboptimal:
////case4.c
typedef struct SA {double a[3];} A;
A ret_arg_pt (A *a) {return *a;}

The optimized GIMPLE code looks like:
  D.3951 = *a_2(D);
  return D.3951;
The "return"/"assign" statements use D.3951 (a VAR_DECL) instead of
<retval> (the RESULT_DECL). The mode of both D.3951 and <retval> is BLK.
The RTL of D.3951 is a MEM, while the RTL of <retval> is a PARALLEL, and
for a PARALLEL, aggregate_value_p returns false.

In function expand_assignment, there is this code:
  if (TREE_CODE (to) == RESULT_DECL
      && (REG_P (to_rtx) || GET_CODE (to_rtx) == PARALLEL))
It can handle "<retval>", but it cannot handle "D.3951".

One way I am considering to handle this issue is to rewrite the GIMPLE
sequence as:
  <retval> = *a_2(D);
  return <retval>;
Alternatively, we could collect the VARs that are used by return
statements and, for assignments to those VARs, use a scalar sub-mode for
the block move.

Thanks for any comments and suggestions!
BR,
Jeff (Jiufu)

---
 gcc/expr.cc | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/gcc/expr.cc b/gcc/expr.cc
index d9407432ea5..420f9cf3662 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -6045,6 +6045,46 @@ expand_assignment (tree to, tree from, bool nontemporal)
       return;
     }
 
+  if (TREE_CODE (from) == PARM_DECL && DECL_INCOMING_RTL (from)
+      && TYPE_MODE (TREE_TYPE (from)) == BLKmode
+      && (GET_CODE (DECL_INCOMING_RTL (from)) == PARALLEL
+          || REG_P (DECL_INCOMING_RTL (from))))
+    {
+      rtx parm = DECL_INCOMING_RTL (from);
+
+      push_temp_slots ();
+      machine_mode mode;
+      mode = GET_CODE (parm) == PARALLEL
+               ? GET_MODE (XEXP (XVECEXP (parm, 0, 0), 0))
+               : word_mode;
+      int mode_size = GET_MODE_SIZE (mode).to_constant ();
+      int size = INTVAL (expr_size (from));
+
+      /* Whether the parameter can be moved in sub-mode pieces depends on
+         its size and position.  Use a heuristic limit here.  */
+      int hurstc_num = 8;
+      if (size < mode_size || (size % mode_size) != 0
+          || size > (mode_size * hurstc_num))
+        result = store_expr (from, to_rtx, 0, nontemporal, false);
+      else
+        {
+          rtx from_rtx
+            = expand_expr (from, NULL_RTX, GET_MODE (to_rtx), EXPAND_NORMAL);
+          for (int i = 0; i < size / mode_size; i++)
+            {
+              rtx temp = gen_reg_rtx (mode);
+              rtx src = adjust_address (from_rtx, mode, mode_size * i);
+              rtx dest = adjust_address (to_rtx, mode, mode_size * i);
+              emit_move_insn (temp, src);
+              emit_move_insn (dest, temp);
+            }
+          result = to_rtx;
+        }
+      preserve_temp_slots (result);
+      pop_temp_slots ();
+      return;
+    }
+
   /* Compute FROM and store the value in the rtx we got.  */
 
   push_temp_slots ();
-- 
2.17.1
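As a rough, source-level model of what the new expand_assignment path emits (this is not GCC code, and every name below is hypothetical), the sketch copies an aggregate through 8-byte scalar temporaries when the patch's size heuristic allows it, and otherwise falls back to a plain block move. It assumes a fixed 8-byte chunk, whereas the patch takes the mode of the first PARALLEL element or uses word_mode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical constant mirroring hurstc_num in the patch.  */
#define CHUNK_LIMIT 8

/* Hypothetical helper mirroring the patch's check: split only when the
   aggregate is at least one chunk, a whole number of chunks, and no more
   than LIMIT chunks.  */
static int
should_split_block_move (size_t size, size_t chunk_size, size_t limit)
{
  return size >= chunk_size
         && size % chunk_size == 0
         && size <= chunk_size * limit;
}

/* Copy SIZE bytes one 8-byte chunk at a time through a scalar temporary,
   the C analogue of gen_reg_rtx plus two emit_move_insn calls.  */
static void
copy_in_chunks (void *dest, const void *src, size_t size)
{
  const size_t chunk = sizeof (uint64_t);
  assert (size % chunk == 0);  /* Caller must pass a whole number of chunks.  */
  for (size_t i = 0; i < size / chunk; i++)
    {
      uint64_t temp;
      memcpy (&temp, (const char *) src + i * chunk, chunk);
      memcpy ((char *) dest + i * chunk, &temp, chunk);
    }
}

typedef struct SA { double a[3]; long l; } A;

/* Hypothetical model of st_arg from case1.c.  */
static void
st_arg_model (A a, A *p)
{
  if (should_split_block_move (sizeof (A), sizeof (uint64_t), CHUNK_LIMIT))
    copy_in_chunks (p, &a, sizeof (A));
  else
    memcpy (p, &a, sizeof (A));  /* Fallback: plain block move.  */
}
```

With a 64-bit long, sizeof (A) is 32 bytes, so st_arg_model takes the four-chunk path; an aggregate larger than 64 bytes, or one whose size is not a multiple of 8, falls back to the block move, much as the patch falls back to store_expr.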