Hi, On 2020/6/3 04:32, Segher Boessenkool wrote: > Hi Xiong Hu, > > On Tue, Jun 02, 2020 at 04:41:50AM -0500, Xionghu Luo wrote: >> Double array in structure as function arguments or return value is accessed >> by BLKmode, they are stored to stack and load from stack with redundant >> conversion from DF->DI->DF. This patch checks the homogeneous type and >> use the actual element type to do block move to by pass the conversions. > >> @@ -2733,6 +2734,7 @@ expand_block_move (rtx operands[], bool might_overlap) >> rtx loads[MAX_MOVE_REG]; >> rtx stores[MAX_MOVE_REG]; >> int num_reg = 0; >> + machine_mode elt_mode = DImode; >> >> /* If this is not a fixed size move, just call memcpy */ >> if (! constp) >> @@ -2750,6 +2752,17 @@ expand_block_move (rtx operands[], bool might_overlap) >> if (bytes > rs6000_block_move_inline_limit) >> return 0; >> >> + tree type = TREE_TYPE (MEM_EXPR (orig_dest)); > > Declare elt_mode here as well? > >> + if (TREE_CODE (type) == RECORD_TYPE >> + && rs6000_discover_homogeneous_aggregate (TYPE_MODE (type), type, >> NULL, >> + NULL)) >> + { >> + tree field_type = TREE_TYPE (first_field (type)); >> + if (field_type && TREE_CODE (field_type) == ARRAY_TYPE >> + && TREE_CODE (TREE_TYPE (field_type)) == REAL_TYPE) >> + elt_mode = TYPE_MODE (TREE_TYPE (field_type)); >> + } > > Homogeneous aggregates only exist in the ELFv2 ABI, while the problem > here is the SP float things. You also noticed (elsewhere) that if the > struct contains (say) SI, SF, SI, SF, then this does not help. > > Is there some better condition this could use, and maybe an expansion > that works in more cases as well? > > And, it would be lovely if generic code could expand to something better > already (not expand to a block move at all, certainly not for something > as tiny as this).
This pr65421 is array in structure, the assignment is just struct to struct and won't be split by SROA to element assignment like struct contains <SF, SF> or <SI, SF>. pr65421.c.236t.optimized: foo (const struct A a) { struct A D.2909; <bb 2> [local count: 1073741824]: D.2909 = a; // struct to struct. return D.2909; } pr69143.c.234t.optimized: blah1 (struct foo1 a) { struct foo1 D.2909; float _1; float _2; <bb 2> [local count: 1073741824]: _1 = a.y; _2 = a.x; D.2909.x = _1; // element to element. D.2909.y = _2; // element to element. return D.2909; } So the expander will choose difference path to expand them... For pr65421, the arguments and return value are accessed by BLKmode after gimplify, since there is no IPA pass, it is never changed from pass gimple to expand. In expander, the type conversion only happens on expand_assignment of "D.2909 = a;" (arguments assigned to local variable, stack to stack, generated by expand_block_move, insn #13~#20 as followed), the expand_function_start(insn #2~#9) load each element type to be DF already, DF to DI conversion in insn #13~#20 cause the later RTL passes fail to do forward propagation in 246r.fwprop1. So my patch tries to use the actual type for array in structure here. If rs6000_discover_homogeneous_aggregate is not allowed to be used here, how about expose and call rs6000_aggregate_candidate directly? Not clear why "Homogeneous aggregates only exist in the ELFv2 ABI" since double array in structure is a common usage? pr65421.c.238r.expand: 1: NOTE_INSN_DELETED 11: NOTE_INSN_BASIC_BLOCK 2 2: r121:DF=%1:DF 3: r122:DF=%2:DF 4: r123:DF=%3:DF 5: r124:DF=%4:DF 6: [r112:DI+0x20]=r121:DF 7: [r112:DI+0x28]=r122:DF 8: [r112:DI+0x30]=r123:DF 9: [r112:DI+0x38]=r124:DF 10: NOTE_INSN_FUNCTION_BEG 13: r125:DI=[r112:DI+0x20] 15: r126:DI=[r112:DI+0x28] 17: r127:DI=[r112:DI+0x30] 19: r128:DI=[r112:DI+0x38] 14: [r112:DI]=r125:DI 16: [r112:DI+0x8]=r126:DI 18: [r112:DI+0x10]=r127:DI 20: [r112:DI+0x18]=r128:DI 21: r129:DF=[r112:DI] 22: r130:DF=[r112:DI+0x8] 23: r131:DF=[r112:DI+0x10] 24: r132:DF=[r112:DI+0x18] 25: r117:DF=r129:DF 26: r118:DF=r130:DF 27: r119:DF=r131:DF 28: r120:DF=r132:DF 32: %1:DF=r117:DF 33: %2:DF=r118:DF 34: %3:DF=r119:DF 35: %4:DF=r120:DF 36: use %1:DF 37: use %2:DF 38: use %3:DF 39: use %4:DF To bypass block move requires very generic code change, the BLK mode is determined very early in gimple, remove BLKmode seems huge project in stor-layout.c\function.c\expr.c etc. and not sure other targets like it, the ARM64 use OImode register to avoid BLKmode/stack operations, while X86 expand to two TImode register assignment and pointer result return. Or do you mean some workaround that don't call emit_block_move to fall in expand_block_move in rs6000-string.c when expand_assignment of "D.2909 = a;" below? rtx store_expr (tree exp, rtx target, int call_param_p, bool nontemporal, bool reverse) { ... else if (GET_CODE (temp) == PARALLEL) emit_group_store (target, temp, TREE_TYPE (exp), int_size_in_bytes (TREE_TYPE (exp))); else if (GET_MODE (temp) == BLKmode) emit_block_move (target, temp, expr_size (exp), (call_param_p ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL)) ... } Thanks, Xionghu > > > Segher >