Hi,

On 2020/6/3 04:32, Segher Boessenkool wrote:
> Hi Xiong Hu,
> 
> On Tue, Jun 02, 2020 at 04:41:50AM -0500, Xionghu Luo wrote:
>> Double array in structure as function arguments or return value is accessed
>> by BLKmode, they are stored to stack and load from stack with redundant
>> conversion from DF->DI->DF.  This patch checks the homogeneous type and
>> use the actual element type to do block move to by pass the conversions.
> 
>> @@ -2733,6 +2734,7 @@ expand_block_move (rtx operands[], bool might_overlap)
>>     rtx loads[MAX_MOVE_REG];
>>     rtx stores[MAX_MOVE_REG];
>>     int num_reg = 0;
>> +  machine_mode elt_mode = DImode;
>>   
>>     /* If this is not a fixed size move, just call memcpy */
>>     if (! constp)
>> @@ -2750,6 +2752,17 @@ expand_block_move (rtx operands[], bool might_overlap)
>>     if (bytes > rs6000_block_move_inline_limit)
>>       return 0;
>>   
>> +  tree type = TREE_TYPE (MEM_EXPR (orig_dest));
> 
> Declare elt_mode here as well?
> 
>> +  if (TREE_CODE (type) == RECORD_TYPE
>> +      && rs6000_discover_homogeneous_aggregate (TYPE_MODE (type), type, 
>> NULL,
>> +                                            NULL))
>> +    {
>> +      tree field_type = TREE_TYPE (first_field (type));
>> +      if (field_type && TREE_CODE (field_type) == ARRAY_TYPE
>> +      && TREE_CODE (TREE_TYPE (field_type)) == REAL_TYPE)
>> +    elt_mode = TYPE_MODE (TREE_TYPE (field_type));
>> +    }
> 
> Homogeneous aggregates only exist in the ELFv2 ABI, while the problem
> here is the SP float things.  You also noticed (elsewhere) that if the
> struct contains (say) SI, SF, SI, SF, then this does not help.
> 
> Is there some better condition this could use, and maybe an expansion
> that works in more cases as well?
> 
> And, it would be lovely if generic code could expand to something better
> already (not expand to a block move at all, certainly not for something
> as tiny as this).

This pr65421 is array in structure, the assignment is just struct to struct
and won't be split by SROA to element assignment like struct contains <SF, SF>
or <SI, SF>.

pr65421.c.236t.optimized:
foo (const struct A a)
{
  struct A D.2909;

  <bb 2> [local count: 1073741824]:
  D.2909 = a;      // struct to struct.
  return D.2909;
}

pr69143.c.234t.optimized:
blah1 (struct foo1 a)
{
  struct foo1 D.2909;
  float _1;
  float _2;

  <bb 2> [local count: 1073741824]:
  _1 = a.y;
  _2 = a.x;
  D.2909.x = _1;    // element to element.
  D.2909.y = _2;    // element to element.
  return D.2909;
}

So the expander will choose difference path to expand them...

For pr65421, the arguments and return value are accessed by BLKmode after 
gimplify,
since there is no IPA pass, it is never changed from pass gimple to expand.

In expander, the type conversion only happens on expand_assignment of "D.2909 = 
a;"
(arguments assigned to local variable, stack to stack, generated by 
expand_block_move,
insn #13~#20 as followed), the expand_function_start(insn #2~#9) load each 
element type
to be DF already, DF to DI conversion in insn #13~#20 cause the later RTL 
passes fail to
do forward propagation in 246r.fwprop1.  So my patch tries to use the actual 
type for
array in structure here.  If rs6000_discover_homogeneous_aggregate is not 
allowed to be
used here, how about expose and call rs6000_aggregate_candidate directly?  Not 
clear why
"Homogeneous aggregates only exist in the ELFv2 ABI" since double array in 
structure is a
common usage?

pr65421.c.238r.expand:
    1: NOTE_INSN_DELETED
   11: NOTE_INSN_BASIC_BLOCK 2
    2: r121:DF=%1:DF
    3: r122:DF=%2:DF
    4: r123:DF=%3:DF
    5: r124:DF=%4:DF
    6: [r112:DI+0x20]=r121:DF
    7: [r112:DI+0x28]=r122:DF
    8: [r112:DI+0x30]=r123:DF
    9: [r112:DI+0x38]=r124:DF
   10: NOTE_INSN_FUNCTION_BEG
   13: r125:DI=[r112:DI+0x20]
   15: r126:DI=[r112:DI+0x28]
   17: r127:DI=[r112:DI+0x30]
   19: r128:DI=[r112:DI+0x38]
   14: [r112:DI]=r125:DI
   16: [r112:DI+0x8]=r126:DI
   18: [r112:DI+0x10]=r127:DI
   20: [r112:DI+0x18]=r128:DI
   21: r129:DF=[r112:DI]
   22: r130:DF=[r112:DI+0x8]
   23: r131:DF=[r112:DI+0x10]
   24: r132:DF=[r112:DI+0x18]
   25: r117:DF=r129:DF
   26: r118:DF=r130:DF
   27: r119:DF=r131:DF
   28: r120:DF=r132:DF
   32: %1:DF=r117:DF
   33: %2:DF=r118:DF
   34: %3:DF=r119:DF
   35: %4:DF=r120:DF
   36: use %1:DF
   37: use %2:DF
   38: use %3:DF
   39: use %4:DF

To bypass block move requires very generic code change, the BLK mode is 
determined very early
in gimple, remove BLKmode seems huge project in stor-layout.c\function.c\expr.c 
etc. and not sure
other targets like it, the ARM64 use OImode register to avoid BLKmode/stack 
operations, while
X86 expand to two TImode register assignment and pointer result return.

Or do you mean some workaround that don't call emit_block_move to fall in 
expand_block_move in
rs6000-string.c when expand_assignment of "D.2909 = a;" below?
rtx
store_expr (tree exp, rtx target, int call_param_p,
            bool nontemporal, bool reverse)
{
...
      else if (GET_CODE (temp) == PARALLEL)
        emit_group_store (target, temp, TREE_TYPE (exp),
                          int_size_in_bytes (TREE_TYPE (exp)));
      else if (GET_MODE (temp) == BLKmode)
        emit_block_move (target, temp, expr_size (exp),
                         (call_param_p
                          ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL))
...
}


Thanks,
Xionghu

> 
> 
> Segher
> 

Reply via email to