Hi,

Richard Biener <rguent...@suse.de> writes:

> On Thu, 22 Dec 2022, guojiufu wrote:
>
>> Hi,
>> 
>> On 2022-12-21 15:30, Richard Biener wrote:
>> > On Wed, 21 Dec 2022, Jiufu Guo wrote:
>> > 
>> >> Hi,
>> >> 
>> >> This patch is fixing an issue about parameter accessing if the
>> >> parameter is struct type and passed through integer registers, and
>> >> there is floating member is accessed. Like below code:
>> >> 
>> >> typedef struct DF {double a[4]; long l; } DF;
>> >> double foo_df (DF arg){return arg.a[3];}
>> >> 
>> >> On ppc64le, with trunk gcc, "std 6,-24(1) ; lfd 1,-24(1)" is
>> >> generated.  While instruction "mtvsrd 1, 6" would be enough for
>> >> this case.
>> > 
>> > So why do we end up spilling for PPC?
>> 
>> Good question! According to GCC source code (in function.cc/expr.cc),
>> it is common behavior: using "word_mode" to store the parameter to stack,
>> And using the field's mode (e.g. float mode) to load from the stack.
>> But with some tries, I fail to construct cases on many platforms.
>> So, I convert the fix to a target hook and implemented the rs6000 part
>> first.
>> 
>> > 
>> > struct X { int i; float f; };
>> > 
>> > float foo (struct X x)
>> > {
>> >   return x.f;
>> > }
>> > 
>> > does pass the structure in $RDI on x86_64 and we manage (with
>> > optimization, with -O0 we spill) to generate
>> > 
>> >         shrq    $32, %rdi
>> >         movd    %edi, %xmm0
>> > 
>> > and RTL expansion generates
>> > 
>> > (note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>> > (insn 2 4 3 2 (set (reg/v:DI 83 [ x ])
>> >         (reg:DI 5 di [ x ])) "t.c":4:1 -1
>> >      (nil))
>> > (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
>> > (insn 6 3 7 2 (parallel [
>> >             (set (reg:DI 85)
>> >                 (ashiftrt:DI (reg/v:DI 83 [ x ])
>> >                     (const_int 32 [0x20])))
>> >             (clobber (reg:CC 17 flags))
>> >         ]) "t.c":5:11 -1
>> >      (nil))
>> > (insn 7 6 8 2 (set (reg:SI 86)
>> >         (subreg:SI (reg:DI 85) 0)) "t.c":5:11 -1
>> >      (nil))
>> > 
>> > I would imagine that for the ppc case we only see the subreg here
>> > which should be even easier to optimize.
>> > 
>> > So how's this not fixable by providing proper patterns / subreg
>> > capabilities?  Looking a bit at the RTL we have the issue might
>> > be that nothing seems to handle CSE of
>> > 
>> 
>> This case is also related to 'parameter on struct', PR89310 is
>> just for this case. On trunk, it is fixed.
>> One difference: the parameter is in DImode, and passed via an
>> integer register for "{int i; float f;}".
>> But for "{double a[4]; long l;}", the parameter is in BLKmode,
>> and stored to stack during the argument setup.
>
> OK, so this would be another case for "heuristics" to use
> sth different than word_mode for storing, but of course
> the arguments are in integer registers and using different
> modes can for example prohibit store-multiple instruction use.
>
> As I said in the related thread an RTL expansion time "SRA"
> with the incoming argument assignment in mind could make
> more optimal decisions for these kind of special-cases.

Thanks a lot for your comments!

Yeap! Using SRA-like analysis during expansion for parameter
and returns (and may also some field accessing) would be a
generic improvement for this kind of issue (PR101926 collected
a lot of them).
While we may still need some work for various ABIs and different
targets, to analyze where the 'struct field' come from
(int/float/vector/.. registers, or stack) and how the struct
need to be handled (keep in pseudo or store in the stack).
This may indicate a mount of changes for param_setup code.

To reduce risk, I'm just draft straightforward patches for
special cases currently, Like:
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608081.html
and this patch.

>
>> > (note 8 0 5 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>> > (insn 5 8 7 2 (set (mem/c:DI (plus:DI (reg/f:DI 110 sfp)
>> >                 (const_int 56 [0x38])) [2 arg+24 S8 A64])
>> >         (reg:DI 6 6)) "t.c":2:23 679 {*movdi_internal64}
>> >      (expr_list:REG_DEAD (reg:DI 6 6)
>> >         (nil)))
>> > (note 7 5 10 2 NOTE_INSN_FUNCTION_BEG)
>> > (note 10 7 15 2 NOTE_INSN_DELETED)
>> > (insn 15 10 16 2 (set (reg/i:DF 33 1)
>> >         (mem/c:DF (plus:DI (reg/f:DI 110 sfp)
>> >                 (const_int 56 [0x38])) [1 arg.a[3]+0 S8 A64])) "t.c":2:40
>> > 576 {*movdf_hardfloat64}
>> >      (nil))
>> > 
>> > Possibly because the store and load happen in a different mode?  Can
>> > you see why CSE doesn't handle this (producing a subreg)?  On
>> 
>> Yes, exactly! For "{double a[4]; long l;}", because the store and load
>> are using a different mode, and then CSE does not optimize it.  This
>> patch makes the store and load using the same mode (DImode), and then
>> leverage CSE to handle it.
>
> So can we instead fix CSE to consider replacing insn 15 above with
>
>  (insn 15 (set (reg/i:DF 33 1)
>                (subreg:DF (reg/f:DI 6 6)))
>

Thanks for your suggestion! I will check to see if able to draft
a quick fix.

BR,
Jeff (Jiufu)

> ?  One peculiarity is that this introduces a hardreg use in this
> special case.
>
> Richard.
>
>> > the GIMPLE side we'd happily do that (but we don't see the argument
>> > setup).
>> 
>> Thanks for your comments!
>> 
>> 
>> BR,
>> Jeff (Jiufu)
>> 
>> > 
>> > Thanks,
>> > Richard.
>> > 
>> >> This patch updates the behavior when loading floating members of a
>> >> parameter: if that floating member is stored via integer register,
>> >> then loading it as integer mode first, and converting it to floating
>> >> mode.
>> >> 
>> >> I also thought of a method: before storing the register to stack,
>> >> convert it to float mode first. While there are some cases that may
>> >> still prefer to keep an integer register store.
>> >> 
>> >> Bootstrap and regtest passes on ppc64{,le}.
>> >> I would ask for help to review for comments and if this patch is
>> >> acceptable for the trunk.
>> >> 
>> >> 
>> >> BR,
>> >> Jeff (Jiufu)
>> >> 
>> >>  PR target/108073
>> >> 
>> >> gcc/ChangeLog:
>> >> 
>> >>  * config/rs6000/rs6000.cc (TARGET_LOADING_INT_CONVERT_TO_FLOAT): New
>> >>  macro definition.
>> >>  (rs6000_loading_int_convert_to_float): New hook implement.
>> >>  * doc/tm.texi: Regenerated.
>> >>  * doc/tm.texi.in (loading_int_convert_to_float): New hook.
>> >>  * expr.cc (expand_expr_real_1): Updated to use the new hook.
>> >>  * target.def (loading_int_convert_to_float): New hook.
>> >> 
>> >> gcc/testsuite/ChangeLog:
>> >> 
>> >>  * g++.target/powerpc/pr102024.C: Update.
>> >>  * gcc.target/powerpc/pr108073.c: New test.
>> >> 
>> >> ---
>> >> gcc/config/rs6000/rs6000.cc                 | 70 +++++++++++++++++++++
>> >>  gcc/doc/tm.texi                             |  6 ++
>> >>  gcc/doc/tm.texi.in                          |  2 +
>> >>  gcc/expr.cc                                 | 15 +++++
>> >>  gcc/target.def                              | 11 ++++
>> >>  gcc/testsuite/g++.target/powerpc/pr102024.C |  2 +-
>> >>  gcc/testsuite/gcc.target/powerpc/pr108073.c | 24 +++++++
>> >>  7 files changed, 129 insertions(+), 1 deletion(-)
>> >>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c
>> >> 
>> >> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> >> index b3a609f3aa3..af676eea276 100644
>> >> --- a/gcc/config/rs6000/rs6000.cc
>> >> +++ b/gcc/config/rs6000/rs6000.cc
>> >> @@ -1559,6 +1559,9 @@ static const struct attribute_spec 
>> >> rs6000_attribute_table[] =
>> >>  #undef TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN
>> >>  #define TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN 
>> >> invalid_arg_for_unprototyped_fn
>> >> 
>> >> +#undef TARGET_LOADING_INT_CONVERT_TO_FLOAT
>> >> +#define TARGET_LOADING_INT_CONVERT_TO_FLOAT
>> >> rs6000_loading_int_convert_to_float
>> >> +
>> >>  #undef TARGET_MD_ASM_ADJUST
>> >>  #define TARGET_MD_ASM_ADJUST rs6000_md_asm_adjust
>> >> 
>> >> @@ -24018,6 +24021,73 @@ invalid_arg_for_unprototyped_fn (const_tree 
>> >> typelist, const_tree funcdecl, const
>> >>     : NULL;
>> >>  }
>> >> 
>> >> +/* Implement the TARGET_LOADING_INT_CONVERT_TO_FLOAT. */
>> >> +static rtx
>> >> +rs6000_loading_int_convert_to_float (machine_mode mode, rtx source, rtx
>> >> base)
>> >> +{
>> >> +  rtx src_base = XEXP (source, 0);
>> >> +  poly_uint64 offset = MEM_OFFSET (source);
>> >> +
>> >> +  if (GET_CODE (src_base) == PLUS && CONSTANT_P (XEXP (src_base, 1)))
>> >> +    {
>> >> +      offset += INTVAL (XEXP (src_base, 1));
>> >> +      src_base = XEXP (src_base, 0);
>> >> +    }
>> >> +
>> >> +  if (!rtx_equal_p (XEXP (base, 0), src_base))
>> >> +    return NULL_RTX;
>> >> +
>> >> +  rtx temp_reg = gen_reg_rtx (word_mode);
>> >> +  rtx temp_mem = copy_rtx (source);
>> >> +  PUT_MODE (temp_mem, word_mode);
>> >> +
>> >> +  /* DI->DF */
>> >> +  if (word_mode == DImode && mode == DFmode)
>> >> +    {
>> >> +      if (multiple_p (offset, GET_MODE_SIZE (word_mode)))
>> >> + {
>> >> +   emit_move_insn (temp_reg, temp_mem);
>> >> +   rtx float_subreg = simplify_gen_subreg (mode, temp_reg, word_mode,
>> >> 0);
>> >> +   rtx target_reg = gen_reg_rtx (mode);
>> >> +   emit_move_insn (target_reg, float_subreg);
>> >> +   return target_reg;
>> >> + }
>> >> +      return NULL_RTX;
>> >> +    }
>> >> +
>> >> +  /* Sub DI#->SF */
>> >> +  if (word_mode == DImode && mode == SFmode)
>> >> +    {
>> >> +      poly_uint64 byte_off = 0;
>> >> +      if (multiple_p (offset, GET_MODE_SIZE (word_mode)))
>> >> + byte_off = 0;
>> >> +      else if (multiple_p (offset - GET_MODE_SIZE (mode),
>> >> +                    GET_MODE_SIZE (word_mode)))
>> >> + byte_off = GET_MODE_SIZE (mode);
>> >> +      else
>> >> + return NULL_RTX;
>> >> +
>> >> +      temp_mem = adjust_address (temp_mem, word_mode, -byte_off);
>> >> +      emit_move_insn (temp_reg, temp_mem);
>> >> +
>> >> +      /* little endia only? */
>> >> +      poly_uint64 high_off = subreg_highpart_offset (SImode, word_mode);
>> >> +      if (known_eq (byte_off, high_off))
>> >> + {
>> >> +   temp_reg = expand_shift (RSHIFT_EXPR, word_mode, temp_reg,
>> >> +                            GET_MODE_PRECISION (SImode), temp_reg, 0);
>> >> + }
>> >> +      rtx subreg_si = gen_reg_rtx (SImode);
>> >> +      emit_move_insn (subreg_si,  gen_lowpart (SImode, temp_reg));
>> >> +      rtx float_subreg = simplify_gen_subreg (mode, subreg_si, SImode, 
>> >> 0);
>> >> +      rtx target_reg = gen_reg_rtx (mode);
>> >> +      emit_move_insn (target_reg, float_subreg);
>> >> +      return target_reg;
>> >> +    }
>> >> +
>> >> +  return NULL_RTX;
>> >> +}
>> >> +
>> >>  /* For TARGET_SECURE_PLT 32-bit PIC code we can save PIC register
>> >>     setup by using __stack_chk_fail_local hidden function instead of
>> >>     calling __stack_chk_fail directly.  Otherwise it is better to call
>> >> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
>> >> index 8fe49c2ba3d..10f94af553d 100644
>> >> --- a/gcc/doc/tm.texi
>> >> +++ b/gcc/doc/tm.texi
>> >> @@ -11933,6 +11933,12 @@ or when the back end is in a 
>> >> partially-initialized state.
>> >>  outside of any function scope.
>> >>  @end deftypefn
>> >> 
>> >> +@deftypefn {Target Hook} rtx TARGET_LOADING_INT_CONVERT_TO_FLOAT
>> >> (machine_mode @var{mode}, rtx @var{source}, rtx @var{base})
>> >> +If the target is protifiable to load an integer in word_mode from
>> >> +@var{source} which is based on @var{base}, then convert to a floating
>> >> +point value in @var{mode}.
>> >> +@end deftypefn
>> >> +
>> >>  @defmac TARGET_OBJECT_SUFFIX
>> >>  Define this macro to be a C string representing the suffix for object
>> >>  files on your target machine.  If you do not define this macro, GCC 
>> >> will
>> >> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
>> >> index 62c49ac46de..1ca6a671d86 100644
>> >> --- a/gcc/doc/tm.texi.in
>> >> +++ b/gcc/doc/tm.texi.in
>> >> @@ -7756,6 +7756,8 @@ to by @var{ce_info}.
>> >> 
>> >>  @hook TARGET_SET_CURRENT_FUNCTION
>> >> 
>> >> +@hook TARGET_LOADING_INT_CONVERT_TO_FLOAT
>> >> +
>> >>  @defmac TARGET_OBJECT_SUFFIX
>> >>  Define this macro to be a C string representing the suffix for object
>> >>  files on your target machine.  If you do not define this macro, GCC 
>> >> will
>> >> diff --git a/gcc/expr.cc b/gcc/expr.cc
>> >> index d9407432ea5..466079220e7 100644
>> >> --- a/gcc/expr.cc
>> >> +++ b/gcc/expr.cc
>> >> @@ -11812,6 +11812,21 @@ expand_expr_real_1 (tree exp, rtx target, 
>> >> machine_mode tmode,
>> >>       && modifier != EXPAND_WRITE)
>> >>     op0 = flip_storage_order (mode1, op0);
>> >> 
>> >> + /* Accessing float field of struct parameter which passed via integer
>> >> +    registers.  */
>> >> + if (targetm.loading_int_convert_to_float && mode == mode1
>> >> +     && GET_MODE_CLASS (mode) == MODE_FLOAT
>> >> +     && TREE_CODE (tem) == PARM_DECL && DECL_INCOMING_RTL (tem)
>> >> +     && REG_P (DECL_INCOMING_RTL (tem))
>> >> +     && GET_MODE (DECL_INCOMING_RTL (tem)) == BLKmode && MEM_P (op0)
>> >> +     && MEM_OFFSET_KNOWN_P (op0))
>> >> +   {
>> >> +     rtx res = targetm.loading_int_convert_to_float (mode, op0,
>> >> +                                                     DECL_RTL (tem));
>> >> +     if (res)
>> >> +       op0 = res;
>> >> +   }
>> >> +
>> >>   if (mode == mode1 || mode1 == BLKmode || mode1 == tmode
>> >>       || modifier == EXPAND_CONST_ADDRESS
>> >>       || modifier == EXPAND_INITIALIZER)
>> >> diff --git a/gcc/target.def b/gcc/target.def
>> >> index 082a7c62f34..837ce902489 100644
>> >> --- a/gcc/target.def
>> >> +++ b/gcc/target.def
>> >> @@ -4491,6 +4491,17 @@ original and the returned modes should be 
>> >> @code{MODE_INT}.",
>> >>   (machine_mode mode),
>> >>   default_preferred_doloop_mode)
>> >> 
>> >> +
>> >> +/* Loading an integer value from memory first, then convert (bitcast)
>> >> to\n\n
>> >> +   floating point value, if target is able to support such behavior.  */
>> >> +DEFHOOK
>> >> +(loading_int_convert_to_float,
>> >> +"If the target is protifiable to load an integer in word_mode from\n\
>> >> +@var{source} which is based on @var{base}, then convert to a floating\n\
>> >> +point value in @var{mode}.",
>> >> + rtx, (machine_mode mode, rtx source, rtx base),
>> >> + NULL)
>> >> +
>> >>  /* Returns true for a legitimate combined insn.  */
>> >>  DEFHOOK
>> >>  (legitimate_combined_insn,
>> >> diff --git a/gcc/testsuite/g++.target/powerpc/pr102024.C
>> >> b/gcc/testsuite/g++.target/powerpc/pr102024.C
>> >> index 769585052b5..c8995cae707 100644
>> >> --- a/gcc/testsuite/g++.target/powerpc/pr102024.C
>> >> +++ b/gcc/testsuite/g++.target/powerpc/pr102024.C
>> >> @@ -5,7 +5,7 @@
>> >> // Test that a zero-width bit field in an otherwise homogeneous aggregate
>> >>  // generates a psabi warning and passes arguments in GPRs.
>> >> 
>> >> -// { dg-final { scan-assembler-times {\mstd\M} 4 } }
>> >> +// { dg-final { scan-assembler-times {\mmtvsrd\M} 4 } }
>> >> 
>> >>  struct a_thing
>> >>  {
>> >> diff --git a/gcc/testsuite/gcc.target/powerpc/pr108073.c
>> >> b/gcc/testsuite/gcc.target/powerpc/pr108073.c
>> >> new file mode 100644
>> >> index 00000000000..aa02de56405
>> >> --- /dev/null
>> >> +++ b/gcc/testsuite/gcc.target/powerpc/pr108073.c
>> >> @@ -0,0 +1,24 @@
>> >> +/* { dg-do run } */
>> >> +/* { dg-options "-O2 -save-temps" } */
>> >> +
>> >> +typedef struct DF {double a[4]; long l; } DF;
>> >> +typedef struct SF {float a[4];short l; } SF;
>> >> +
>> >> +/* Each of below function contains one mtvsrd.  */
>> >> +/* { dg-final { scan-assembler-times {\mmtvsrd\M} 3 {target {
>> >> has_arch_ppc64 && has_arch_pwr8 } } } } */
>> >> +double __attribute__ ((noipa)) foo_df (DF arg){return arg.a[3];}
>> >> +float  __attribute__ ((noipa)) foo_sf (SF arg){return arg.a[2];}
>> >> +float  __attribute__ ((noipa)) foo_sf1 (SF arg){return arg.a[1];}
>> >> +
>> >> +double gd = 4.0;
>> >> +float gf1 = 3.0f, gf2 = 2.0f;
>> >> +DF gdf = {{1.0,2.0,3.0,4.0}, 1L};
>> >> +SF gsf = {{1.0f,2.0f,3.0f,4.0f}, 1L};
>> >> +
>> >> +int main()
>> >> +{
>> >> +  if (!(foo_df (gdf) == gd && foo_sf (gsf) == gf1 && foo_sf1 (gsf) ==
>> >> gf2))
>> >> +    __builtin_abort ();
>> >> +  return 0;
>> >> +}
>> >> +
>> >> 
>> 
>> 

Reply via email to