"Kewen.Lin" <li...@linux.ibm.com> writes:

> Hi Jeff,
>
> Thanks for the patch, some comments on nits are inline.
>
> on 2022/9/1 11:24, Jiufu Guo wrote:
>> Hi,
>> 
>> As mentioned in PR106550, since pli could support 34bits immediate, we could
>> use less instructions(3insn would be ok) to build 64bits constant with pli.
>> 
>> For example, for constant 0x020805006106003, we could generate it with:
>> asm code1:
>> pli 9,101736451 (0x6106003)
>> sldi 9,9,32
>> paddi 9,9, 2130000 (0x0208050)
>> 
>> or asm code2:
>> pli 10, 2130000
>> pli 9, 101736451
>> rldimi 9, 10, 32, 0
>> 
>> Testing with simple cases as below, run them a lot of times:
>> f1.c
>> long __attribute__ ((noinline)) foo (long *arg,long *,long*)
>> {
>>   *arg = 0x2351847027482577;
>> }
>> 5insns: base
>> pli+sldi+paddi: similar -0.08%
>> pli+pli+rldimi: faster +0.66%
>> 
>> f2.c
>> long __attribute__ ((noinline)) foo (long *arg, long *arg2, long *arg3)
>> {
>>   *arg = 0x2351847027482577;
>>   *arg2 = 0x3257845024384680;
>>   *arg3 = 0x1245abcef9240dec;
>> }
>> 5nisns: base
>> pli+sldi+paddi: faster +1.35%
>> pli+pli+rldimi: faster +5.49%
>> 
>> f2.c would be more meaningful.  Because 'sched passes' are effective for
>> f2.c, but 'scheds' do less thing for f1.c.
>> 
>> Compare with previous patch:
>> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599525.html
>> This one updates code slightly and extracts changes on rs6000.md to a
>> seperate patch.
>> 
>> This patch pass boostrap and regtest on ppc64le(includes p10).
>> Is it ok for trunk?
>> 
>> BR,
>> Jeff(Jiufu)
>> 
>> 
>>      PR target/106550
>> 
>> gcc/ChangeLog:
>> 
>>      * config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Add 'pli' for
>>      constant building.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>      * gcc.target/powerpc/pr106550.c: New test.
>> 
>> ---
>>  gcc/config/rs6000/rs6000.cc                 | 39 +++++++++++++++++++++
>>  gcc/testsuite/gcc.target/powerpc/pr106550.c | 14 ++++++++
>>  2 files changed, 53 insertions(+)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106550.c
>> 
>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> index df491bee2ea..1ccb2ff30a1 100644
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -10181,6 +10181,45 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT 
>> c)
>>                      gen_rtx_IOR (DImode, copy_rtx (temp),
>>                                   GEN_INT (ud1)));
>>      }
>> +  else if (TARGET_PREFIXED)
>> +    {
>> +      /* pli 9,high32 + pli 10,low32 + rldimi 9,10,32,0.  */
>> +      if (can_create_pseudo_p ())
>> +    {
>> +      temp = gen_reg_rtx (DImode);
>> +      rtx temp1 = gen_reg_rtx (DImode);
>> +      emit_move_insn (copy_rtx (temp), GEN_INT ((ud4 << 16) | ud3));
>> +      emit_move_insn (copy_rtx (temp1), GEN_INT ((ud2 << 16) | ud1));
>> +
>
> Nit: copy_rtx here seems not necessary, as both temp and temp1 are with CODE 
> REG.
> The function copy_rtx returns the given rtx for code REG.
>
>> +      emit_insn (gen_rotldi3_insert_3 (dest, temp, GEN_INT (32), temp1,
>> +                                       GEN_INT (0xffffffff)));
>> +    }
>> +
>> +      /* pli 9,high32 + sldi 9,32 + paddi 9,9,low32.  */
>> +      else
>> +    {
>> +      emit_move_insn (copy_rtx (dest), GEN_INT ((ud4 << 16) | ud3));
>> +
>> +      emit_move_insn (copy_rtx (dest),
>> +                      gen_rtx_ASHIFT (DImode, copy_rtx (dest),
>> +                                      GEN_INT (32)));
>> +
>> +      bool can_use_paddi = REGNO (dest) != FIRST_GPR_REGNO;
>> +
>
> The REGNO usage has asserted dest is with CODE REG, if it's always true
> I don't see why we need copy_rtx around.  Or do I miss something?
Thanks a lot for point this out!
Yes, now rs6000_emit_set_long_const is called on DImode for constant
splitter; and it should be always with CODE REG, and then copy_rtx would
not be needed here! 
I would update the patch accordingly.

>
>> +      /* Use paddi for the low32 bits.  */
>> +      if (ud2 != 0 && ud1 != 0 && can_use_paddi)
>> +        emit_move_insn (dest, gen_rtx_PLUS (DImode, copy_rtx (dest),
>> +                                            GEN_INT ((ud2 << 16) | ud1)));
>> +      /* Use oris, ori for low32 bits.  */
>> +      if (ud2 != 0 && (ud1 == 0 || !can_use_paddi))
>> +        emit_move_insn (ud1 != 0 ? copy_rtx (dest) : dest,
>> +                        gen_rtx_IOR (DImode, copy_rtx (dest),
>> +                                     GEN_INT (ud2 << 16)));
>> +      if (ud1 != 0 && (ud2 == 0 || !can_use_paddi))
>> +        emit_move_insn (dest, gen_rtx_IOR (DImode, copy_rtx (dest),
>> +                                           GEN_INT (ud1)));
>> +    }
>> +    }
>>    else
>>      {
>>        temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106550.c 
>> b/gcc/testsuite/gcc.target/powerpc/pr106550.c
>> new file mode 100644
>> index 00000000000..c6f4116bb9a
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr106550.c
>> @@ -0,0 +1,14 @@
>> +/* PR target/106550 */
>> +/* { dg-options "-O2 -std=c99 -mdejagnu-cpu=power10" } */
>> +
>
> Need to check power10_ok, like:
> /* { dg-require-effective-target power10_ok } */
>
> Nit: -std=c99 is not needed?
Thanks for catching this!


BR,
Jeff(Jiufu)
>
> BR,
> Kewen

Reply via email to