Hi,

"Kewen.Lin" <li...@linux.ibm.com> writes:

> on 2023/10/25 16:14, Jiufu Guo wrote:
>> 
>> Hi,
>> 
>> "Kewen.Lin" <li...@linux.ibm.com> writes:
>> 
>>> Hi,
>>>
>>> on 2023/10/25 10:00, Jiufu Guo wrote:
>>>> Hi,
>>>>
>>>> Sometimes, a complicated constant is built via 3(or more)
>>>> instructions to build. Generally speaking, it would not be
>>>> as faster as loading it from the constant pool (as a few
>>>> discussions in PR63281).
>>>
>>> I may miss some previous discussions, but I'm curious why we
>>> chose ">=3" here, as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63281#c9
>>> which indicates that more than 3 (>3) should be considered
>>> with this change.
>> 
>> Thanks a lot for your great patience for reading the history!
>> Yes, there are some discussions about "> 3" vs. "> 2".
>> - In theory, "ld" is one instruction.  If consider "address/toc"
>>   adjust, we may count it as 2 instructions. "pld" may need less
>>   cycles.
>
> OK, even without prefixed insn support, the high part of address
> computation could be optimized as nop by linker further.  It would
> be good to say something on this in commit log, otherwise people
> may be confused as the PR comment mentioned above.
>
>> - As test, it seems "> 2" could get better/stable runtime result
>>   during testing SPEC2017.
>
> Ok, if you posted the conclusion previously, it would be good to
> mention it here with a link on the result comparisons.

Thanks a lot for your sugguestions.

>
>> 
>>>
>>>>
>>>> For the concern that I raised in:
>>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599676.html
>>>> The micro-cases would not be the major concern. Because as
>>>> Segher explained in:
>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63281#c18
>>>> It would just be about the benchmark method.
>>>>
>>>> As tested on spec2017, for visible performance changes, we
>>>> can find the runtime improvement on 500.perlbench_r about
>>>> ~1.8% (-O2) when support loading complicates constant from
>>>> constant pool. And no visible performance recession on
>>>> other benchmarks.
>>>
>>> The improvement on 500.perlbench_r looks to match what PR63281
>>> mentioned, nice!  I'm curious that which options and which kinds
>>> of CPUs have you tested with?  Since this is a general change,
>>> I'd expect we can test with P8/P9/P10 at O2/O3 (or Ofast) at
>>> least.
>> 
>> Great advice! Thanks for pointing this!
>> A few months ago, P8/P9/P10 are tested. While this time, I rerun
>> SPEC2017 on P10 for -O2 and -O3.  More test on latest code would
>> be better.
>
> Was it tested previously with your recent commits on constant
> building together?  or just with the trunk at that time?  Anyway,

Just with the trunk at that time.

> I was curious how it's tested, thanks for replying, good to see
> those are covered.  :)  I'd leave the further review to Segher and
> David.

Thanks again.

BR,
Jeff (Jiufu Guo)

>
> BR,
> Kewen
>
>> 
>> 
>> BR,
>> Jeff (Jiufu Guo)
>> 
>>>
>>> BR,
>>> Kewen
>>>
>>>>
>>>> Boostrap & regtest pass on ppc64{,le}.
>>>> Is this ok for trunk?
>>>>
>>>> BR,
>>>> Jeff (Jiufu Guo)
>>>>
>>>>    PR target/63281
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>>    * config/rs6000/rs6000.cc (rs6000_emit_set_const): Update to split
>>>>    complicate constant to memory.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>>    * gcc.target/powerpc/const_anchors.c: Update to test final-rtl. 
>>>>    * gcc.target/powerpc/parall_5insn_const.c: Update to keep original test
>>>>    point.
>>>>    * gcc.target/powerpc/pr106550.c: Likewise..
>>>>    * gcc.target/powerpc/pr106550_1.c: Likewise.
>>>>    * gcc.target/powerpc/pr87870.c: Update according to latest behavior.
>>>>    * gcc.target/powerpc/pr93012.c: Likewise.
>>>>
>>>> ---
>>>>  gcc/config/rs6000/rs6000.cc                     | 16 ++++++++++++++++
>>>>  .../gcc.target/powerpc/const_anchors.c          |  5 ++---
>>>>  .../gcc.target/powerpc/parall_5insn_const.c     | 14 ++++++++++++--
>>>>  gcc/testsuite/gcc.target/powerpc/pr106550.c     | 17 +++++++++++++++--
>>>>  gcc/testsuite/gcc.target/powerpc/pr106550_1.c   | 15 +++++++++++++--
>>>>  gcc/testsuite/gcc.target/powerpc/pr87870.c      |  5 ++++-
>>>>  gcc/testsuite/gcc.target/powerpc/pr93012.c      |  4 +++-
>>>>  7 files changed, 65 insertions(+), 11 deletions(-)
>>>>
>>>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>>>> index 4690384cdbe..b9562f1ea0f 100644
>>>> --- a/gcc/config/rs6000/rs6000.cc
>>>> +++ b/gcc/config/rs6000/rs6000.cc
>>>> @@ -10292,6 +10292,22 @@ rs6000_emit_set_const (rtx dest, rtx source)
>>>>      c = sext_hwi (c, 32);
>>>>      emit_move_insn (lo, GEN_INT (c));
>>>>    }
>>>> +
>>>> +      /* If it can be stored to the constant pool and profitable.  */
>>>> +      else if (base_reg_operand (dest, mode)
>>>> +         && num_insns_constant (source, mode) > 2)
>>>> +  {
>>>> +    rtx sym = force_const_mem (mode, source);
>>>> +    if (TARGET_TOC && SYMBOL_REF_P (XEXP (sym, 0))
>>>> +        && use_toc_relative_ref (XEXP (sym, 0), mode))
>>>> +      {
>>>> +        rtx toc = create_TOC_reference (XEXP (sym, 0), copy_rtx (dest));
>>>> +        sym = gen_const_mem (mode, toc);
>>>> +        set_mem_alias_set (sym, get_TOC_alias_set ());
>>>> +      }
>>>> +
>>>> +    emit_insn (gen_rtx_SET (dest, sym));
>>>> +  }
>>>>        else
>>>>    rs6000_emit_set_long_const (dest, c);
>>>>        break;
>>>> diff --git a/gcc/testsuite/gcc.target/powerpc/const_anchors.c 
>>>> b/gcc/testsuite/gcc.target/powerpc/const_anchors.c
>>>> index 542e2674b12..188744165f2 100644
>>>> --- a/gcc/testsuite/gcc.target/powerpc/const_anchors.c
>>>> +++ b/gcc/testsuite/gcc.target/powerpc/const_anchors.c
>>>> @@ -1,5 +1,5 @@
>>>>  /* { dg-do compile { target has_arch_ppc64 } } */
>>>> -/* { dg-options "-O2" } */
>>>> +/* { dg-options "-O2 -fdump-rtl-final" } */
>>>>  
>>>>  #define C1 0x2351847027482577ULL
>>>>  #define C2 0x2351847027482578ULL
>>>> @@ -16,5 +16,4 @@ void __attribute__ ((noinline)) foo1 (long long *a, long 
>>>> long b)
>>>>    if (b)
>>>>      *a++ = C2;
>>>>  }
>>>> -
>>>> -/* { dg-final { scan-assembler-times {\maddi\M} 2 } } */
>>>> +/* { dg-final { scan-rtl-dump-times {\madddi3\M} 2 "final" } } */
>>>> diff --git a/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c 
>>>> b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c
>>>> index e3a9a7264cf..df0690b90be 100644
>>>> --- a/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c
>>>> +++ b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c
>>>> @@ -9,8 +9,18 @@
>>>>  void __attribute__ ((noinline)) foo (unsigned long long *a)
>>>>  {
>>>>    /* 2 lis + 2 ori + 1 rldimi for each constant.  */
>>>> -  *a++ = 0x800aabcdc167fa16ULL;
>>>> -  *a++ = 0x7543a876867f616ULL;
>>>> +  {
>>>> +    register long long d asm("r0") = 0x800aabcdc167fa16ULL;
>>>> +    long long n;
>>>> +    asm("mr %0, %1" : "=r"(n) : "r"(d));
>>>> +    *a++ = n;
>>>> +  }
>>>> +  {
>>>> +    register long long d asm("r0") = 0x7543a876867f616ULL;
>>>> +    long long n;
>>>> +    asm("mr %0, %1" : "=r"(n) : "r"(d));
>>>> +    *a++ = n;
>>>> +  }
>>>>  }
>>>>  
>>>>  long long A[] = {0x800aabcdc167fa16ULL, 0x7543a876867f616ULL};
>>>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106550.c 
>>>> b/gcc/testsuite/gcc.target/powerpc/pr106550.c
>>>> index 74e395331ab..5eca2b2f701 100644
>>>> --- a/gcc/testsuite/gcc.target/powerpc/pr106550.c
>>>> +++ b/gcc/testsuite/gcc.target/powerpc/pr106550.c
>>>> @@ -1,12 +1,25 @@
>>>>  /* PR target/106550 */
>>>>  /* { dg-options "-O2 -mdejagnu-cpu=power10" } */
>>>>  /* { dg-require-effective-target power10_ok } */
>>>> +/* { dg-require-effective-target has_arch_ppc64 } */
>>>>  
>>>>  void
>>>>  foo (unsigned long long *a)
>>>>  {
>>>> -  *a++ = 0x020805006106003; /* pli+pli+rldimi */
>>>> -  *a++ = 0x2351847027482577;/* pli+pli+rldimi */  
>>>> +  {
>>>> +    /* pli+pli+rldimi */
>>>> +    register long long d asm("r0") = 0x020805006106003ULL;
>>>> +    long long n;
>>>> +    asm("mr %0, %1" : "=r"(n) : "r"(d));
>>>> +    *a++ = n;
>>>> +  }
>>>> +  {
>>>> +    /* pli+pli+rldimi */  
>>>> +    register long long d asm("r0") = 0x2351847027482577ULL;
>>>> +    long long n;
>>>> +    asm("mr %0, %1" : "=r"(n) : "r"(d));
>>>> +    *a++ = n;
>>>> +  }
>>>>  }
>>>>  
>>>>  /* { dg-final { scan-assembler-times {\mpli\M} 4 } } */
>>>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106550_1.c 
>>>> b/gcc/testsuite/gcc.target/powerpc/pr106550_1.c
>>>> index 7e709fcf9d8..11878d893a4 100644
>>>> --- a/gcc/testsuite/gcc.target/powerpc/pr106550_1.c
>>>> +++ b/gcc/testsuite/gcc.target/powerpc/pr106550_1.c
>>>> @@ -12,8 +12,19 @@ foo (unsigned long long *a)
>>>>    asm("cntlzd %0, %1" : "=r"(n) : "r"(d));
>>>>    *a++ = n;
>>>>  
>>>> -  *a++ = 0x235a8470a7480000ULL; /* pli+sldi+oris */
>>>> -  *a++ = 0x23a184700000b677ULL; /* pli+sldi+ori */
>>>> +  {
>>>> +    register long long d asm("r0") = 0x235a8470a7480000ULL; /* 
>>>> pli+sldi+oris */
>>>> +    long long n;
>>>> +    asm("cntlzd %0, %1" : "=r"(n) : "r"(d));
>>>> +    *a++ = n;
>>>> +  }
>>>> +
>>>> +  {
>>>> +    register long long d asm("r0") = 0x23a184700000b677ULL; /* 
>>>> pli+sldi+ori */
>>>> +    long long n;
>>>> +    asm("cntlzd %0, %1" : "=r"(n) : "r"(d));
>>>> +    *a++ = n;
>>>> +  }
>>>>  }
>>>>  
>>>>  /* { dg-final { scan-assembler-times {\mpli\M} 3 } } */
>>>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr87870.c 
>>>> b/gcc/testsuite/gcc.target/powerpc/pr87870.c
>>>> index d2108ac3386..5fee06744ae 100644
>>>> --- a/gcc/testsuite/gcc.target/powerpc/pr87870.c
>>>> +++ b/gcc/testsuite/gcc.target/powerpc/pr87870.c
>>>> @@ -25,4 +25,7 @@ test3 (void)
>>>>    return ((__int128)0xdeadbeefcafebabe << 64) | 0xfacefeedbaaaaaad;
>>>>  }
>>>>  
>>>> -/* { dg-final { scan-assembler-not {\mld\M} } } */
>>>> +/* test3 using "ld" to load the value for r3 and r4.
>>>> +   test0, test1 and test2 are using "li".  */
>>>> +/* { dg-final { scan-assembler-times {\mp?ld\M} 2 } } */
>>>> +/* { dg-final { scan-assembler-times {\mli\M} 6 } } */
>>>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr93012.c 
>>>> b/gcc/testsuite/gcc.target/powerpc/pr93012.c
>>>> index 4f764d0576f..87ce0b49b23 100644
>>>> --- a/gcc/testsuite/gcc.target/powerpc/pr93012.c
>>>> +++ b/gcc/testsuite/gcc.target/powerpc/pr93012.c
>>>> @@ -10,4 +10,6 @@ unsigned long long mskh1() { return 
>>>> 0xffff9234ffff9234ULL; }
>>>>  unsigned long long mskl1() { return 0x2bcdffff2bcdffffULL; }
>>>>  unsigned long long mskse() { return 0xffff1234ffff1234ULL; }
>>>>  
>>>> -/* { dg-final { scan-assembler-times {\mrldimi\M} 7 } } */
>>>> +/* { dg-final { scan-assembler-times {\mrldimi\M} 7 { target { 
>>>> has_arch_pwr10 } } } } */
>>>> +/* { dg-final { scan-assembler-times {\mrldimi\M} 3 { target { ! 
>>>> has_arch_pwr10 } } } } */
>>>> +/* { dg-final { scan-assembler-times {\mp?ld\M} 4 { target { ! 
>>>> has_arch_pwr10 } } } } */
>
>
> BR,
> Kewen

Reply via email to