"Kewen.Lin" <li...@linux.ibm.com> writes: > Hi Jeff, > > Thanks for the patch, some comments on nits are inline. > > on 2022/9/1 11:24, Jiufu Guo wrote: >> Hi, >> >> As mentioned in PR106550, since pli could support 34bits immediate, we could >> use less instructions(3insn would be ok) to build 64bits constant with pli. >> >> For example, for constant 0x020805006106003, we could generate it with: >> asm code1: >> pli 9,101736451 (0x6106003) >> sldi 9,9,32 >> paddi 9,9, 2130000 (0x0208050) >> >> or asm code2: >> pli 10, 2130000 >> pli 9, 101736451 >> rldimi 9, 10, 32, 0 >> >> Testing with simple cases as below, run them a lot of times: >> f1.c >> long __attribute__ ((noinline)) foo (long *arg,long *,long*) >> { >> *arg = 0x2351847027482577; >> } >> 5insns: base >> pli+sldi+paddi: similar -0.08% >> pli+pli+rldimi: faster +0.66% >> >> f2.c >> long __attribute__ ((noinline)) foo (long *arg, long *arg2, long *arg3) >> { >> *arg = 0x2351847027482577; >> *arg2 = 0x3257845024384680; >> *arg3 = 0x1245abcef9240dec; >> } >> 5nisns: base >> pli+sldi+paddi: faster +1.35% >> pli+pli+rldimi: faster +5.49% >> >> f2.c would be more meaningful. Because 'sched passes' are effective for >> f2.c, but 'scheds' do less thing for f1.c. >> >> Compare with previous patch: >> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599525.html >> This one updates code slightly and extracts changes on rs6000.md to a >> seperate patch. >> >> This patch pass boostrap and regtest on ppc64le(includes p10). >> Is it ok for trunk? >> >> BR, >> Jeff(Jiufu) >> >> >> PR target/106550 >> >> gcc/ChangeLog: >> >> * config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Add 'pli' for >> constant building. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/powerpc/pr106550.c: New test. >> >> --- >> gcc/config/rs6000/rs6000.cc | 39 +++++++++++++++++++++ >> gcc/testsuite/gcc.target/powerpc/pr106550.c | 14 ++++++++ >> 2 files changed, 53 insertions(+) >> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106550.c >> >> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc >> index df491bee2ea..1ccb2ff30a1 100644 >> --- a/gcc/config/rs6000/rs6000.cc >> +++ b/gcc/config/rs6000/rs6000.cc >> @@ -10181,6 +10181,45 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT >> c) >> gen_rtx_IOR (DImode, copy_rtx (temp), >> GEN_INT (ud1))); >> } >> + else if (TARGET_PREFIXED) >> + { >> + /* pli 9,high32 + pli 10,low32 + rldimi 9,10,32,0. */ >> + if (can_create_pseudo_p ()) >> + { >> + temp = gen_reg_rtx (DImode); >> + rtx temp1 = gen_reg_rtx (DImode); >> + emit_move_insn (copy_rtx (temp), GEN_INT ((ud4 << 16) | ud3)); >> + emit_move_insn (copy_rtx (temp1), GEN_INT ((ud2 << 16) | ud1)); >> + > > Nit: copy_rtx here seems not necessary, as both temp and temp1 are with CODE > REG. > The function copy_rtx returns the given rtx for code REG. > >> + emit_insn (gen_rotldi3_insert_3 (dest, temp, GEN_INT (32), temp1, >> + GEN_INT (0xffffffff))); >> + } >> + >> + /* pli 9,high32 + sldi 9,32 + paddi 9,9,low32. */ >> + else >> + { >> + emit_move_insn (copy_rtx (dest), GEN_INT ((ud4 << 16) | ud3)); >> + >> + emit_move_insn (copy_rtx (dest), >> + gen_rtx_ASHIFT (DImode, copy_rtx (dest), >> + GEN_INT (32))); >> + >> + bool can_use_paddi = REGNO (dest) != FIRST_GPR_REGNO; >> + > > The REGNO usage has asserted dest is with CODE REG, if it's always true > I don't see why we need copy_rtx around. Or do I miss something? Thanks a lot for point this out! Yes, now rs6000_emit_set_long_const is called on DImode for constant splitter; and it should be always with CODE REG, and then copy_rtx would not be needed here! I would update the patch accordingly.
> >> + /* Use paddi for the low32 bits. */ >> + if (ud2 != 0 && ud1 != 0 && can_use_paddi) >> + emit_move_insn (dest, gen_rtx_PLUS (DImode, copy_rtx (dest), >> + GEN_INT ((ud2 << 16) | ud1))); >> + /* Use oris, ori for low32 bits. */ >> + if (ud2 != 0 && (ud1 == 0 || !can_use_paddi)) >> + emit_move_insn (ud1 != 0 ? copy_rtx (dest) : dest, >> + gen_rtx_IOR (DImode, copy_rtx (dest), >> + GEN_INT (ud2 << 16))); >> + if (ud1 != 0 && (ud2 == 0 || !can_use_paddi)) >> + emit_move_insn (dest, gen_rtx_IOR (DImode, copy_rtx (dest), >> + GEN_INT (ud1))); >> + } >> + } >> else >> { >> temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode); >> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106550.c >> b/gcc/testsuite/gcc.target/powerpc/pr106550.c >> new file mode 100644 >> index 00000000000..c6f4116bb9a >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/powerpc/pr106550.c >> @@ -0,0 +1,14 @@ >> +/* PR target/106550 */ >> +/* { dg-options "-O2 -std=c99 -mdejagnu-cpu=power10" } */ >> + > > Need to check power10_ok, like: > /* { dg-require-effective-target power10_ok } */ > > Nit: -std=c99 is not needed? Thanks for catching this! BR, Jeff(Jiufu) > > BR, > Kewen