Hi Pat, > On 6/27/23 1:52 PM, Pat Haugen via Gcc-patches wrote: >> Updated from prior version to address review comments (update >> rs6000_rtx_cost, >> update scan strings of mod-1.c/mod-2.c)l. >> >> Disable generation of scalar modulo instructions. >> >> It was recently discovered that the scalar modulo instructions can suffer >> noticeable performance issues for certain input values. This patch disables >> their generation since the equivalent div/mul/sub sequence does not suffer >> the same problem. >> >> Bootstrapped and regression tested on powerpc64/powerpc64le. >> Ok for master and backports after burn in? >> >> -Pat >> >> >> 2023-06-27 Pat Haugen <pthau...@linux.ibm.com> >> >> gcc/ >> * config/rs6000/rs6000.cc (rs6000_rtx_costs): Check if disabling >> scalar modulo. >> * config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New. >> * config/rs6000/rs6000.md (mod<mode>3, *mod<mode>3): Disable. >> (define_expand umod<mode>3): New. >> (define_insn umod<mode>3): Rename to *umod<mode>3 and disable. >> (umodti3, modti3): Disable. >> >> gcc/testsuite/ >> * gcc.target/powerpc/clone1.c: Add xfails. >> * gcc.target/powerpc/clone3.c: Likewise. >> * gcc.target/powerpc/mod-1.c: Update scan strings and add xfails. >> * gcc.target/powerpc/mod-2.c: Likewise. >> * gcc.target/powerpc/p10-vdivq-vmodq.c: Add xfails. >> > > Attaching patch since my mailer apparently messed up some formatting again. > > -Pat > > no-modulo.diff > > diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc > index 546c353029b..2dae217bf64 100644 > --- a/gcc/config/rs6000/rs6000.cc > +++ b/gcc/config/rs6000/rs6000.cc > @@ -22127,7 +22127,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int > outer_code, > *total = rs6000_cost->divsi; > } > /* Add in shift and subtract for MOD unless we have a mod instruction. > */ > - if (!TARGET_MODULO && (code == MOD || code == UMOD)) > + if ((!TARGET_MODULO > + || (RS6000_DISABLE_SCALAR_MODULO && SCALAR_INT_MODE_P (mode))) > + && (code == MOD || code == UMOD)) > *total += COSTS_N_INSNS (2); > return false; > > diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h > index 3503614efbd..22595f6ebd7 100644 > --- a/gcc/config/rs6000/rs6000.h > +++ b/gcc/config/rs6000/rs6000.h > @@ -2492,3 +2492,9 @@ while (0) > rs6000_asm_output_opcode (STREAM); \ > } > \ > while (0) > + > +/* Disable generation of scalar modulo instructions due to performance issues > + with certain input values. This can be removed in the future when the > + issues have been resolved. */ > +#define RS6000_DISABLE_SCALAR_MODULO 1 > + > diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md > index b0db8ae508d..6c2f237a539 100644 > --- a/gcc/config/rs6000/rs6000.md > +++ b/gcc/config/rs6000/rs6000.md > @@ -3421,6 +3421,17 @@ (define_expand "mod<mode>3" > FAIL; > > operands[2] = force_reg (<MODE>mode, operands[2]); > + > + if (RS6000_DISABLE_SCALAR_MODULO) > + { > + temp1 = gen_reg_rtx (<MODE>mode); > + temp2 = gen_reg_rtx (<MODE>mode); > + > + emit_insn (gen_div<mode>3 (temp1, operands[1], operands[2])); > + emit_insn (gen_mul<mode>3 (temp2, temp1, operands[2])); > + emit_insn (gen_sub<mode>3 (operands[0], operands[1], temp2)); > + DONE; > + } > } > else > { > @@ -3440,17 +3451,42 @@ (define_insn "*mod<mode>3" > [(set (match_operand:GPR 0 "gpc_reg_operand" "=&r,r") > (mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r") > (match_operand:GPR 2 "gpc_reg_operand" "r,r")))] > - "TARGET_MODULO" > + "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO" > "mods<wd> %0,%1,%2" > [(set_attr "type" "div") > (set_attr "size" "<bits>")]) > > +;; This define_expand can be removed when RS6000_DISABLE_SCALAR_MODULO is > +;; removed. > +(define_expand "umod<mode>3" > + [(set (match_operand:GPR 0 "gpc_reg_operand") > + (umod:GPR (match_operand:GPR 1 "gpc_reg_operand") > + (match_operand:GPR 2 "gpc_reg_operand")))] > + "" > +{ > + rtx temp1; > + rtx temp2; > + > + if (!TARGET_MODULO) > + FAIL; >
Nit: Sorry that I didn't catch this in the previous review, it seems we can just put TARGET_MODULO in the condition of define_expand. Besides, it seems the declarations of temp1 and temp2 can be merged into their assignments below. The others look pretty good to me, thanks! As I know this patch is still on Segher's list to review, I'd defer the final approval to Segher. BR, Kewen