Hi Pat,

> On 6/27/23 1:52 PM, Pat Haugen via Gcc-patches wrote:
>> Updated from prior version to address review comments (update 
>> rs6000_rtx_cost,
>> update scan strings of mod-1.c/mod-2.c)l.
>>
>> Disable generation of scalar modulo instructions.
>>
>> It was recently discovered that the scalar modulo instructions can suffer
>> noticeable performance issues for certain input values. This patch disables
>> their generation since the equivalent div/mul/sub sequence does not suffer
>> the same problem.
>>
>> Bootstrapped and regression tested on powerpc64/powerpc64le.
>> Ok for master and backports after burn in?
>>
>> -Pat
>>
>>
>> 2023-06-27  Pat Haugen  <pthau...@linux.ibm.com>
>>
>> gcc/
>>      * config/rs6000/rs6000.cc (rs6000_rtx_costs): Check if disabling
>>      scalar modulo.
>>      * config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
>>      * config/rs6000/rs6000.md (mod<mode>3, *mod<mode>3): Disable.
>>      (define_expand umod<mode>3): New.
>>      (define_insn umod<mode>3): Rename to *umod<mode>3 and disable.
>>      (umodti3, modti3): Disable.
>>
>> gcc/testsuite/
>>      * gcc.target/powerpc/clone1.c: Add xfails.
>>      * gcc.target/powerpc/clone3.c: Likewise.
>>      * gcc.target/powerpc/mod-1.c: Update scan strings and add xfails.
>>      * gcc.target/powerpc/mod-2.c: Likewise.
>>      * gcc.target/powerpc/p10-vdivq-vmodq.c: Add xfails.
>>
> 
> Attaching patch since my mailer apparently messed up some formatting again.
> 
> -Pat
> 
> no-modulo.diff
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 546c353029b..2dae217bf64 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -22127,7 +22127,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
> outer_code,
>           *total = rs6000_cost->divsi;
>       }
>        /* Add in shift and subtract for MOD unless we have a mod instruction. 
> */
> -      if (!TARGET_MODULO && (code == MOD || code == UMOD))
> +      if ((!TARGET_MODULO
> +        || (RS6000_DISABLE_SCALAR_MODULO && SCALAR_INT_MODE_P (mode)))
> +      && (code == MOD || code == UMOD))
>       *total += COSTS_N_INSNS (2);
>        return false;
>  
> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
> index 3503614efbd..22595f6ebd7 100644
> --- a/gcc/config/rs6000/rs6000.h
> +++ b/gcc/config/rs6000/rs6000.h
> @@ -2492,3 +2492,9 @@ while (0)
>         rs6000_asm_output_opcode (STREAM);                            \
>      }                                                                        
> \
>    while (0)
> +
> +/* Disable generation of scalar modulo instructions due to performance issues
> +   with certain input values.  This can be removed in the future when the
> +   issues have been resolved.  */
> +#define RS6000_DISABLE_SCALAR_MODULO 1
> +
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index b0db8ae508d..6c2f237a539 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -3421,6 +3421,17 @@ (define_expand "mod<mode>3"
>       FAIL;
>  
>        operands[2] = force_reg (<MODE>mode, operands[2]);
> +
> +      if (RS6000_DISABLE_SCALAR_MODULO)
> +     {
> +       temp1 = gen_reg_rtx (<MODE>mode);
> +       temp2 = gen_reg_rtx (<MODE>mode);
> +
> +       emit_insn (gen_div<mode>3 (temp1, operands[1], operands[2]));
> +       emit_insn (gen_mul<mode>3 (temp2, temp1, operands[2]));
> +       emit_insn (gen_sub<mode>3 (operands[0], operands[1], temp2));
> +       DONE;
> +     }
>      }
>    else
>      {
> @@ -3440,17 +3451,42 @@ (define_insn "*mod<mode>3"
>    [(set (match_operand:GPR 0 "gpc_reg_operand" "=&r,r")
>          (mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
>                (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
> -  "TARGET_MODULO"
> +  "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO"
>    "mods<wd> %0,%1,%2"
>    [(set_attr "type" "div")
>     (set_attr "size" "<bits>")])
>  
> +;; This define_expand can be removed when RS6000_DISABLE_SCALAR_MODULO is
> +;; removed.
> +(define_expand "umod<mode>3"
> +  [(set (match_operand:GPR 0 "gpc_reg_operand")
> +     (umod:GPR (match_operand:GPR 1 "gpc_reg_operand")
> +               (match_operand:GPR 2 "gpc_reg_operand")))]
> +  ""
> +{
> +  rtx temp1;
> +  rtx temp2;
> +
> +  if (!TARGET_MODULO)
> +     FAIL;
>  

Nit: Sorry that I didn't catch this in the previous review, it seems we can
just put TARGET_MODULO in the condition of define_expand.  Besides, it seems
the declarations of temp1 and temp2 can be merged into their assignments
below.

The others look pretty good to me, thanks!  As I know this patch is still on
Segher's list to review, I'd defer the final approval to Segher.

BR,
Kewen

Reply via email to