> On 22 Oct 2024, at 13:14, Richard Biener <rguent...@suse.de> wrote: > > External email: Use caution opening links or attachments > > > On Tue, 22 Oct 2024, Jennifer Schmitz wrote: > >> >> >>> On 22 Oct 2024, at 11:05, Richard Biener <rguent...@suse.de> wrote: >>> >>> External email: Use caution opening links or attachments >>> >>> >>> On Tue, 22 Oct 2024, Jennifer Schmitz wrote: >>> >>>> >>>> >>>>> On 21 Oct 2024, at 10:51, Richard Biener <rguent...@suse.de> wrote: >>>>> >>>>> External email: Use caution opening links or attachments >>>>> >>>>> >>>>> On Fri, 18 Oct 2024, Jennifer Schmitz wrote: >>>>> >>>>>> This patch adds the following two simplifications in match.pd: >>>>>> - pow (1.0/x, y) to pow (x, -y), avoiding the division >>>>>> - pow (0.0, x) to 0.0, avoiding the call to pow. >>>>>> The patterns are guarded by flag_unsafe_math_optimizations, >>>>>> !flag_trapping_math, !flag_errno_math, !HONOR_SIGNED_ZEROS, >>>>>> and !HONOR_INFINITIES. >>>>>> >>>>>> Tests were added to confirm the application of the transform for float, >>>>>> double, and long double. >>>>>> >>>>>> The patch was bootstrapped and regtested on aarch64-linux-gnu and >>>>>> x86_64-linux-gnu, no regression. >>>>>> OK for mainline? >>>>>> >>>>>> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com> >>>>>> >>>>>> gcc/ >>>>>> * match.pd: Fold pow (1.0/x, y) -> pow (x, -y) and >>>>>> pow (0.0, x) -> 0.0. >>>>>> >>>>>> gcc/testsuite/ >>>>>> * gcc.dg/tree-ssa/pow_fold_1.c: New test. >>>>>> --- >>>>>> gcc/match.pd | 14 +++++++++ >>>>>> gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c | 34 ++++++++++++++++++++++ >>>>>> 2 files changed, 48 insertions(+) >>>>>> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c >>>>>> >>>>>> diff --git a/gcc/match.pd b/gcc/match.pd >>>>>> index 12d81fcac0d..ba100b117e7 100644 >>>>>> --- a/gcc/match.pd >>>>>> +++ b/gcc/match.pd >>>>>> @@ -8203,6 +8203,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) >>>>>> (rdiv @0 (exps:s @1)) >>>>>> (mult @0 (exps (negate @1))))) >>>>>> >>>>>> + /* Simplify pow(1.0/x, y) into pow(x, -y). */ >>>>>> + (if (! HONOR_INFINITIES (type) >>>>>> + && ! HONOR_SIGNED_ZEROS (type) >>>>>> + && ! flag_trapping_math >>>>>> + && ! flag_errno_math) >>>>>> + (simplify >>>>>> + (POW (rdiv:s real_onep@0 @1) @2) >>>>>> + (POW @1 (negate @2))) >>>>> >>>>> This one shouldn't need HONOR_SIGNED_ZEROS? >>>>> >>>>>> + >>>>>> + /* Simplify pow(0.0, x) into 0.0. */ >>>>>> + (simplify >>>>>> + (POW real_zerop@0 @1) >>>>> >>>>> I think this needs !HONOR_NANS (type)? >>>>> >>>>> Otherwise OK. >>>> Thanks for the feedback, Richard and Andrew. I made the following changes >>>> to the patch (current version of the patch below): >>>> - also applied the pattern to POWI and added tests for pow, powif, powil >>>> - not gate first pattern under !HONOR_SIGNED_ZEROS, but second one >>>> additionally under !HONOR_NANS (type) >>>> - added tests for powf16 >>> >>> Note powi is GCC internal, it doesn't set errno and it should be subject >>> to different rules - I'd rather have patterns working on powi separate. >> How about moving the patterns for POWI into the section >> flag_unsafe_math_optimizations && canonicalize_math_p () and not use >> (!flag_errno_math)? > > Sounds good. > >>> >>>> Now, I am encountering two problems: >>>> >>>> First, the transform is not applied for float16 (even if >>>> -fexcess-precision=16). Do you know what the problem could be? >>> >>> I think you want to use POW_ALL instead of POW. The generated >>> cfn-operators.pd shows >>> >>> (define_operator_list POW >>> BUILT_IN_POWF >>> BUILT_IN_POW >>> BUILT_IN_POWL >>> IFN_POW) >>> (define_operator_list POW_FN >>> BUILT_IN_POWF16 >>> BUILT_IN_POWF32 >>> BUILT_IN_POWF64 >>> BUILT_IN_POWF128 >>> BUILT_IN_POWF32X >>> BUILT_IN_POWF64X >>> BUILT_IN_POWF128X >>> null) >>> (define_operator_list POW_ALL >>> BUILT_IN_POWF >>> BUILT_IN_POW >>> BUILT_IN_POWL >>> BUILT_IN_POWF16 >>> ... >>> >>> note this comes at expense of more generated code (in >>> gimple/generic-match.pd). >> Thanks, that solved the Float16 issue. >>> >>>> Second, validation on aarch64 shows a regression in tests >>>> - gcc.dg/recip_sqrt_mult_1.c and >>>> - gcc.dg/recip_sqrt_mult_5.c, >>>> because the pattern (POWI(1/x, y) -> POWI(x, -y)) is applied before the >>>> recip pass and prevents application of the recip-patterns. The reason for >>>> this might be that the single-use restriction only work if the integer >>>> argument is non-constant, but in the failing test cases, the integer >>>> argument is 2 and the pattern is applied despite the :s flag. >>>> For example, my pattern is **not** applied (single-use restriction works) >>>> for: >>>> double res, res2; >>>> void foo (double a, int b) >>>> { >>>> double f (double); >>>> double t1 = 1.0 / a; >>>> res = __builtin_powi (t1, b); >>>> res2 = f (t1); >>>> } >>>> >>>> But the pattern **is** applied and single-use restriction does **not** >>>> work for: >>>> double res, res2; >>>> void foo (double a) >>>> { >>>> double f (double); >>>> double t1 = 1.0 / a; >>>> res = __builtin_powi (t1, 2); >>>> res2 = f (t1); >>>> } >>> >>> This must be because the result is a single operation. :s only applies >>> when the result has sub-expresions. This is to make CSE work. >>> The "fix" is to add explicit && single_use (@n) to override that >>> behavior. Note that I think the transform is good even when the >>> division is used because the result reduces the dependence chain length. >>> It's only when @2 is non-constant that we're introducing another >>> stmt for the negation that re-introduces this latency (even if in >>> practice it would be smaller). >>> >>>> Possible options to resolve this are: >>>> - gate pattern to run after recip pass >>>> - do not apply pattern for POWI >>> >>> - adjust the testcase (is the final outcome still good?) >> Without the patch, there is one fdiv instruction less (below is the assembly >> for recip_sqrt_mult_1.c, but for _5.c it’s analogous): >> No patch or with single_use of the result of the division: >> foo: >> fmov d30, 1.0e+0 >> fsqrt d31, d0 >> adrp x0, .LANCHOR0 >> add x1, x0, :lo12:.LANCHOR0 >> fdiv d30, d30, d0 >> fmul d0, d31, d30 >> str d0, [x0, #:lo12:.LANCHOR0] >> stp d30, d31, [x1, 8] >> ret >> >> With patch: >> foo: >> fsqrt d31, d0 >> fmov d30, 1.0e+0 >> adrp x1, .LANCHOR0 >> add x0, x1, :lo12:.LANCHOR0 >> fdiv d31, d30, d31 >> fdiv d30, d30, d0 >> str d31, [x1, #:lo12:.LANCHOR0] >> fmul d31, d31, d0 >> stp d30, d31, [x0, 8] >> ret >> So, we might want to use the single_use guard. > > Yeah, this is because the powi inline expansion will add back > the division. Below is the updated patch, I re-validated with no regression on aarch64 and x86_64. Thanks, Jenni
This patch adds the following two simplifications in match.pd for POW_ALL and POWI: - pow (1.0/x, y) to pow (x, -y), avoiding the division - pow (0.0, x) to 0.0, avoiding the call to pow. The patterns are guarded by flag_unsafe_math_optimizations, !flag_trapping_math, and !HONOR_INFINITIES. The POW_ALL patterns are also gated under !flag_errno_math. The second pattern is also guarded by !HONOR_NANS and !HONOR_SIGNED_ZEROS. Tests were added to confirm the application of the transform for builtins pow, powf, powl, powi, powif, powil, and powf16. The patch was bootstrapped and regtested on aarch64-linux-gnu and x86_64-linux-gnu, no regression. OK for mainline? Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com> gcc/ * match.pd: Fold pow (1.0/x, y) -> pow (x, -y) and pow (0.0, x) -> 0.0. gcc/testsuite/ * gcc.dg/tree-ssa/pow_fold_1.c: New test. --- gcc/match.pd | 28 +++++++++++++++ gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c | 42 ++++++++++++++++++++++ 2 files changed, 70 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c diff --git a/gcc/match.pd b/gcc/match.pd index 12d81fcac0d..6d9868d2bb1 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -8203,6 +8203,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (rdiv @0 (exps:s @1)) (mult @0 (exps (negate @1))))) + (for pow (POW_ALL) + (if (! HONOR_INFINITIES (type) + && ! flag_trapping_math + && ! flag_errno_math) + /* Simplify pow(1.0/x, y) into pow(x, -y). */ + (simplify + (pow (rdiv:s real_onep@0 @1) @2) + (pow @1 (negate @2))) + + /* Simplify pow(0.0, x) into 0.0. */ + (if (! HONOR_NANS (type) && ! HONOR_SIGNED_ZEROS (type)) + (simplify + (pow real_zerop@0 @1) + @0)))) + (if (! HONOR_SIGN_DEPENDENT_ROUNDING (type) && ! HONOR_NANS (type) && ! HONOR_INFINITIES (type) && ! flag_trapping_math @@ -8561,6 +8576,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (mult (POW:s @0 @1) (POW:s @2 @1)) (POW (mult @0 @2) @1)) + (if (! HONOR_INFINITIES (type) && ! flag_trapping_math) + /* Simplify powi(1.0/x, y) into powi(x, -y). */ + (simplify + (POWI (rdiv@3 real_onep@0 @1) @2) + (if (single_use (@3)) + (POWI @1 (negate @2)))) + + /* Simplify powi(0.0, x) into 0.0. */ + (if (! HONOR_NANS (type) && ! HONOR_SIGNED_ZEROS (type)) + (simplify + (POWI real_zerop@0 @1) + @0))) + /* Simplify powi(x,y) * powi(z,y) -> powi(x*z,y). */ (simplify (mult (POWI:s @0 @1) (POWI:s @2 @1)) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c b/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c new file mode 100644 index 00000000000..d98bcb0827e --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c @@ -0,0 +1,42 @@ +/* { dg-do compile } */ +/* { dg-options "-Ofast -fdump-tree-optimized -fexcess-precision=16" } */ +/* { dg-add-options float16 } */ +/* { dg-require-effective-target float16_runtime } */ +/* { dg-require-effective-target c99_runtime } */ + +extern void link_error (void); + +#define POW1OVER(TYPE1, TYPE2, CTY, TY) \ + void \ + pow1over_##TY (TYPE1 x, TYPE2 y) \ + { \ + TYPE1 t1 = 1.0##CTY / x; \ + TYPE1 t2 = __builtin_pow##TY (t1, y); \ + TYPE2 t3 = -y; \ + TYPE1 t4 = __builtin_pow##TY (x, t3); \ + if (t2 != t4) \ + link_error (); \ + } \ + +#define POW0(TYPE1, TYPE2, CTY, TY) \ + void \ + pow0_##TY (TYPE2 x) \ + { \ + TYPE1 t1 = __builtin_pow##TY (0.0##CTY, x); \ + if (t1 != 0.0##CTY) \ + link_error (); \ + } \ + +#define TEST_ALL(TYPE1, TYPE2, CTY, TY) \ + POW1OVER (TYPE1, TYPE2, CTY, TY) \ + POW0 (TYPE1, TYPE2, CTY, TY) + +TEST_ALL (double, double, , ) +TEST_ALL (float, float, f, f) +TEST_ALL (_Float16, _Float16, f16, f16) +TEST_ALL (long double, long double, L, l) +TEST_ALL (double, int, , i) +TEST_ALL (float, int, f, if) +TEST_ALL (long double, int, L, il) + +/* { dg-final { scan-tree-dump-not "link_error" "optimized" } } */ -- 2.44.0 > > Richard. > >> Best, >> Jennifer >> >>> >>>> What are your thoughts on this? >>>> Thanks, >>>> Jennifer >>>> >>>> This patch adds the following two simplifications in match.pd for POW >>>> and POWI: >>>> - pow (1.0/x, y) to pow (x, -y), avoiding the division >>>> - pow (0.0, x) to 0.0, avoiding the call to pow. >>>> The patterns are guarded by flag_unsafe_math_optimizations, >>>> !flag_trapping_math, !flag_errno_math, and !HONOR_INFINITIES. >>>> The second pattern is also guarded by !HONOR_NANS and >>>> !HONOR_SIGNED_ZEROS. >>>> >>>> Tests were added to confirm the application of the transform for >>>> builtins pow, powf, powl, powi, powif, powil, and powf16. >>>> >>>> The patch was bootstrapped and regtested on aarch64-linux-gnu and >>>> x86_64-linux-gnu, no regression. >>>> OK for mainline? >>>> >>>> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com> >>>> >>>> gcc/ >>>> * match.pd: Fold pow (1.0/x, y) -> pow (x, -y) and >>>> pow (0.0, x) -> 0.0. >>>> >>>> gcc/testsuite/ >>>> * gcc.dg/tree-ssa/pow_fold_1.c: New test. >>>> --- >>>> gcc/match.pd | 15 ++++++++ >>>> gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c | 42 ++++++++++++++++++++++ >>>> 2 files changed, 57 insertions(+) >>>> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c >>>> >>>> diff --git a/gcc/match.pd b/gcc/match.pd >>>> index 12d81fcac0d..b061ef9dc91 100644 >>>> --- a/gcc/match.pd >>>> +++ b/gcc/match.pd >>>> @@ -8203,6 +8203,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) >>>> (rdiv @0 (exps:s @1)) >>>> (mult @0 (exps (negate @1))))) >>>> >>>> + (for pow (POW POWI) >>>> + (if (! HONOR_INFINITIES (type) >>>> + && ! flag_trapping_math >>>> + && ! flag_errno_math) >>>> + /* Simplify pow(1.0/x, y) into pow(x, -y). */ >>>> + (simplify >>>> + (pow (rdiv:s real_onep@0 @1) @2) >>>> + (pow @1 (negate @2))) >>>> + >>>> + /* Simplify pow(0.0, x) into 0.0. */ >>>> + (if (! HONOR_NANS (type) && ! HONOR_SIGNED_ZEROS (type)) >>>> + (simplify >>>> + (pow real_zerop@0 @1) >>>> + @0)))) >>>> + >>>> (if (! HONOR_SIGN_DEPENDENT_ROUNDING (type) >>>> && ! HONOR_NANS (type) && ! HONOR_INFINITIES (type) >>>> && ! flag_trapping_math >>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c >>>> b/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c >>>> new file mode 100644 >>>> index 00000000000..c38b7390478 >>>> --- /dev/null >>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c >>>> @@ -0,0 +1,42 @@ >>>> +/* { dg-do compile } */ >>>> +/* { dg-options "-O2 -ffast-math -fdump-tree-optimized" } */ >>>> +/* { dg-add-options float16 } */ >>>> +/* { dg-require-effective-target float16_runtime } */ >>>> +/* { dg-require-effective-target c99_runtime } */ >>>> + >>>> +extern void link_error (void); >>>> + >>>> +#define POW1OVER(TYPE1, TYPE2, CTY, TY) \ >>>> + void \ >>>> + pow1over_##TY (TYPE1 x, TYPE2 y) \ >>>> + { \ >>>> + TYPE1 t1 = 1.0##CTY / x; \ >>>> + TYPE1 t2 = __builtin_pow##TY (t1, y); \ >>>> + TYPE2 t3 = -y; \ >>>> + TYPE1 t4 = __builtin_pow##TY (x, t3); \ >>>> + if (t2 != t4) \ >>>> + link_error (); \ >>>> + } \ >>>> + >>>> +#define POW0(TYPE1, TYPE2, CTY, TY) \ >>>> + void \ >>>> + pow0_##TY (TYPE2 x) \ >>>> + { \ >>>> + TYPE1 t1 = __builtin_pow##TY (0.0##CTY, x); \ >>>> + if (t1 != 0.0##CTY) \ >>>> + link_error (); \ >>>> + } \ >>>> + >>>> +#define TEST_ALL(TYPE1, TYPE2, CTY, TY) \ >>>> + POW1OVER (TYPE1, TYPE2, CTY, TY) \ >>>> + POW0 (TYPE1, TYPE2, CTY, TY) >>>> + >>>> +TEST_ALL (double, double, , ) >>>> +TEST_ALL (float, float, f, f) >>>> +TEST_ALL (_Float16, _Float16, f16, f16) >>>> +TEST_ALL (long double, long double, L, l) >>>> +TEST_ALL (double, int, , i) >>>> +TEST_ALL (float, int, f, if) >>>> +TEST_ALL (long double, int, L, il) >>>> + >>>> +/* { dg-final { scan-tree-dump-not "link_error" "optimized" } } */ >>>> >>> >>> -- >>> Richard Biener <rguent...@suse.de> >>> SUSE Software Solutions Germany GmbH, >>> Frankenstrasse 146, 90461 Nuernberg, Germany; >>> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg) >> >> >> > > -- > Richard Biener <rguent...@suse.de> > SUSE Software Solutions Germany GmbH, > Frankenstrasse 146, 90461 Nuernberg, Germany; > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
smime.p7s
Description: S/MIME cryptographic signature