On Tue, 22 Oct 2024, Jennifer Schmitz wrote:
> 
> 
> > On 22 Oct 2024, at 11:05, Richard Biener <rguent...@suse.de> wrote:
> > 
> > External email: Use caution opening links or attachments
> > 
> > 
> > On Tue, 22 Oct 2024, Jennifer Schmitz wrote:
> > 
> >> 
> >> 
> >>> On 21 Oct 2024, at 10:51, Richard Biener <rguent...@suse.de> wrote:
> >>> 
> >>> External email: Use caution opening links or attachments
> >>> 
> >>> 
> >>> On Fri, 18 Oct 2024, Jennifer Schmitz wrote:
> >>> 
> >>>> This patch adds the following two simplifications in match.pd:
> >>>> - pow (1.0/x, y) to pow (x, -y), avoiding the division
> >>>> - pow (0.0, x) to 0.0, avoiding the call to pow.
> >>>> The patterns are guarded by flag_unsafe_math_optimizations,
> >>>> !flag_trapping_math, !flag_errno_math, !HONOR_SIGNED_ZEROS,
> >>>> and !HONOR_INFINITIES.
> >>>> 
> >>>> Tests were added to confirm the application of the transform for float,
> >>>> double, and long double.
> >>>> 
> >>>> The patch was bootstrapped and regtested on aarch64-linux-gnu and
> >>>> x86_64-linux-gnu, no regression.
> >>>> OK for mainline?
> >>>> 
> >>>> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
> >>>> 
> >>>> gcc/
> >>>> 	* match.pd: Fold pow (1.0/x, y) -> pow (x, -y) and
> >>>> 	pow (0.0, x) -> 0.0.
> >>>> 
> >>>> gcc/testsuite/
> >>>> 	* gcc.dg/tree-ssa/pow_fold_1.c: New test.
> >>>> ---
> >>>>  gcc/match.pd                               | 14 +++++++++
> >>>>  gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c | 34 ++++++++++++++++++++++
> >>>>  2 files changed, 48 insertions(+)
> >>>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c
> >>>> 
> >>>> diff --git a/gcc/match.pd b/gcc/match.pd
> >>>> index 12d81fcac0d..ba100b117e7 100644
> >>>> --- a/gcc/match.pd
> >>>> +++ b/gcc/match.pd
> >>>> @@ -8203,6 +8203,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >>>>    (rdiv @0 (exps:s @1))
> >>>>    (mult @0 (exps (negate @1)))))
> >>>> 
> >>>> +  /* Simplify pow(1.0/x, y) into pow(x, -y). */
> >>>> +  (if (! HONOR_INFINITIES (type)
> >>>> +       && ! HONOR_SIGNED_ZEROS (type)
> >>>> +       && ! flag_trapping_math
> >>>> +       && ! flag_errno_math)
> >>>> +   (simplify
> >>>> +    (POW (rdiv:s real_onep@0 @1) @2)
> >>>> +    (POW @1 (negate @2)))
> >>> 
> >>> This one shouldn't need HONOR_SIGNED_ZEROS?
> >>> 
> >>>> +
> >>>> +   /* Simplify pow(0.0, x) into 0.0. */
> >>>> +   (simplify
> >>>> +    (POW real_zerop@0 @1)
> >>> 
> >>> I think this needs !HONOR_NANS (type)?
> >>> 
> >>> Otherwise OK.
> >> Thanks for the feedback, Richard and Andrew. I made the following changes
> >> to the patch (current version of the patch below):
> >> - also applied the pattern to POWI and added tests for powi, powif, powil
> >> - not gate first pattern under !HONOR_SIGNED_ZEROS, but second one
> >> additionally under !HONOR_NANS (type)
> >> - added tests for powf16
> > 
> > Note powi is GCC internal, it doesn't set errno and it should be subject
> > to different rules - I'd rather have patterns working on powi separate.
> How about moving the patterns for POWI into the section
> flag_unsafe_math_optimizations && canonicalize_math_p () and not use
> (!flag_errno_math)?

Sounds good.
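
For concreteness, the separate powi patterns could look roughly like the
sketch below, placed inside the existing flag_unsafe_math_optimizations
&& canonicalize_math_p () section (untested and only meant to illustrate
the idea; the guards simply mirror the posted POW patterns minus the
errno check):

  /* powi is an internal function and does not set errno, so no
     flag_errno_math check is needed.  */
  (if (! HONOR_INFINITIES (type) && ! flag_trapping_math)
   /* Simplify powi (1.0/x, y) into powi (x, -y).  */
   (simplify
    (POWI (rdiv:s real_onep@0 @1) @2)
    (POWI @1 (negate @2)))
   /* Simplify powi (0.0, x) into 0.0.  */
   (if (! HONOR_NANS (type) && ! HONOR_SIGNED_ZEROS (type))
    (simplify
     (POWI real_zerop@0 @1)
     @0)))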
> > 
> >> Now, I am encountering two problems:
> >> 
> >> First, the transform is not applied for float16 (even if
> >> -fexcess-precision=16). Do you know what the problem could be?
> > 
> > I think you want to use POW_ALL instead of POW. The generated
> > cfn-operators.pd shows
> > 
> > (define_operator_list POW
> >     BUILT_IN_POWF
> >     BUILT_IN_POW
> >     BUILT_IN_POWL
> >     IFN_POW)
> > (define_operator_list POW_FN
> >     BUILT_IN_POWF16
> >     BUILT_IN_POWF32
> >     BUILT_IN_POWF64
> >     BUILT_IN_POWF128
> >     BUILT_IN_POWF32X
> >     BUILT_IN_POWF64X
> >     BUILT_IN_POWF128X
> >     null)
> > (define_operator_list POW_ALL
> >     BUILT_IN_POWF
> >     BUILT_IN_POW
> >     BUILT_IN_POWL
> >     BUILT_IN_POWF16
> > ...
> > 
> > note this comes at the expense of more generated code (in
> > gimple/generic-match.pd).
> Thanks, that solved the Float16 issue.
> > 
> >> Second, validation on aarch64 shows a regression in tests
> >> - gcc.dg/recip_sqrt_mult_1.c and
> >> - gcc.dg/recip_sqrt_mult_5.c,
> >> because the pattern (POWI(1/x, y) -> POWI(x, -y)) is applied before the
> >> recip pass and prevents application of the recip patterns. The reason for
> >> this might be that the single-use restriction only works if the integer
> >> argument is non-constant, but in the failing test cases, the integer
> >> argument is 2 and the pattern is applied despite the :s flag.
> >> For example, my pattern is **not** applied (single-use restriction works)
> >> for:
> >> double res, res2;
> >> void foo (double a, int b)
> >> {
> >>   double f (double);
> >>   double t1 = 1.0 / a;
> >>   res = __builtin_powi (t1, b);
> >>   res2 = f (t1);
> >> }
> >> 
> >> But the pattern **is** applied and single-use restriction does **not**
> >> work for:
> >> double res, res2;
> >> void foo (double a)
> >> {
> >>   double f (double);
> >>   double t1 = 1.0 / a;
> >>   res = __builtin_powi (t1, 2);
> >>   res2 = f (t1);
> >> }
> > 
> > This must be because the result is a single operation. :s only applies
> > when the result has sub-expressions. This is to make CSE work.
> > The "fix" is to add explicit && single_use (@n) to override that
> > behavior. Note that I think the transform is good even when the
> > division is used because the result reduces the dependence chain length.
> > It's only when @2 is non-constant that we're introducing another
> > stmt for the negation that re-introduces this latency (even if in
> > practice it would be smaller).
> > 
> >> Possible options to resolve this are:
> >> - gate pattern to run after recip pass
> >> - do not apply pattern for POWI
> > 
> > - adjust the testcase (is the final outcome still good?)
> Without the patch, there is one fdiv instruction less (below is the assembly
> for recip_sqrt_mult_1.c, but for _5.c it’s analogous):
> No patch, or with single_use of the result of the division:
> foo:
>         fmov    d30, 1.0e+0
>         fsqrt   d31, d0
>         adrp    x0, .LANCHOR0
>         add     x1, x0, :lo12:.LANCHOR0
>         fdiv    d30, d30, d0
>         fmul    d0, d31, d30
>         str     d0, [x0, #:lo12:.LANCHOR0]
>         stp     d30, d31, [x1, 8]
>         ret
> 
> With patch:
> foo:
>         fsqrt   d31, d0
>         fmov    d30, 1.0e+0
>         adrp    x1, .LANCHOR0
>         add     x0, x1, :lo12:.LANCHOR0
>         fdiv    d31, d30, d31
>         fdiv    d30, d30, d0
>         str     d31, [x1, #:lo12:.LANCHOR0]
>         fmul    d31, d31, d0
>         stp     d30, d31, [x0, 8]
>         ret
> So, we might want to use the single_use guard.

Yeah, this is because the powi inline expansion will add back the division.
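
For illustration, an explicit single-use guard on the division could look
something like the untested sketch below (the @3 capture exists only to
feed the single_use check and is not part of the posted patch; the pattern
is assumed to stay inside the (for pow (POW POWI) ...) block):

   /* Simplify pow (1.0/x, y) into pow (x, -y), but only if the division
      has no other users, so the recip pass can still combine it.  */
   (simplify
    (pow (rdiv@3 real_onep@0 @1) @2)
    (if (single_use (@3))
     (pow @1 (negate @2))))
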
Richard.

> Best,
> Jennifer
> > 
> >> What are your thoughts on this?
> >> Thanks,
> >> Jennifer
> >> 
> >> This patch adds the following two simplifications in match.pd for POW
> >> and POWI:
> >> - pow (1.0/x, y) to pow (x, -y), avoiding the division
> >> - pow (0.0, x) to 0.0, avoiding the call to pow.
> >> The patterns are guarded by flag_unsafe_math_optimizations,
> >> !flag_trapping_math, !flag_errno_math, and !HONOR_INFINITIES.
> >> The second pattern is also guarded by !HONOR_NANS and
> >> !HONOR_SIGNED_ZEROS.
> >> 
> >> Tests were added to confirm the application of the transform for
> >> builtins pow, powf, powl, powi, powif, powil, and powf16.
> >> 
> >> The patch was bootstrapped and regtested on aarch64-linux-gnu and
> >> x86_64-linux-gnu, no regression.
> >> OK for mainline?
> >> 
> >> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
> >> 
> >> gcc/
> >> 	* match.pd: Fold pow (1.0/x, y) -> pow (x, -y) and
> >> 	pow (0.0, x) -> 0.0.
> >> 
> >> gcc/testsuite/
> >> 	* gcc.dg/tree-ssa/pow_fold_1.c: New test.
> >> ---
> >>  gcc/match.pd                               | 15 ++++++++
> >>  gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c | 42 ++++++++++++++++++++++
> >>  2 files changed, 57 insertions(+)
> >>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c
> >> 
> >> diff --git a/gcc/match.pd b/gcc/match.pd
> >> index 12d81fcac0d..b061ef9dc91 100644
> >> --- a/gcc/match.pd
> >> +++ b/gcc/match.pd
> >> @@ -8203,6 +8203,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >>    (rdiv @0 (exps:s @1))
> >>    (mult @0 (exps (negate @1)))))
> >> 
> >> + (for pow (POW POWI)
> >> +  (if (! HONOR_INFINITIES (type)
> >> +       && ! flag_trapping_math
> >> +       && ! flag_errno_math)
> >> +   /* Simplify pow(1.0/x, y) into pow(x, -y). */
> >> +   (simplify
> >> +    (pow (rdiv:s real_onep@0 @1) @2)
> >> +    (pow @1 (negate @2)))
> >> +
> >> +   /* Simplify pow(0.0, x) into 0.0. */
> >> +   (if (! HONOR_NANS (type) && ! HONOR_SIGNED_ZEROS (type))
> >> +    (simplify
> >> +     (pow real_zerop@0 @1)
> >> +     @0))))
> >> +
> >>  (if (! HONOR_SIGN_DEPENDENT_ROUNDING (type)
> >>      && ! HONOR_NANS (type) && ! HONOR_INFINITIES (type)
> >>      && ! flag_trapping_math
> >> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c b/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c
> >> new file mode 100644
> >> index 00000000000..c38b7390478
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c
> >> @@ -0,0 +1,42 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-O2 -ffast-math -fdump-tree-optimized" } */
> >> +/* { dg-add-options float16 } */
> >> +/* { dg-require-effective-target float16_runtime } */
> >> +/* { dg-require-effective-target c99_runtime } */
> >> +
> >> +extern void link_error (void);
> >> +
> >> +#define POW1OVER(TYPE1, TYPE2, CTY, TY)         \
> >> +  void                                          \
> >> +  pow1over_##TY (TYPE1 x, TYPE2 y)              \
> >> +  {                                             \
> >> +    TYPE1 t1 = 1.0##CTY / x;                    \
> >> +    TYPE1 t2 = __builtin_pow##TY (t1, y);       \
> >> +    TYPE2 t3 = -y;                              \
> >> +    TYPE1 t4 = __builtin_pow##TY (x, t3);       \
> >> +    if (t2 != t4)                               \
> >> +      link_error ();                            \
> >> +  }                                             \
> >> +
> >> +#define POW0(TYPE1, TYPE2, CTY, TY)             \
> >> +  void                                          \
> >> +  pow0_##TY (TYPE2 x)                           \
> >> +  {                                             \
> >> +    TYPE1 t1 = __builtin_pow##TY (0.0##CTY, x); \
> >> +    if (t1 != 0.0##CTY)                         \
> >> +      link_error ();                            \
> >> +  }                                             \
> >> +
> >> +#define TEST_ALL(TYPE1, TYPE2, CTY, TY)         \
> >> +  POW1OVER (TYPE1, TYPE2, CTY, TY)              \
> >> +  POW0 (TYPE1, TYPE2, CTY, TY)
> >> +
> >> +TEST_ALL (double, double, , )
> >> +TEST_ALL (float, float, f, f)
> >> +TEST_ALL (_Float16, _Float16, f16, f16)
> >> +TEST_ALL (long double, long double, L, l)
> >> +TEST_ALL (double, int, , i)
> >> +TEST_ALL (float, int, f, if)
> >> +TEST_ALL (long double, int, L, il)
> >> +
> >> +/* { dg-final { scan-tree-dump-not "link_error" "optimized" } } */
> >> 
> > 
> > -- 
> > Richard Biener <rguent...@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> 
-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)