On Tue, 22 Oct 2024, Jennifer Schmitz wrote:
>
>
> > On 22 Oct 2024, at 11:05, Richard Biener <[email protected]> wrote:
> >
> > External email: Use caution opening links or attachments
> >
> >
> > On Tue, 22 Oct 2024, Jennifer Schmitz wrote:
> >
> >>
> >>
> >>> On 21 Oct 2024, at 10:51, Richard Biener <[email protected]> wrote:
> >>>
> >>> External email: Use caution opening links or attachments
> >>>
> >>>
> >>> On Fri, 18 Oct 2024, Jennifer Schmitz wrote:
> >>>
> >>>> This patch adds the following two simplifications in match.pd:
> >>>> - pow (1.0/x, y) to pow (x, -y), avoiding the division
> >>>> - pow (0.0, x) to 0.0, avoiding the call to pow.
> >>>> The patterns are guarded by flag_unsafe_math_optimizations,
> >>>> !flag_trapping_math, !flag_errno_math, !HONOR_SIGNED_ZEROS,
> >>>> and !HONOR_INFINITIES.
> >>>>
> >>>> Tests were added to confirm the application of the transform for float,
> >>>> double, and long double.
> >>>>
> >>>> The patch was bootstrapped and regtested on aarch64-linux-gnu and
> >>>> x86_64-linux-gnu, no regression.
> >>>> OK for mainline?
> >>>>
> >>>> Signed-off-by: Jennifer Schmitz <[email protected]>
> >>>>
> >>>> gcc/
> >>>> * match.pd: Fold pow (1.0/x, y) -> pow (x, -y) and
> >>>> pow (0.0, x) -> 0.0.
> >>>>
> >>>> gcc/testsuite/
> >>>> * gcc.dg/tree-ssa/pow_fold_1.c: New test.
> >>>> ---
> >>>> gcc/match.pd | 14 +++++++++
> >>>> gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c | 34 ++++++++++++++++++++++
> >>>> 2 files changed, 48 insertions(+)
> >>>> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c
> >>>>
> >>>> diff --git a/gcc/match.pd b/gcc/match.pd
> >>>> index 12d81fcac0d..ba100b117e7 100644
> >>>> --- a/gcc/match.pd
> >>>> +++ b/gcc/match.pd
> >>>> @@ -8203,6 +8203,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >>>> (rdiv @0 (exps:s @1))
> >>>> (mult @0 (exps (negate @1)))))
> >>>>
> >>>> + /* Simplify pow(1.0/x, y) into pow(x, -y). */
> >>>> + (if (! HONOR_INFINITIES (type)
> >>>> + && ! HONOR_SIGNED_ZEROS (type)
> >>>> + && ! flag_trapping_math
> >>>> + && ! flag_errno_math)
> >>>> + (simplify
> >>>> + (POW (rdiv:s real_onep@0 @1) @2)
> >>>> + (POW @1 (negate @2)))
> >>>
> >>> This one shouldn't need HONOR_SIGNED_ZEROS?
> >>>
> >>>> +
> >>>> + /* Simplify pow(0.0, x) into 0.0. */
> >>>> + (simplify
> >>>> + (POW real_zerop@0 @1)
> >>>
> >>> I think this needs !HONOR_NANS (type)?
> >>>
> >>> Otherwise OK.
> >> Thanks for the feedback, Richard and Andrew. I made the following changes
> >> to the patch (current version of the patch below):
> >> - also applied the pattern to POWI and added tests for pow, powif, powil
> >> - not gate first pattern under !HONOR_SIGNED_ZEROS, but second one
> >> additionally under !HONOR_NANS (type)
> >> - added tests for powf16
> >
> > Note powi is GCC internal, it doesn't set errno and it should be subject
> > to different rules - I'd rather have patterns working on powi separate.
> How about moving the patterns for POWI into the section
> flag_unsafe_math_optimizations && canonicalize_math_p () and not use
> (!flag_errno_math)?
Sounds good.
> >
> >> Now, I am encountering two problems:
> >>
> >> First, the transform is not applied for float16 (even if
> >> -fexcess-precision=16). Do you know what the problem could be?
> >
> > I think you want to use POW_ALL instead of POW. The generated
> > cfn-operators.pd shows
> >
> > (define_operator_list POW
> > BUILT_IN_POWF
> > BUILT_IN_POW
> > BUILT_IN_POWL
> > IFN_POW)
> > (define_operator_list POW_FN
> > BUILT_IN_POWF16
> > BUILT_IN_POWF32
> > BUILT_IN_POWF64
> > BUILT_IN_POWF128
> > BUILT_IN_POWF32X
> > BUILT_IN_POWF64X
> > BUILT_IN_POWF128X
> > null)
> > (define_operator_list POW_ALL
> > BUILT_IN_POWF
> > BUILT_IN_POW
> > BUILT_IN_POWL
> > BUILT_IN_POWF16
> > ...
> >
> > note this comes at expense of more generated code (in
> > gimple/generic-match.pd).
> Thanks, that solved the Float16 issue.
> >
> >> Second, validation on aarch64 shows a regression in tests
> >> - gcc.dg/recip_sqrt_mult_1.c and
> >> - gcc.dg/recip_sqrt_mult_5.c,
> >> because the pattern (POWI(1/x, y) -> POWI(x, -y)) is applied before the
> >> recip pass and prevents application of the recip-patterns. The reason for
> >> this might be that the single-use restriction only work if the integer
> >> argument is non-constant, but in the failing test cases, the integer
> >> argument is 2 and the pattern is applied despite the :s flag.
> >> For example, my pattern is **not** applied (single-use restriction works)
> >> for:
> >> double res, res2;
> >> void foo (double a, int b)
> >> {
> >> double f (double);
> >> double t1 = 1.0 / a;
> >> res = __builtin_powi (t1, b);
> >> res2 = f (t1);
> >> }
> >>
> >> But the pattern **is** applied and single-use restriction does **not**
> >> work for:
> >> double res, res2;
> >> void foo (double a)
> >> {
> >> double f (double);
> >> double t1 = 1.0 / a;
> >> res = __builtin_powi (t1, 2);
> >> res2 = f (t1);
> >> }
> >
> > This must be because the result is a single operation. :s only applies
> > when the result has sub-expresions. This is to make CSE work.
> > The "fix" is to add explicit && single_use (@n) to override that
> > behavior. Note that I think the transform is good even when the
> > division is used because the result reduces the dependence chain length.
> > It's only when @2 is non-constant that we're introducing another
> > stmt for the negation that re-introduces this latency (even if in
> > practice it would be smaller).
> >
> >> Possible options to resolve this are:
> >> - gate pattern to run after recip pass
> >> - do not apply pattern for POWI
> >
> > - adjust the testcase (is the final outcome still good?)
> Without the patch, there is one fdiv instruction less (below is the assembly
> for recip_sqrt_mult_1.c, but for _5.c it’s analogous):
> No patch or with single_use of the result of the division:
> foo:
> fmov d30, 1.0e+0
> fsqrt d31, d0
> adrp x0, .LANCHOR0
> add x1, x0, :lo12:.LANCHOR0
> fdiv d30, d30, d0
> fmul d0, d31, d30
> str d0, [x0, #:lo12:.LANCHOR0]
> stp d30, d31, [x1, 8]
> ret
>
> With patch:
> foo:
> fsqrt d31, d0
> fmov d30, 1.0e+0
> adrp x1, .LANCHOR0
> add x0, x1, :lo12:.LANCHOR0
> fdiv d31, d30, d31
> fdiv d30, d30, d0
> str d31, [x1, #:lo12:.LANCHOR0]
> fmul d31, d31, d0
> stp d30, d31, [x0, 8]
> ret
> So, we might want to use the single_use guard.
Yeah, this is because the powi inline expansion will add back
the division.
Richard.
> Best,
> Jennifer
>
> >
> >> What are your thoughts on this?
> >> Thanks,
> >> Jennifer
> >>
> >> This patch adds the following two simplifications in match.pd for POW
> >> and POWI:
> >> - pow (1.0/x, y) to pow (x, -y), avoiding the division
> >> - pow (0.0, x) to 0.0, avoiding the call to pow.
> >> The patterns are guarded by flag_unsafe_math_optimizations,
> >> !flag_trapping_math, !flag_errno_math, and !HONOR_INFINITIES.
> >> The second pattern is also guarded by !HONOR_NANS and
> >> !HONOR_SIGNED_ZEROS.
> >>
> >> Tests were added to confirm the application of the transform for
> >> builtins pow, powf, powl, powi, powif, powil, and powf16.
> >>
> >> The patch was bootstrapped and regtested on aarch64-linux-gnu and
> >> x86_64-linux-gnu, no regression.
> >> OK for mainline?
> >>
> >> Signed-off-by: Jennifer Schmitz <[email protected]>
> >>
> >> gcc/
> >> * match.pd: Fold pow (1.0/x, y) -> pow (x, -y) and
> >> pow (0.0, x) -> 0.0.
> >>
> >> gcc/testsuite/
> >> * gcc.dg/tree-ssa/pow_fold_1.c: New test.
> >> ---
> >> gcc/match.pd | 15 ++++++++
> >> gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c | 42 ++++++++++++++++++++++
> >> 2 files changed, 57 insertions(+)
> >> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c
> >>
> >> diff --git a/gcc/match.pd b/gcc/match.pd
> >> index 12d81fcac0d..b061ef9dc91 100644
> >> --- a/gcc/match.pd
> >> +++ b/gcc/match.pd
> >> @@ -8203,6 +8203,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >> (rdiv @0 (exps:s @1))
> >> (mult @0 (exps (negate @1)))))
> >>
> >> + (for pow (POW POWI)
> >> + (if (! HONOR_INFINITIES (type)
> >> + && ! flag_trapping_math
> >> + && ! flag_errno_math)
> >> + /* Simplify pow(1.0/x, y) into pow(x, -y). */
> >> + (simplify
> >> + (pow (rdiv:s real_onep@0 @1) @2)
> >> + (pow @1 (negate @2)))
> >> +
> >> + /* Simplify pow(0.0, x) into 0.0. */
> >> + (if (! HONOR_NANS (type) && ! HONOR_SIGNED_ZEROS (type))
> >> + (simplify
> >> + (pow real_zerop@0 @1)
> >> + @0))))
> >> +
> >> (if (! HONOR_SIGN_DEPENDENT_ROUNDING (type)
> >> && ! HONOR_NANS (type) && ! HONOR_INFINITIES (type)
> >> && ! flag_trapping_math
> >> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c
> >> b/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c
> >> new file mode 100644
> >> index 00000000000..c38b7390478
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c
> >> @@ -0,0 +1,42 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-O2 -ffast-math -fdump-tree-optimized" } */
> >> +/* { dg-add-options float16 } */
> >> +/* { dg-require-effective-target float16_runtime } */
> >> +/* { dg-require-effective-target c99_runtime } */
> >> +
> >> +extern void link_error (void);
> >> +
> >> +#define POW1OVER(TYPE1, TYPE2, CTY, TY) \
> >> + void \
> >> + pow1over_##TY (TYPE1 x, TYPE2 y) \
> >> + { \
> >> + TYPE1 t1 = 1.0##CTY / x; \
> >> + TYPE1 t2 = __builtin_pow##TY (t1, y); \
> >> + TYPE2 t3 = -y; \
> >> + TYPE1 t4 = __builtin_pow##TY (x, t3); \
> >> + if (t2 != t4) \
> >> + link_error (); \
> >> + } \
> >> +
> >> +#define POW0(TYPE1, TYPE2, CTY, TY) \
> >> + void \
> >> + pow0_##TY (TYPE2 x) \
> >> + { \
> >> + TYPE1 t1 = __builtin_pow##TY (0.0##CTY, x); \
> >> + if (t1 != 0.0##CTY) \
> >> + link_error (); \
> >> + } \
> >> +
> >> +#define TEST_ALL(TYPE1, TYPE2, CTY, TY) \
> >> + POW1OVER (TYPE1, TYPE2, CTY, TY) \
> >> + POW0 (TYPE1, TYPE2, CTY, TY)
> >> +
> >> +TEST_ALL (double, double, , )
> >> +TEST_ALL (float, float, f, f)
> >> +TEST_ALL (_Float16, _Float16, f16, f16)
> >> +TEST_ALL (long double, long double, L, l)
> >> +TEST_ALL (double, int, , i)
> >> +TEST_ALL (float, int, f, if)
> >> +TEST_ALL (long double, int, L, il)
> >> +
> >> +/* { dg-final { scan-tree-dump-not "link_error" "optimized" } } */
> >>
> >
> > --
> > Richard Biener <[email protected]>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
>
>
>
--
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)