> On 22 Oct 2024, at 11:05, Richard Biener <[email protected]> wrote: > > External email: Use caution opening links or attachments > > > On Tue, 22 Oct 2024, Jennifer Schmitz wrote: > >> >> >>> On 21 Oct 2024, at 10:51, Richard Biener <[email protected]> wrote: >>> >>> External email: Use caution opening links or attachments >>> >>> >>> On Fri, 18 Oct 2024, Jennifer Schmitz wrote: >>> >>>> This patch adds the following two simplifications in match.pd: >>>> - pow (1.0/x, y) to pow (x, -y), avoiding the division >>>> - pow (0.0, x) to 0.0, avoiding the call to pow. >>>> The patterns are guarded by flag_unsafe_math_optimizations, >>>> !flag_trapping_math, !flag_errno_math, !HONOR_SIGNED_ZEROS, >>>> and !HONOR_INFINITIES. >>>> >>>> Tests were added to confirm the application of the transform for float, >>>> double, and long double. >>>> >>>> The patch was bootstrapped and regtested on aarch64-linux-gnu and >>>> x86_64-linux-gnu, no regression. >>>> OK for mainline? >>>> >>>> Signed-off-by: Jennifer Schmitz <[email protected]> >>>> >>>> gcc/ >>>> * match.pd: Fold pow (1.0/x, y) -> pow (x, -y) and >>>> pow (0.0, x) -> 0.0. >>>> >>>> gcc/testsuite/ >>>> * gcc.dg/tree-ssa/pow_fold_1.c: New test. >>>> --- >>>> gcc/match.pd | 14 +++++++++ >>>> gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c | 34 ++++++++++++++++++++++ >>>> 2 files changed, 48 insertions(+) >>>> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c >>>> >>>> diff --git a/gcc/match.pd b/gcc/match.pd >>>> index 12d81fcac0d..ba100b117e7 100644 >>>> --- a/gcc/match.pd >>>> +++ b/gcc/match.pd >>>> @@ -8203,6 +8203,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) >>>> (rdiv @0 (exps:s @1)) >>>> (mult @0 (exps (negate @1))))) >>>> >>>> + /* Simplify pow(1.0/x, y) into pow(x, -y). */ >>>> + (if (! HONOR_INFINITIES (type) >>>> + && ! HONOR_SIGNED_ZEROS (type) >>>> + && ! flag_trapping_math >>>> + && ! flag_errno_math) >>>> + (simplify >>>> + (POW (rdiv:s real_onep@0 @1) @2) >>>> + (POW @1 (negate @2))) >>> >>> This one shouldn't need HONOR_SIGNED_ZEROS? >>> >>>> + >>>> + /* Simplify pow(0.0, x) into 0.0. */ >>>> + (simplify >>>> + (POW real_zerop@0 @1) >>> >>> I think this needs !HONOR_NANS (type)? >>> >>> Otherwise OK. >> Thanks for the feedback, Richard and Andrew. I made the following changes to >> the patch (current version of the patch below): >> - also applied the pattern to POWI and added tests for pow, powif, powil >> - not gate first pattern under !HONOR_SIGNED_ZEROS, but second one >> additionally under !HONOR_NANS (type) >> - added tests for powf16 > > Note powi is GCC internal, it doesn't set errno and it should be subject > to different rules - I'd rather have patterns working on powi separate. How about moving the patterns for POWI into the section flag_unsafe_math_optimizations && canonicalize_math_p () and not use (!flag_errno_math)? > >> Now, I am encountering two problems: >> >> First, the transform is not applied for float16 (even if >> -fexcess-precision=16). Do you know what the problem could be? > > I think you want to use POW_ALL instead of POW. The generated > cfn-operators.pd shows > > (define_operator_list POW > BUILT_IN_POWF > BUILT_IN_POW > BUILT_IN_POWL > IFN_POW) > (define_operator_list POW_FN > BUILT_IN_POWF16 > BUILT_IN_POWF32 > BUILT_IN_POWF64 > BUILT_IN_POWF128 > BUILT_IN_POWF32X > BUILT_IN_POWF64X > BUILT_IN_POWF128X > null) > (define_operator_list POW_ALL > BUILT_IN_POWF > BUILT_IN_POW > BUILT_IN_POWL > BUILT_IN_POWF16 > ... > > note this comes at expense of more generated code (in > gimple/generic-match.pd). Thanks, that solved the Float16 issue. > >> Second, validation on aarch64 shows a regression in tests >> - gcc.dg/recip_sqrt_mult_1.c and >> - gcc.dg/recip_sqrt_mult_5.c, >> because the pattern (POWI(1/x, y) -> POWI(x, -y)) is applied before the >> recip pass and prevents application of the recip-patterns. The reason for >> this might be that the single-use restriction only work if the integer >> argument is non-constant, but in the failing test cases, the integer >> argument is 2 and the pattern is applied despite the :s flag. >> For example, my pattern is **not** applied (single-use restriction works) >> for: >> double res, res2; >> void foo (double a, int b) >> { >> double f (double); >> double t1 = 1.0 / a; >> res = __builtin_powi (t1, b); >> res2 = f (t1); >> } >> >> But the pattern **is** applied and single-use restriction does **not** work >> for: >> double res, res2; >> void foo (double a) >> { >> double f (double); >> double t1 = 1.0 / a; >> res = __builtin_powi (t1, 2); >> res2 = f (t1); >> } > > This must be because the result is a single operation. :s only applies > when the result has sub-expresions. This is to make CSE work. > The "fix" is to add explicit && single_use (@n) to override that > behavior. Note that I think the transform is good even when the > division is used because the result reduces the dependence chain length. > It's only when @2 is non-constant that we're introducing another > stmt for the negation that re-introduces this latency (even if in > practice it would be smaller). > >> Possible options to resolve this are: >> - gate pattern to run after recip pass >> - do not apply pattern for POWI > > - adjust the testcase (is the final outcome still good?) Without the patch, there is one fdiv instruction less (below is the assembly for recip_sqrt_mult_1.c, but for _5.c it’s analogous): No patch or with single_use of the result of the division: foo: fmov d30, 1.0e+0 fsqrt d31, d0 adrp x0, .LANCHOR0 add x1, x0, :lo12:.LANCHOR0 fdiv d30, d30, d0 fmul d0, d31, d30 str d0, [x0, #:lo12:.LANCHOR0] stp d30, d31, [x1, 8] ret
With patch:
foo:
fsqrt d31, d0
fmov d30, 1.0e+0
adrp x1, .LANCHOR0
add x0, x1, :lo12:.LANCHOR0
fdiv d31, d30, d31
fdiv d30, d30, d0
str d31, [x1, #:lo12:.LANCHOR0]
fmul d31, d31, d0
stp d30, d31, [x0, 8]
ret
So, we might want to use the single_use guard.
Best,
Jennifer
>
>> What are your thoughts on this?
>> Thanks,
>> Jennifer
>>
>> This patch adds the following two simplifications in match.pd for POW
>> and POWI:
>> - pow (1.0/x, y) to pow (x, -y), avoiding the division
>> - pow (0.0, x) to 0.0, avoiding the call to pow.
>> The patterns are guarded by flag_unsafe_math_optimizations,
>> !flag_trapping_math, !flag_errno_math, and !HONOR_INFINITIES.
>> The second pattern is also guarded by !HONOR_NANS and
>> !HONOR_SIGNED_ZEROS.
>>
>> Tests were added to confirm the application of the transform for
>> builtins pow, powf, powl, powi, powif, powil, and powf16.
>>
>> The patch was bootstrapped and regtested on aarch64-linux-gnu and
>> x86_64-linux-gnu, no regression.
>> OK for mainline?
>>
>> Signed-off-by: Jennifer Schmitz <[email protected]>
>>
>> gcc/
>> * match.pd: Fold pow (1.0/x, y) -> pow (x, -y) and
>> pow (0.0, x) -> 0.0.
>>
>> gcc/testsuite/
>> * gcc.dg/tree-ssa/pow_fold_1.c: New test.
>> ---
>> gcc/match.pd | 15 ++++++++
>> gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c | 42 ++++++++++++++++++++++
>> 2 files changed, 57 insertions(+)
>> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c
>>
>> diff --git a/gcc/match.pd b/gcc/match.pd
>> index 12d81fcac0d..b061ef9dc91 100644
>> --- a/gcc/match.pd
>> +++ b/gcc/match.pd
>> @@ -8203,6 +8203,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>> (rdiv @0 (exps:s @1))
>> (mult @0 (exps (negate @1)))))
>>
>> + (for pow (POW POWI)
>> + (if (! HONOR_INFINITIES (type)
>> + && ! flag_trapping_math
>> + && ! flag_errno_math)
>> + /* Simplify pow(1.0/x, y) into pow(x, -y). */
>> + (simplify
>> + (pow (rdiv:s real_onep@0 @1) @2)
>> + (pow @1 (negate @2)))
>> +
>> + /* Simplify pow(0.0, x) into 0.0. */
>> + (if (! HONOR_NANS (type) && ! HONOR_SIGNED_ZEROS (type))
>> + (simplify
>> + (pow real_zerop@0 @1)
>> + @0))))
>> +
>> (if (! HONOR_SIGN_DEPENDENT_ROUNDING (type)
>> && ! HONOR_NANS (type) && ! HONOR_INFINITIES (type)
>> && ! flag_trapping_math
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c
>> b/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c
>> new file mode 100644
>> index 00000000000..c38b7390478
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c
>> @@ -0,0 +1,42 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -ffast-math -fdump-tree-optimized" } */
>> +/* { dg-add-options float16 } */
>> +/* { dg-require-effective-target float16_runtime } */
>> +/* { dg-require-effective-target c99_runtime } */
>> +
>> +extern void link_error (void);
>> +
>> +#define POW1OVER(TYPE1, TYPE2, CTY, TY) \
>> + void \
>> + pow1over_##TY (TYPE1 x, TYPE2 y) \
>> + { \
>> + TYPE1 t1 = 1.0##CTY / x; \
>> + TYPE1 t2 = __builtin_pow##TY (t1, y); \
>> + TYPE2 t3 = -y; \
>> + TYPE1 t4 = __builtin_pow##TY (x, t3); \
>> + if (t2 != t4) \
>> + link_error (); \
>> + } \
>> +
>> +#define POW0(TYPE1, TYPE2, CTY, TY) \
>> + void \
>> + pow0_##TY (TYPE2 x) \
>> + { \
>> + TYPE1 t1 = __builtin_pow##TY (0.0##CTY, x); \
>> + if (t1 != 0.0##CTY) \
>> + link_error (); \
>> + } \
>> +
>> +#define TEST_ALL(TYPE1, TYPE2, CTY, TY) \
>> + POW1OVER (TYPE1, TYPE2, CTY, TY) \
>> + POW0 (TYPE1, TYPE2, CTY, TY)
>> +
>> +TEST_ALL (double, double, , )
>> +TEST_ALL (float, float, f, f)
>> +TEST_ALL (_Float16, _Float16, f16, f16)
>> +TEST_ALL (long double, long double, L, l)
>> +TEST_ALL (double, int, , i)
>> +TEST_ALL (float, int, f, if)
>> +TEST_ALL (long double, int, L, il)
>> +
>> +/* { dg-final { scan-tree-dump-not "link_error" "optimized" } } */
>>
>
> --
> Richard Biener <[email protected]>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
smime.p7s
Description: S/MIME cryptographic signature
