I have not looked at execute_cse_reciprocals_1() yet. However, with the recip patch applied:

double recipd (double a, double b) { return a/b; }

translates to

recipd:
        frecpe  d2, d1
        frecps  d3, d2, d1
        fmul    d2, d2, d3
        frecps  d3, d2, d1
        fmul    d2, d2, d3
        frecps  d1, d2, d1
        fmul    d2, d2, d1
        fmul    d0, d2, d0
        ret

float recipf (float a, float b) { return a/b; }

translates to

recipf:
        frecpe  s2, s1
        frecps  s3, s2, s1
        fmul    s2, s2, s3
        frecps  s1, s2, s1
        fmul    s2, s2, s1
        fmul    s0, s2, s0
        ret

So it seems that it also works for a generic division.

Best regards,
Benedikt

> On 24 Jun 2015, at 22:39, Evandro Menezes <e.mene...@samsung.com> wrote:
>
> Philipp,
>
> I think that execute_cse_reciprocals_1() applies only when the denominator is
> known at compile-time, otherwise the division stays. It doesn't seem to know
> whether the target supports the approximate reciprocal or not.
>
> Cheers,
>
> --
> Evandro Menezes Austin, TX
>
>
>> -----Original Message-----
>> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On
>> Behalf Of Dr. Philipp Tomsich
>> Sent: Wednesday, June 24, 2015 15:08
>> To: Evandro Menezes
>> Cc: Benedikt Huber; gcc-patches@gcc.gnu.org
>> Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt)
>> estimation in -ffast-math
>>
>> Evandro,
>>
>> Shouldn't ‘execute_cse_reciprocals_1’ take care of this, once the
>> reciprocal-division is implemented?
>> Do you think there’s additional work needed to catch all cases/opportunities?
>>
>> Best,
>> Philipp.
>>
>>> On 24 Jun 2015, at 20:19, Evandro Menezes <e.mene...@samsung.com> wrote:
>>>
>>> Benedikt,
>>>
>>> Are you developing the reciprocal approximation just for 1/x proper or for
>>> any division, as in x/y = x * 1/y?
>>>
>>> Thank you,
>>>
>>> --
>>> Evandro Menezes Austin, TX
>>>
>>>
>>>> -----Original Message-----
>>>> From: Benedikt Huber [mailto:benedikt.hu...@theobroma-systems.com]
>>>> Sent: Wednesday, June 24, 2015 12:11
>>>> To: Dr. Philipp Tomsich
>>>> Cc: Evandro Menezes; gcc-patches@gcc.gnu.org
>>>> Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root
>>>> (rsqrt) estimation in -ffast-math
>>>>
>>>> Evandro,
>>>>
>>>> Yes, we also have the 1/x approximation.
>>>> However we do not have the test cases yet, and it also would need
>>>> some clean up.
>>>> I am going to provide a patch for that soon (say next week).
>>>> Also, for this optimization we have *not* yet found a benchmark with
>>>> significant improvements.
>>>>
>>>> Best Regards,
>>>> Benedikt
>>>>
>>>>
>>>>> On 24 Jun 2015, at 18:52, Dr. Philipp Tomsich
>>>>> <philipp.tomsich@theobroma-systems.com> wrote:
>>>>>
>>>>> Evandro,
>>>>>
>>>>> We’ve seen a 28% speed-up on gromacs in SPECfp for the (scalar)
>>>>> reciprocal sqrt.
>>>>>
>>>>> Also, the “reciprocal divide” patches are floating around in various
>>>>> of our git-tree, but aren’t ready for public consumption, yet… I’ll
>>>>> leave Benedikt to comment on potential timelines for getting that
>>>>> pushed out.
>>>>>
>>>>> Best,
>>>>> Philipp.
>>>>>
>>>>>> On 24 Jun 2015, at 18:42, Evandro Menezes <e.mene...@samsung.com> wrote:
>>>>>>
>>>>>> Benedikt,
>>>>>>
>>>>>> You beat me to it! :-) Do you have the implementation for dividing
>>>>>> using the Newton series as well?
>>>>>>
>>>>>> I'm not sure that the series is always for all data types and on
>>>>>> all processors. It would be useful to allow each AArch64 processor
>>>>>> to enable this or not depending on the data type. BTW, do you have
>>>>>> some tests showing the speed up?
>>>>>>
>>>>>> Thank you,
>>>>>>
>>>>>> --
>>>>>> Evandro Menezes Austin, TX
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: gcc-patches-ow...@gcc.gnu.org
>>>>>>> [mailto:gcc-patches-ow...@gcc.gnu.org] On
>>>>>>> Behalf Of Benedikt Huber
>>>>>>> Sent: Thursday, June 18, 2015 7:04
>>>>>>> To: gcc-patches@gcc.gnu.org
>>>>>>> Cc: benedikt.hu...@theobroma-systems.com;
>>>>>>> philipp.tomsich@theobroma-systems.com
>>>>>>> Subject: [PATCH] [aarch64] Implemented reciprocal square root
>>>>>>> (rsqrt) estimation in -ffast-math
>>>>>>>
>>>>>>> aarch64 offers the instructions frsqrte and frsqrts, for rsqrt
>>>>>>> estimation and a Newton-Raphson step, respectively.
>>>>>>> There are ARMv8 implementations where this is faster than using
>>>>>>> fdiv and rsqrt.
>>>>>>> It runs three steps for double and two steps for float to achieve
>>>>>>> the needed precision.
>>>>>>>
>>>>>>> There is one caveat and open question.
>>>>>>> Since -ffast-math enables flush to zero intermediate values
>>>>>>> between approximation steps will be flushed to zero if they are
>>>>>>> denormal.
>>>>>>> E.g. This happens in the case of rsqrt (DBL_MAX) and rsqrtf (FLT_MAX).
>>>>>>> The test cases pass, but it is unclear to me whether this is
>>>>>>> expected behavior with -ffast-math.
>>>>>>>
>>>>>>> The patch applies to commit:
>>>>>>> svn+ssh://gcc.gnu.org/svn/gcc/trunk@224470
>>>>>>>
>>>>>>> Please consider including this patch.
>>>>>>> Thank you and best regards,
>>>>>>> Benedikt Huber
>>>>>>>
>>>>>>> Benedikt Huber (1):
>>>>>>>   2015-06-15 Benedikt Huber <benedikt.hu...@theobroma-systems.com>
>>>>>>>
>>>>>>>  gcc/ChangeLog                            |   9 +++
>>>>>>>  gcc/config/aarch64/aarch64-builtins.c    |  60 ++++++++++++++++
>>>>>>>  gcc/config/aarch64/aarch64-protos.h      |   2 +
>>>>>>>  gcc/config/aarch64/aarch64-simd.md       |  27 ++++++++
>>>>>>>  gcc/config/aarch64/aarch64.c             |  63 +++++++++++++++++
>>>>>>>  gcc/config/aarch64/aarch64.md            |   3 +
>>>>>>>  gcc/testsuite/gcc.target/aarch64/rsqrt.c | 113 +++++++++++++++++++++++++++++++
>>>>>>>  7 files changed, 277 insertions(+)
>>>>>>>  create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt.c
>>>>>>>
>>>>>>> --
>>>>>>> 1.9.1
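
For readers following the recipd and recipf listings at the top of this message, here is a rough C sketch of what those instruction sequences compute. It is only an illustration, not code from the patch. The names estimate_recip, recip_step and approx_div are made up for this example. estimate_recip is an assumed stand-in for frecpe (the hardware uses a table lookup, whereas the sketch divides in single precision and throws away the low mantissa bits), and recip_step mirrors frecps, which computes 2 - a*b.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for frecpe: a reciprocal estimate that is good to
   roughly 8 bits. This is an assumption for illustration only. It divides
   in single precision and clears the low 15 mantissa bits, so it is only
   meaningful while b fits in a float. */
static double estimate_recip(double b)
{
    float f = 1.0f / (float)b;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    bits &= 0xFFFF8000u;   /* keep sign, exponent and top 8 mantissa bits */
    memcpy(&f, &bits, sizeof f);
    return (double)f;
}

/* Mirrors frecps: the Newton-Raphson correction factor 2 - x*b. */
static double recip_step(double x, double b)
{
    return 2.0 - x * b;
}

/* Hypothetical helper: a/b computed as a * (1/b), where 1/b is refined by
   `steps` Newton-Raphson iterations x = x * (2 - x*b). Each iteration
   roughly squares the relative error of x. */
static double approx_div(double a, double b, int steps)
{
    double x = estimate_recip(b);
    for (int i = 0; i < steps; i++)
        x = x * recip_step(x, b);   /* frecps followed by fmul */
    return a * x;                   /* final fmul by the numerator */
}
```

With these assumptions, approx_div(a, b, 3) follows the shape of the recipd sequence (one estimate, three frecps/fmul pairs, one final fmul) and approx_div(a, b, 2) follows recipf. Since each step about doubles the number of correct bits, an 8-bit estimate needs two steps to cover a float mantissa and three to cover a double one.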