On Thu, Jun 25, 2015 at 1:24 AM, Ramana Radhakrishnan
<ramana.radhakrish...@foss.arm.com> wrote:
> Benedikt,
>
> On 25/06/15 08:01, pins...@gmail.com wrote:
>>
>>> On Jun 18, 2015, at 5:04 AM, Benedikt Huber
>>> <benedikt.hu...@theobroma-systems.com> wrote:
>>>
>>> aarch64 offers the instructions frsqrte and frsqrts, for rsqrt estimation
>>> and a Newton-Raphson step, respectively.
>>> There are ARMv8 implementations where this is faster than using fdiv and
>>> rsqrt.
>>> It runs three steps for double and two steps for float to achieve the
>>> needed precision.
>>
>> This is NOT a win on thunderX at least for single precision because you
>> have to do the divide and sqrt in the same time as it takes 5 multiplies
>> (estimate and step are multiplies in the thunderX pipeline). Doubles is 10
>> multiplies which is just the same as what the patch does (but it is really
>> slightly less than 10, I rounded up). So in the end this is NOT a win at all
>> for thunderX unless we do one less step for both single and double.
>>
>
> Have you seen this https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00164.html
> ? Really this is something that should be gated by the costs infrastructure

Yes, I saw that; in fact I did not look into the latencies of our core until
this patch came out. But yes, this should be gated by a cost infrastructure,
and most likely not as part of the -mcpu=generic cost (well, the rsqrt, if we
change it to 1 iteration and 2 iterations).

Thanks,
Andrew

> .
>
> regards
> Ramana
>
>> Thanks,
>> Andrew
>>
>>>
>>> There is one caveat and open question.
>>> Since -ffast-math enables flush to zero intermediate values between
>>> approximation steps
>>> will be flushed to zero if they are denormal.
>>> E.g. This happens in the case of rsqrt (DBL_MAX) and rsqrtf (FLT_MAX).
>>> The test cases pass, but it is unclear to me whether this is expected
>>> behavior with -ffast-math.
>>>
>>> The patch applies to commit:
>>> svn+ssh://gcc.gnu.org/svn/gcc/trunk@224470
>>>
>>> Please consider including this patch.
>>> Thank you and best regards,
>>> Benedikt Huber
>>>
>>> Benedikt Huber (1):
>>> 2015-06-15 Benedikt Huber <benedikt.hu...@theobroma-systems.com>
>>>
>>>  gcc/ChangeLog                            |   9 +++
>>>  gcc/config/aarch64/aarch64-builtins.c    |  60 ++++++++++++++++
>>>  gcc/config/aarch64/aarch64-protos.h      |   2 +
>>>  gcc/config/aarch64/aarch64-simd.md       |  27 ++++++++
>>>  gcc/config/aarch64/aarch64.c             |  63 +++++++++++++++++
>>>  gcc/config/aarch64/aarch64.md            |   3 +
>>>  gcc/testsuite/gcc.target/aarch64/rsqrt.c | 113 +++++++++++++++++++++++++++++++
>>>  7 files changed, 277 insertions(+)
>>>  create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt.c
>>>
>>> --
>>> 1.9.1
>>>
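
For readers following the step counts discussed in this thread, here is a rough
portable C sketch of the estimate-plus-Newton-Raphson scheme. It is not the
code from the patch. The names rsqrt_step_f, rsqrt_step_d, rsqrt_refine_f and
rsqrt_refine_d are hypothetical, the initial estimate is passed in by the
caller where frsqrte would supply it in hardware, and the step helper only
models what frsqrts computes, (3 - a*b) / 2, without the instruction's exact
rounding behaviour. Passing x * x as the second operand is also an assumption
about operand order.

```c
#include <assert.h>

/* Models the value frsqrts computes for operands a and b: (3 - a*b) / 2.
   Hypothetical helpers, not the hardware's exact rounding behaviour. */
static float rsqrt_step_f(float a, float b)
{
    return (3.0f - a * b) * 0.5f;
}

static double rsqrt_step_d(double a, double b)
{
    return (3.0 - a * b) * 0.5;
}

/* Refine x0, an initial estimate of 1/sqrt(d) (the role frsqrte plays in
   hardware), with `steps` Newton-Raphson iterations:
       x_{n+1} = x_n * (3 - d * x_n^2) / 2
   The operand order (d, x*x) is an assumption made for this sketch. */
float rsqrt_refine_f(float d, float x0, int steps)
{
    assert(d > 0.0f && x0 > 0.0f && steps >= 0);
    float x = x0;
    for (int i = 0; i < steps; i++)
        x = x * rsqrt_step_f(d, x * x);
    return x;
}

double rsqrt_refine_d(double d, double x0, int steps)
{
    assert(d > 0.0 && x0 > 0.0 && steps >= 0);
    double x = x0;
    for (int i = 0; i < steps; i++)
        x = x * rsqrt_step_d(d, x * x);
    return x;
}
```

As a usage example, take d = 2 and a starting value of 0.705, which is about
0.3% off and so roughly what an 8-bit estimate would give (that figure for the
hardware estimate is my understanding, not something stated in this thread).
rsqrt_refine_f(2.0f, 0.705f, 2) lands within float rounding of 0.70710678,
while rsqrt_refine_d(2.0, 0.705, 2) is still off by a relative error of around
3e-10 and needs the third step, since each step roughly squares the error.
With this operand order, for d = FLT_MAX the product x * x is about 2.9e-39,
which is below FLT_MIN; that may be the kind of denormal intermediate the
caveat refers to, but that is a guess.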