On Thu, Jun 25, 2015 at 1:24 AM, Ramana Radhakrishnan
<ramana.radhakrish...@foss.arm.com> wrote:
> Benedikt,
>
> On 25/06/15 08:01, pins...@gmail.com wrote:
>>
>>> On Jun 18, 2015, at 5:04 AM, Benedikt Huber
>>> <benedikt.hu...@theobroma-systems.com> wrote:
>>>
>>> AArch64 offers the instructions frsqrte and frsqrts, for rsqrt estimation
>>> and a Newton-Raphson step, respectively.
>>> There are ARMv8 implementations where this is faster than using fdiv and
>>> fsqrt.
>>> The patch runs three steps for double and two steps for float to achieve
>>> the needed precision.
>>
>>
>> This is NOT a win on ThunderX, at least for single precision, because the
>> divide and sqrt together take the same time as 5 multiplies (estimate and
>> step are multiplies in the ThunderX pipeline).  Double is 10 multiplies,
>> which is just the same as what the patch does (really it is slightly less
>> than 10; I rounded up). So in the end this is NOT a win at all for ThunderX
>> unless we do one less step for both single and double.
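The comparison above can be written out as a small calculation. The cost model here is an assumption, not something stated in the thread: one multiply-class operation for the estimate plus three per refinement step (squaring the estimate, the frsqrts step, and the final multiply). The budgets of 5 and 10 multiplies are the ThunderX figures quoted above.

```c
/* ASSUMED cost model: 1 op for the estimate, 3 ops per Newton-Raphson step. */
int rsqrt_mul_ops(int steps)
{
    return 1 + 3 * steps;
}

/* Nonzero if the approximation is strictly cheaper than the budget, where
   the budget is the divide + sqrt latency expressed in multiplies. */
int rsqrt_is_win(int steps, int budget_muls)
{
    return rsqrt_mul_ops(steps) < budget_muls;
}
```

Under this model, float with 2 steps costs 7 against a budget of 5, and double with 3 steps costs 10 against a budget of 10, so neither is a win. With one step fewer the costs drop to 4 and 7, which is below both budgets.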
>>
>
>
> Have you seen this https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00164.html
> ? Really this is something that should be gated by the costs infrastructure


Yes, I saw that; in fact, I did not look into the latencies of our core
until this patch came out.  But yes, this should be gated by the cost
infrastructure, and most likely should not be part of the -mcpu=generic
costs (well, the rsqrt could be, if we change it to 1 iteration for float
and 2 iterations for double).

Thanks,
Andrew

> .
>
>
> regards
> Ramana
>
>> Thanks,
>> Andrew
>>
>>
>>>
>>> There is one caveat and open question.
>>> Since -ffast-math enables flush-to-zero, intermediate values between
>>> approximation steps will be flushed to zero if they are denormal.
>>> For example, this happens in the case of rsqrt (DBL_MAX) and
>>> rsqrtf (FLT_MAX).
>>> The test cases pass, but it is unclear to me whether this is expected
>>> behavior with -ffast-math.
>>>
>>> The patch applies to commit:
>>> svn+ssh://gcc.gnu.org/svn/gcc/trunk@224470
>>>
>>> Please consider including this patch.
>>> Thank you and best regards,
>>> Benedikt Huber
>>>
>>> Benedikt Huber (1):
>>>   2015-06-15  Benedikt Huber  <benedikt.hu...@theobroma-systems.com>
>>>
>>> gcc/ChangeLog                            |   9 +++
>>> gcc/config/aarch64/aarch64-builtins.c    |  60 ++++++++++++++++
>>> gcc/config/aarch64/aarch64-protos.h      |   2 +
>>> gcc/config/aarch64/aarch64-simd.md       |  27 ++++++++
>>> gcc/config/aarch64/aarch64.c             |  63 +++++++++++++++++
>>> gcc/config/aarch64/aarch64.md            |   3 +
>>> gcc/testsuite/gcc.target/aarch64/rsqrt.c | 113 +++++++++++++++++++++++++++++++
>>> 7 files changed, 277 insertions(+)
>>> create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt.c
>>>
>>> --
>>> 1.9.1
>>>
>
