On 19-01-2016 03:49, Siddhesh Poyarekar wrote:
> On 19 January 2016 at 00:06, Adhemerval Zanella
> <adhemerval.zane...@linaro.org> wrote:
>> No one has posted any patch or stirred discussions about it.  The complex
>> functions in libm are usually coded in C to be platform neutral, with
>> some specific functions being optimized (rounding, etc.).  x86_64 also has
>> assembly implementations for some specific routines (exp, log, ...),
>> but I do not have numbers on how fast they are relative to their C
>> counterparts (it might also be the case that the speedup is not
>> high enough to justify keeping the assembly).
> 
> A correction here: i686 has a lot of assembly math implementations,
> x86_64 doesn't.  The last x86_64 asm implementation was sincos which
> was removed because it was not accurate enough for our project goals.
> The i686 asm versions (and for other archs, I think alpha and m68k)
> are there because nobody cares enough about their precision.  The i686
> functions for example are known to not be precise for the entire input
> domain.

I do see some x86_64-specific implementations currently in use
(sysdeps/x86_64/fpu/s_{sin,cos}f.S, for instance).  The sincos implementation
is also still used (sysdeps/x86_64/fpu/s_sincosf.S).

What I think you are referring to is that glibc has dropped the use of the
fsin/fcos/fsincos Intel instructions, which show ridiculously large errors
depending on the input [1].

[1] https://randomascii.wordpress.com/2014/10/09/intel-underestimates-error-bounds-by-1-3-quintillion/

> 
>> The current rule of thumb in GLIBC is to avoid arch-specific assembly
>> routines as much as possible and to work with platform-neutral C
>> implementations, with optional arch hooks on performance-sensitive paths
>> (check Siddhesh's recent sincos performance improvements).
> 
> The general rule here is to more or less guarantee that the algorithm
> does not lose precision regardless of the language it is written in.
> However if you want the community also to support it actively, writing
> it in C is your best bet.
> 
>> For very critical performance paths we also have the option to add
>> specific builds with more aggressive optimization flags along with
>> IFUNC support (for instance one for A57 and another for A72, if
>> that is the case).
> 
> This is the cheapest way to squeeze out some performance, provided
> that the compiler is tuned correctly.  This is in fact what we do in
> x86_64 with ifunc implementations for avx, sse2 and fma4.
> 
> Siddhesh
> 
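To make the "several builds plus IFUNC" idea above a bit more concrete, here
is a rough sketch of the same pattern done by hand with a function pointer.
It is not how glibc does it: real IFUNC resolution happens in the dynamic
linker, and the tuned variants would be the same C source built with different
optimization flags.  The names dot, dot_generic, dot_tuned, resolve_dot and
cpu_has_fast_path are all made up for illustration, and using AVX2 as the
feature to test for is just an assumption.

```c
#include <stddef.h>

typedef double (*dot_fn)(const double *, const double *, size_t);

/* Baseline variant: plain C, works everywhere. */
static double dot_generic(const double *a, const double *b, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i] * b[i];
    return s;
}

/* Stand-in for a variant built with more aggressive flags.  It is
   unrolled by hand here only so that the two variants differ. */
static double dot_tuned(const double *a, const double *b, size_t n)
{
    double s0 = 0.0, s1 = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i] * b[i];
        s1 += a[i + 1] * b[i + 1];
    }
    if (i < n)              /* odd element left over */
        s0 += a[i] * b[i];
    return s0 + s1;
}

/* Hypothetical CPU check.  On x86_64 with GCC or Clang it asks the
   compiler builtin about AVX2; elsewhere it reports "no". */
static int cpu_has_fast_path(void)
{
#if defined(__GNUC__) && defined(__x86_64__)
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
#else
    return 0;
#endif
}

/* Plays the role of an IFUNC resolver: pick one variant for this CPU. */
static dot_fn resolve_dot(void)
{
    return cpu_has_fast_path() ? dot_tuned : dot_generic;
}

/* NULL until the first call; not thread-safe, this is only a sketch. */
static dot_fn dot_impl;

/* Public entry point: resolve once, then always call the chosen variant. */
double dot(const double *a, const double *b, size_t n)
{
    if (dot_impl == NULL)
        dot_impl = resolve_dot();
    return dot_impl(a, b, n);
}
```

Callers only ever see dot().  With a real IFUNC the check on every call
disappears, because the symbol is bound to the selected variant once at
relocation time.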
_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/linaro-toolchain
