On 19-01-2016 03:49, Siddhesh Poyarekar wrote:
> On 19 January 2016 at 00:06, Adhemerval Zanella
> <adhemerval.zane...@linaro.org> wrote:
>> No one has posted any patch or stirred discussions about it. The complex
>> functions in libm are usually coded in C to be platform neutral, with
>> some specific functions being optimized (rounding, etc.). x86_64 also has
>> assembly implementations for some specific routines (exp, log, ...),
>> but I do not have numbers on how fast they are relative to their C
>> counterparts (it might also be the case that the speedup is not high
>> enough to justify keeping the assembly).
>
> A correction here: i686 has a lot of assembly math implementations,
> x86_64 doesn't. The last x86_64 asm implementation was sincos, which
> was removed because it was not accurate enough for our project goals.
> The i686 asm versions (and those for other archs, I think alpha and
> m68k) are there because nobody cares enough about their precision. The
> i686 functions, for example, are known not to be precise over the
> entire input domain.

I do see some x86_64 specialized implementations currently in use
(sysdeps/x86_64/fpu/s_{sin,cos}f.S, for instance). The sincos
implementation is also still used (sysdeps/x86_64/fpu/s_sincosf.S). What
you are referring to as dropped by glibc is the use of the
fsin/fcos/fsincos Intel instructions, which show a ridiculous error range
depending on the inputs [1].

[1] https://randomascii.wordpress.com/2014/10/09/intel-underestimates-error-bounds-by-1-3-quintillion/

>
>> The current rule of thumb in GLIBC is to avoid arch-specific assembly
>> routines as far as possible and to work with C implementations that
>> are platform neutral, with possible arch hooks on performance-sensitive
>> paths (check Siddhesh's recent sincos performance improvements).
>
> The general rule here is to more or less guarantee that the algorithm
> does not lose precision regardless of the language it is written in.
> However, if you want the community also to support it actively, writing
> it in C is your best bet.
>
>> For very critical performance paths we also have the option to add
>> specific builds with more aggressive optimization flags along with
>> IFUNC support (for instance one for A57 and another for A72, if
>> that is the case).
>
> This is the cheapest way to squeeze out some performance, provided
> that the compiler is tuned correctly. This is in fact what we do in
> x86_64 with ifunc implementations for avx, sse2 and fma4.
>
> Siddhesh
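
To illustrate the "several builds plus IFUNC" option discussed in the quoted text, here is a rough sketch in portable C. It is not how glibc does it: real IFUNC symbols are resolved once by the dynamic loader (GCC exposes this through the `ifunc` function attribute), whereas this sketch only imitates the idea with a function pointer that is resolved on first call. All names (`my_sum`, `sum_generic`, `sum_tuned`, `cpu_has_avx`, `resolve_sum`) are hypothetical, and the "tuned" variant merely stands in for the same source built with more aggressive flags.

```c
#include <assert.h>
#include <stddef.h>

typedef double (*sum_fn)(const double *, size_t);

/* Baseline variant: stands in for the generic, platform-neutral C build. */
static double sum_generic(const double *v, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    return acc;
}

/* "Tuned" variant: stands in for a build with more aggressive
 * optimization. It uses two accumulators, which changes the order of
 * the floating-point additions. */
static double sum_tuned(const double *v, size_t n)
{
    double a0 = 0.0, a1 = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        a0 += v[i];
        a1 += v[i + 1];
    }
    if (i < n)
        a0 += v[i];
    return a0 + a1;
}

/* Hypothetical feature probe. On x86 with GCC or Clang it asks the
 * compiler builtin; elsewhere it conservatively reports "no". */
static int cpu_has_avx(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx") != 0;
#else
    return 0;
#endif
}

/* Plays the role of an IFUNC resolver: pick one implementation. */
static sum_fn resolve_sum(void)
{
    return cpu_has_avx() ? sum_tuned : sum_generic;
}

/* Public entry point: resolve on first call, then reuse the choice. */
double my_sum(const double *v, size_t n)
{
    static sum_fn impl = NULL;
    if (impl == NULL)
        impl = resolve_sum();
    assert(impl != NULL);
    return impl(v, n);
}
```

A caller just uses `my_sum(v, n)` and never sees which variant was picked. Note that the two variants add in a different order, so for general inputs they can differ in the last bits. That is exactly the kind of thing the precision rule quoted above is meant to keep under control when adding per-CPU variants.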