On 2025-02-05 21:33, Martin Storsjö wrote:
> On UCRT, we pass calls to sin()/cos() through to the CRT, since the UCRT versions of these functions are faster than our x87 implementations. The same goes for sincos(): calling the UCRT sin() and cos() separately is almost 3x as fast as calling the x87 sincos() implementation.
>
> These functions are implemented in assembly because a plain C version can be optimized by the compiler back into a call to sincos() itself (GCC does this by default; Clang does it when compiling with -ffast-math). See 3f40dd3254582722761606c7c99d658f952002d9 for earlier precedent on arm/aarch64.
>
> ---
> Updated the code to exactly match the output from Clang on x86_64 (the previous version had manual touch-ups to some SSE moves). Now the x86_64 routines are exactly the output of Clang on x86_64, and the i386 routines are exactly the output of GCC on i386. (Clang defaults to using SSE2 on i386 targets.)
>
> Ideally, we wouldn't need assembly for things like this; we would instead set -fno-builtin for the relevant source files. However, with automake it is not easy to set such an option for one individual file, and -fno-builtin cannot be enabled with either a pragma or an optimize attribute in the source files.
>
> ---
>  mingw-w64-crt/Makefile.am        |  9 ++++--
>  mingw-w64-crt/math/x86/sincos.S  | 47 ++++++++++++++++++++++++++++++++
>  mingw-w64-crt/math/x86/sincosf.S | 46 +++++++++++++++++++++++++++++++
>  3 files changed, 100 insertions(+), 2 deletions(-)
>  create mode 100644 mingw-w64-crt/math/x86/sincos.S
>  create mode 100644 mingw-w64-crt/math/x86/sincosf.S
This patch looks good to me, too.

-- 
Best regards,
LIU Hao
_______________________________________________
Mingw-w64-public mailing list
Mingw-w64-public@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public