On 2025-02-05 21:33, Martin Storsjö wrote:
On UCRT, we pass calls to sin()/cos() to the CRT; the UCRT
versions of these functions are faster than our x87 implementations.

The same also goes for sincos(); calling the UCRT sin() and cos()
separately is almost 3x as fast as calling the x87 sincos()
implementation.

Use assembly for implementing these functions; a plain C
version can be optimized by the compiler back into a plain call
to sincos() (GCC does this by default, Clang does it when compiling
with -ffast-math). See 3f40dd3254582722761606c7c99d658f952002d9 for
earlier precedent for arm/aarch64.
---
Updated the code to exactly match the output from Clang on x86_64
(the previous version had manual touch-ups to some SSE moves).

Now the x86_64 routines are exactly the output of Clang on x86_64,
and exactly the output of GCC on i386. (Clang defaults to using
SSE2 on i386 targets.)

Ideally, we wouldn't need to use assembly for things like this;
we would instead set -fno-builtin for the relevant source files.
However, with automake, it is not easy to set such an option
specifically for one individual file, and it is not possible to
activate specifically -fno-builtin with either a pragma or an
optimize attribute in the source files.
---
  mingw-w64-crt/Makefile.am        |  9 ++++--
  mingw-w64-crt/math/x86/sincos.S  | 47 ++++++++++++++++++++++++++++++++
  mingw-w64-crt/math/x86/sincosf.S | 46 +++++++++++++++++++++++++++++++
  3 files changed, 100 insertions(+), 2 deletions(-)
  create mode 100644 mingw-w64-crt/math/x86/sincos.S
  create mode 100644 mingw-w64-crt/math/x86/sincosf.S

This patch looks good to me, too.

--
Best regards,
LIU Hao

