[Bug target/103008] poor inlined builtin_fmod on x86_64

rguenth at gcc dot gnu.org via Gcc-bugs Sun, 13 Feb 2022 23:35:43 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103008


--- Comment #17 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Uroš Bizjak from comment #14)
> Created attachment 52428 [details]
> Proposed patch
> 
> The attached patch implements:
> 
> fmod (a, p) = a - trunc (a/p) * p
> drem (a, p) = a - roundeven (a/p) * p
> 
> using SSE4 round instruction (and uses fnma when available).
> 
> Timings with Polyhedron ac.f90 on IvyBridge-E, Fedora-34, glibc 2.33-21.fc34
> 
> -Ofast:
>        6,150082000 seconds user
> 
> -Ofast -mno-80387:
>       18,354654000 seconds user
> 
> -Ofast -msse4:
>        5,722511000 seconds user

I fear this is a bit too much on the "unsafe" side.  Maybe we can
go this way for float but use double arithmetic for the fmod to avoid
the exponent issue?  For double, can we do some cheap range checking
and fall back to fmod() when not safe?

That said, can we have a flag like -mrecip to control this?
