https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103008
--- Comment #17 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Uroš Bizjak from comment #14)
> Created attachment 52428 [details]
> Proposed patch
>
> The attached patch implements:
>
> fmod (a, p) = a - trunc (a/p) * p
> drem (a, p) = a - roundeven (a/p) * p
>
> using SSE4 round instruction (and uses fnma when available).
>
> Timings with Polyhedron ac.f90 on IvyBridge-E, Fedora-34, glibc 2.33-21.fc34
>
> -Ofast:
> 6,150082000 seconds user
>
> -Ofast -mno-80387:
> 18,354654000 seconds user
>
> -Ofast -msse4:
> 5,722511000 seconds user

I fear this is a bit too much on the "unsafe" side. Maybe we can go this way
for float but use double arithmetic for the fmod to avoid the exponent issue?
For double, can we do some cheap range checking and fall back to fmod() when
not safe?

That said, can we have a flag like -mrecip to control this?