https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103008
--- Comment #14 from Uroš Bizjak <ubizjak at gmail dot com> ---
Created attachment 52428
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52428&action=edit
Proposed patch

The attached patch implements:

  fmod (a, p) = a - trunc (a/p) * p
  drem (a, p) = a - roundeven (a/p) * p

using SSE4 round instruction (and uses fnma when available).

Timings with Polyhedron ac.f90 on IvyBridge-E, Fedora-34, glibc 2.33-21.fc34

-Ofast:              6,150082000 seconds user
-Ofast -mno-80387:  18,354654000 seconds user
-Ofast -msse4:       5,722511000 seconds user