https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103008

--- Comment #14 from Uroš Bizjak <ubizjak at gmail dot com> ---
Created attachment 52428
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52428&action=edit
Proposed patch

The attached patch implements:

fmod (a, p) = a - trunc (a/p) * p
drem (a, p) = a - roundeven (a/p) * p

using the SSE4 round instruction (and fnma when available).

Timings with Polyhedron ac.f90 on IvyBridge-E, Fedora-34, glibc 2.33-21.fc34:

-Ofast:
       6,150082000 seconds user

-Ofast -mno-80387:
      18,354654000 seconds user

-Ofast -msse4:
       5,722511000 seconds user
