https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104485

            Bug ID: 104485
           Summary: x87 fmod inline code is slow
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

In 526.blender_r one can see us expanding fmod as

        fld1
        fldl    (%rsi)
.L2:
        fprem
        fnstsw  %ax
        testb   $4, %ah
        jne     .L2
        fstp    %st(1)
...

which is quite a bit slower than just calling into libm.  The case in
question is actually special and can be approximated by

#include <math.h>

void foo (double * __restrict s, double *d)
{
  s[0] = fmod(d[0], 1.0f);
  s[1] = fmod(d[1], 1.0f);
}

where obtaining the fractional part of {d[0], d[1]} might even be vectorizable.

Building 526.blender_r with -fno-builtin-fmod (-mno-fancy-math-387 doesn't do
the trick here) speeds it up by 1% on Zen2.
