https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104485
Bug ID: 104485
Summary: x378 fmod inline code is slow
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---

In 526.blender_r one can see us expanding fmod as

        fld1
        fldl    (%rsi)
.L2:
        fprem
        fnstsw  %ax
        testb   $4, %ah
        jne     .L2
        fstp    %st(1)
        ...

which is quite a bit slower than just calling into libm.  The case in
question is actually special and can be approximated by

void foo (double * __restrict s, double *d)
{
  s[0] = fmod(d[0], 1.0f);
  s[1] = fmod(d[1], 1.0f);
}

where obtaining the fractional part of {d[0], d[1]} might even be
vectorizable.

Building 526.blender_r with -fno-builtin-fmod (-mno-fancy-math-387 doesn't
do the trick here) speeds it up by 1% on Zen2.