https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97142
Bug ID: 97142
Summary: __builtin_fmod not optimized on POWER
Product: gcc
Version: 10.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: fx at gnu dot org
Target Milestone: ---

I ran some Fortran benchmarks (the "Polyhedron" set) on POWER9, and found that one of them has pathologically bad performance compared with xlf. Profiling shows that this is due to spending most of its time in fmod, called via a random-number function. fmod isn't called when compiling with xlf -O5, or when compiling the same source on x86_64.

Although it's Fortran, this doesn't appear to be Fortran-specific, as the DMOD intrinsic is turned into __builtin_fmod.

The following is with gcc 10.2, comparing the two targets.

On RHEL7 POWER9 (and the same with -mcpu=native):

    $ cat ggl.f90
    REAL FUNCTION GGL(Ds)
    DOUBLE PRECISION Ds , d2
    DATA d2/2147483647.D0/
    Ds = DMOD(16807.D0*Ds,d2)
    GGL = Ds/d2
    END
    $ gfortran -O3 -fopt-info-all -c ggl.f90
    ggl.f90:4:0: missed: not inlinable: ggl/0 -> __builtin_fmod/2, function body not available
    Unit growth for small function inlining: 12->12 (0%)
    Inlined 0 calls, eliminated 0 functions
    ggl.f90:6:0: note: ***** Analysis failed with vector mode V2DF
    $ nm ggl.o
                     U fmod
    0000000000000000 T ggl_
                     U .TOC.

On Debian 10 SKX with the same source:

    $ gfortran-10 -Ofast -fopt-info-all -c ggl.f90
    ggl.f90:4:0: missed: not inlinable: ggl/0 -> __builtin_fmod/2, function body not available
    Unit growth for small function inlining: 12->12 (0%)
    Inlined 0 calls, eliminated 0 functions
    ggl.f90:6:0: note: ***** Analysis failed with vector mode V2DF
    ggl.f90:6:0: note: ***** Skipping vector mode V16QI, which would repeat the analysis for V2DF
    $ nm ggl.o
    0000000000000000 r .LC0
    0000000000000008 r .LC1
    0000000000000010 r .LC2
    0000000000000000 T ggl_