https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40766
Janne Blomqvist <jb at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |INVALID

--- Comment #25 from Janne Blomqvist <jb at gcc dot gnu.org> ---
I reduced N by a factor of 100 to make the test run faster and compiled with -Ofast -march=native. These are the best-of-3 runs:

a) GCC trunk as of yesterday on x86_64 Ubuntu 16.04 (glibc 2.23), AMD Phenom(tm) II X4 940:
- real(4) version: 6.16s
- real(8) version: 1.60s

b) GCC 7.2 from Homebrew on macOS 10.12.6 (Sierra), 2 GHz Intel Core i7, MacBook Air mid-2012:
- real(4) version: 0.768s
- real(8) version: 0.512s

Observations:
- The "perf" profiler on Linux shows that 99% of the time is spent in glibc libm for the real(4) version and 97% for the real(8) version.
- glibc has improved: the real(4) version is now roughly a factor of 4 slower than the real(8) one, rather than a factor of 10 as in Dominique's previous tests in 2013.
- If I remove the cos() call from the loop, it vectorizes with -mveclibabi=svml. So there is no vectorized sincos yet.

Anyway, I'm not sure what the Fortran front end could do better here. This is mostly a glibc issue, hence closing as invalid.
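The real(4)/real(8) slowdown factor mentioned in the observations can be recomputed directly from the best-of-3 timings. A minimal Python sketch follows. The dictionary layout and the `slowdown` helper are illustrative only and are not part of the original report.

```python
# Best-of-3 wall-clock timings (seconds) quoted in comment #25.
# The dictionary layout is illustrative, not from the original report.
timings = {
    "x86_64 Linux, GCC trunk, glibc 2.23": {"real(4)": 6.16, "real(8)": 1.60},
    "macOS 10.12.6, GCC 7.2": {"real(4)": 0.768, "real(8)": 0.512},
}


def slowdown(t):
    """Return how many times slower the real(4) run is than the real(8) run."""
    return t["real(4)"] / t["real(8)"]


# Slowdown factor per platform.
ratios = {platform: slowdown(t) for platform, t in timings.items()}

for platform, r in ratios.items():
    print(f"{platform}: real(4) is {r:.2f}x slower than real(8)")
```

The Linux timings give about 3.85, which is the "roughly a factor of 4" cited above. The macOS timings give 1.5.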