https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40766

Janne Blomqvist <jb at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |INVALID

--- Comment #25 from Janne Blomqvist <jb at gcc dot gnu.org> ---
Reducing N by a factor of 100 to make the test run faster, and compiling with
-Ofast -march=native, I get the following best-of-3 timings on

a) GCC trunk as of yesterday on x86_64 Ubuntu 16.04 (glibc 2.23), AMD
Phenom(tm) II X4 940:

  - real(4) version: 6.16s
  - real(8) version: 1.60s

b) GCC 7.2 from homebrew on macOS 10.12.6 (Sierra), 2 GHz Intel Core i7,
MacBook Air mid-2012:

  - real(4) version: 0.768s
  - real(8) version: 0.512s


Observations: 

- The "perf" profiler on Linux shows that for the real(4) version, 99% of the
time is spent in glibc libm, and for the real(8) version 97%.

- glibc has improved: the real(4) version is now a factor of 4 slower than the
real(8) one, rather than a factor of 10 as in the previous tests by Dominique
in 2013.

- If I remove the cos() call in the loop, it vectorizes with -mveclibabi=svml.
So it seems there is no vectorized sincos yet.
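
As a quick check of the "factor of 4" figure, the slowdown ratios can be
recomputed from the timings listed above. This is only a small sketch: the
dictionary keys and the slowdown() helper are names made up here for
illustration, and the numbers are copied as-is from the measurements.

```python
# Best-of-3 wall-clock times in seconds, copied from the measurements above.
# The system labels are shorthand chosen for this sketch.
timings = {
    "Linux, glibc 2.23, Phenom II": {"real(4)": 6.16, "real(8)": 1.60},
    "macOS 10.12.6, Core i7": {"real(4)": 0.768, "real(8)": 0.512},
}


def slowdown(t):
    """Return how many times slower the real(4) run is than the real(8) run."""
    return t["real(4)"] / t["real(8)"]


ratios = {system: slowdown(t) for system, t in timings.items()}

for system, ratio in ratios.items():
    print(f"{system}: real(4) is {ratio:.2f}x slower than real(8)")
```

On the Linux/glibc machine the ratio comes out at about 3.85, which is where
the rounded "factor of 4" comes from. On the macOS machine, which does not use
glibc, it is 1.5.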

Anyway, I'm not sure what the Fortran frontend could do better here. Mostly
it's a glibc issue. Hence, closing as invalid.
