https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84114
Wilco <wdijkstr at arm dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |wdijkstr at arm dot com
--- Comment #2 from Wilco <wdijkstr at arm dot com> ---
(In reply to Steve Ellcey from comment #0)
> Created attachment 43279 [details]
> Test case
>
> The example code comes from milc in SPEC2006.
>
> GCC on x86 or aarch64 generates better code with -O3 than it does with
> -Ofast or '-O3 -ffast-math'. On x86 compiling with '-mfma -O3' I get 5
> vfmadd231sd instructions, 1 vmulsd instruction and 6 vmovsd. With '-mfma
> -Ofast' I get 3 vfmadd231sd, 2 vaddsd, 3 vmulsd, and 6 vmovsd. That is two
> extra instructions.
>
> The problem seems to be that -Ofast turns on -ffast-math and that enables
> the global reassociation pass (tree-ssa-reassoc.c) and the code changes
> done there create some temporary variables which inhibit the recognition
> and use of fma instructions.
>
> Using -O3 and -Ofast on aarch64 shows the same change.
I noticed this a while back: the reassociation pass has changed, and we now get
far fewer fmas.
See https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00771.html
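
For illustration only, here is a minimal sketch of the effect described in comment #0. It is not the attached test case: the function names dot3_chain and dot3_reassoc are hypothetical, and the second function is a hand-written imitation of the kind of shape reassociation can produce, not actual output of tree-ssa-reassoc.c.

```c
/* Hypothetical kernel (not the milc test case): a short sum of products. */

/* Source order: every product feeds straight into an add, so each
   statement is a candidate for one fused multiply-add. */
double dot3_chain(const double *a, const double *b, double acc)
{
    acc = acc + a[0] * b[0];
    acc = acc + a[1] * b[1];
    acc = acc + a[2] * b[2];
    return acc;
}

/* Hand-reassociated form, assumed for illustration: the products are
   held in temporaries and summed as a tree. In (t0 + t1) only one of
   the two products can fuse with the add, so the other needs a
   separate multiply, and the final add has no product to fuse with. */
double dot3_reassoc(const double *a, const double *b, double acc)
{
    double t0 = a[0] * b[0];
    double t1 = a[1] * b[1];
    double t2 = a[2] * b[2];
    return (t0 + t1) + (t2 + acc);
}
```

Compiling both functions with '-mfma -O3 -S' and then with '-mfma -Ofast -S' and counting the vfmadd/vmulsd/vaddsd instructions should show whether a given GCC version behaves as described; the exact counts may differ from the ones quoted above.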