https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84114
Bug ID: 84114
Summary: global reassociation pass prevents fma usage, generates slower code
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: sje at gcc dot gnu.org
Target Milestone: ---
Created attachment 43279
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43279&action=edit
Test case
The example code comes from milc in SPEC2006.
GCC on x86 or aarch64 generates better code with -O3 than it does with -Ofast
or '-O3 -ffast-math'. On x86, compiling with '-mfma -O3' I get 5 vfmadd231sd,
1 vmulsd, and 6 vmovsd instructions (12 total). With '-mfma -Ofast' I get 3
vfmadd231sd, 2 vaddsd, 3 vmulsd, and 6 vmovsd (14 total). That is two extra
instructions and two fewer fused multiply-adds.
The problem seems to be that -Ofast turns on -ffast-math, which enables
the global reassociation pass (tree-ssa-reassoc.c). The code changes done
there create temporary variables that inhibit the recognition and use of
fma instructions.
Comparing -O3 and -Ofast on aarch64 shows the same change.
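To make the shape of the problem concrete, here is a minimal sketch. It is
not the attached test case: the function names dot3 and dot3_split are made
up for illustration, and the second function is only an assumption about
what the expression roughly looks like once the products have been pulled
into separate temporaries. It may or may not match what the pass really
emits for the milc code.

```c
/* Hypothetical reduced kernel, NOT the attached milc test case. */

/* A sum of products written as a left-to-right chain. Each addition
   consumes a product directly, so a compiler targeting FMA hardware
   could emit one multiply followed by two fused multiply-adds. */
double dot3(const double *a, const double *b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

/* The same sum with every product held in its own temporary and the
   additions regrouped. This is an assumed illustration of a
   reassociated form: if the additions only see temporaries, fewer
   multiply/add pairs are candidates for fusing. */
double dot3_split(const double *a, const double *b)
{
    double p0 = a[0] * b[0];
    double p1 = a[1] * b[1];
    double p2 = a[2] * b[2];
    return p0 + (p1 + p2);
}
```

One way to experiment with a kernel like this would be to compile it with
'-mfma -O3 -S' and again with '-mfma -Ofast -S' and count the vfmadd231sd
instructions in each output. Adding -fno-tree-reassoc to the -Ofast run
should show whether the reassociation pass is what changes the result. I
have not verified that this reduced form reproduces the regression.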