https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84114
Bug ID: 84114
Summary: global reassociation pass prevents fma usage, generates slower code
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: sje at gcc dot gnu.org
Target Milestone: ---

Created attachment 43279
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43279&action=edit
Test case

The example code comes from milc in SPEC2006. GCC on x86 or aarch64 generates better code with -O3 than it does with -Ofast or '-O3 -ffast-math'.

On x86, compiling with '-mfma -O3' I get 5 vfmadd231sd instructions, 1 vmulsd instruction and 6 vmovsd. With '-mfma -Ofast' I get 3 vfmadd231sd, 2 vaddsd, 3 vmulsd, and 6 vmovsd. That is two extra instructions (14 versus 12).

The problem seems to be that -Ofast turns on -ffast-math, which enables the global reassociation pass (tree-ssa-reassoc.c). The code changes done there create some temporary variables, and those temporaries inhibit the recognition and use of fma instructions.

Using -O3 and -Ofast on aarch64 shows the same change.
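To illustrate the kind of effect described, here is a minimal sketch. It is not the attached test case; the function names dot4_chain and dot4_pairwise are made up for this example. It shows by hand how regrouping a sum of products can cost fma opportunities. Whether the reassociation pass produces exactly this grouping for the milc code is an assumption, not something verified here.

```c
/* Hypothetical kernels for illustration only; not the attached test case. */

/* Source-order chain: ((a0*b0 + a1*b1) + a2*b2) + a3*b3.
 * Every addition has a product as one operand, so with contraction
 * enabled this can be done with 1 multiply and 3 fused multiply-adds. */
double dot4_chain(const double *a, const double *b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

/* Hand-regrouped form: (a0*b0 + a1*b1) + (a2*b2 + a3*b3).
 * The two partial sums live in temporaries, and the final addition has
 * no product operand left to fuse with. At best this is 2 multiplies,
 * 2 fused multiply-adds and 1 plain add: one instruction more. */
double dot4_pairwise(const double *a, const double *b)
{
    double lo = a[0] * b[0] + a[1] * b[1];
    double hi = a[2] * b[2] + a[3] * b[3];
    return lo + hi;
}
```

Assuming the two functions are saved in a file such as fma-sketch.c (a name chosen here), the generated code can be compared with 'gcc -O3 -mfma -S fma-sketch.c' and 'gcc -Ofast -mfma -S fma-sketch.c'. Instruction counts may differ between GCC versions and targets.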