https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84114

            Bug ID: 84114
           Summary: global reassociation pass prevents fma usage,
                    generates slower code
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: sje at gcc dot gnu.org
  Target Milestone: ---

Created attachment 43279
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43279&action=edit
Test case

The example code comes from milc in SPEC2006.

GCC on x86 or aarch64 generates better code with -O3 than it does with -Ofast
or '-O3 -ffast-math'.  On x86 compiling with '-mfma -O3' I get 5 vfmadd231sd
instructions, 1 vmulsd instruction and 6 vmovsd.  With '-mfma -Ofast' I get 3
vfmadd231sd, 2 vaddsd, 3 vmulsd, and 6 vmovsd.  That is two extra instructions.

The problem seems to be that -Ofast turns on -ffast-math, which allows the
global reassociation pass (tree-ssa-reassoc.c) to reorder floating-point
additions.  The code changes done there create temporary variables that
inhibit the recognition and use of fma instructions.
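The attached test case is not reproduced here. As a hedged illustration only, the hypothetical function below (not taken from milc or the attachment) is a short multiply-add chain of the kind that fma contraction targets. One way to compare the two code paths is to compile it with 'gcc -O3 -mfma -S' and 'gcc -Ofast -mfma -S' and count the vfmadd231sd instructions in each output. Whether this small kernel actually reproduces the regression is an assumption, not something verified here.

```c
/* Hypothetical kernel, NOT the attached test case.
 *
 * Each statement has the form acc = acc + x*y.  With -mfma, and with
 * contraction enabled (the default in GNU C modes), GCC can turn each
 * step into a single fused multiply-add.  Under -ffast-math the
 * reassociation pass may also reorder the additions, which is the kind
 * of rewrite this report says can hide the multiply-add pattern.
 */
double mac3(const double a[3], const double b[3], double c)
{
    double acc = c;
    acc += a[0] * b[0];
    acc += a[1] * b[1];
    acc += a[2] * b[2];
    return acc;
}
```

The inputs in a quick check can be chosen so every intermediate value is exactly representable, making the result independent of evaluation order or contraction: with a = {1, 2, 3}, b = {4, 5, 6} and c = 0.5 the result is 32.5 at either optimization level.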

Comparing -O3 and -Ofast on aarch64 shows the same difference.
