https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104468

            Bug ID: 104468
           Summary: with -O -g, quadratic compile time of function with
                    __attribute__((optimize("O0"))) that passes large structs
                    by value
           Product: gcc
           Version: 11.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: erik.carstensen at intel dot com
  Target Milestone: ---

Created attachment 52392
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52392&action=edit
Reproducer

If a function passes many large structs by value to another function, then
compile time becomes quadratic (O(n^2)) when the file is compiled with -O -g
but the function is annotated with __attribute__((optimize("O0"))).

Compile time seems (approximately) quadratic independently in the number of
calls, in the number of struct function arguments, and in the size of the
struct. In other words, quadratic in the total size of passed values.

It compiles instantaneously (30s -> 0.1s) if I remove the __attribute__, or -g,
or -O, or if the struct size is changed to <=16 bytes or >=81 bytes.
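
For readers without access to the attachment, the shape of the code described
above can be sketched roughly as follows. This is not the attached reproducer:
the names (struct big, callee, caller, the CALL* macros), the 32-byte struct
size, the four arguments, and the 64 calls are all assumptions chosen for
illustration, and this small instance may well compile too quickly to show the
slowdown. Adding further CALL* levels increases the number of calls.

```c
/* foo.c: hypothetical sketch of the pattern, NOT the attached reproducer. */

/* 32 bytes (assumed size): inside the 17..80 byte range reported as slow. */
struct big { char bytes[32]; };

/* Callee taking several structs by value; its body is irrelevant here. */
int callee(struct big a, struct big b, struct big c, struct big d)
{
    return a.bytes[0] + b.bytes[0] + c.bytes[0] + d.bytes[0];
}

/* Repetition helpers: each level multiplies the number of calls by four. */
#define CALL    sum += callee(s, s, s, s);
#define CALL4   CALL CALL CALL CALL
#define CALL16  CALL4 CALL4 CALL4 CALL4
#define CALL64  CALL16 CALL16 CALL16 CALL16

/* Caller forced to O0 while the rest of the file is built with -O -g. */
__attribute__((optimize("O0")))
int caller(void)
{
    struct big s = { { 1 } };   /* bytes[0] = 1, remaining bytes zero */
    int sum = 0;
    CALL64                      /* 64 calls, four by-value structs each */
    return sum;
}
```

Compiling this with the command line given further down and timing it for
different call counts and struct sizes would be one way to check the scaling.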

It's still slow if I pass
-O -fno-auto-inc-dec -fno-branch-count-reg -fno-combine-stack-adjustments
-fno-compare-elim -fno-cprop-registers -fno-dce -fno-defer-pop  -fno-dse
-fno-forward-propagate -fno-guess-branch-probability -fno-if-conversion
-fno-if-conversion2 -fno-inline-functions-called-once -fno-ipa-modref
-fno-ipa-profile -fno-ipa-pure-const -fno-ipa-reference
-fno-ipa-reference-addressable -fno-merge-constants -fno-move-loop-invariants
-fno-omit-frame-pointer -fno-reorder-blocks -fno-shrink-wrap
-fno-shrink-wrap-separate -fno-split-wide-types -fno-ssa-backprop
-fno-ssa-phiopt -fno-tree-bit-ccp -fno-tree-ccp -fno-tree-ch
-fno-tree-coalesce-vars -fno-tree-copy-prop -fno-tree-dce
-fno-tree-dominator-opts -fno-tree-dse -fno-tree-forwprop -fno-tree-fre
-fno-tree-phiprop -fno-tree-pta -fno-tree-scev-cprop -fno-tree-sink
-fno-tree-slsr -fno-tree-sra -fno-tree-ter -fno-unit-at-a-time
... which is documented to be the same as -O0.

This happens with native gcc from Fedora 34:
$ gcc --version
gcc (GCC) 11.2.1 20210728 (Red Hat 11.2.1-1)
$ uname -a
Linux ecarsten-mobl1.ger.corp.intel.com 5.15.12-100.fc34.x86_64 #1 SMP Wed Dec
29 15:21:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Also reproduced with gcc 6.4.

Command line:
$ gcc -g -O1 -c foo.c
or alternatively (to bypass ccache on my system):
$ /usr/libexec/gcc/x86_64-redhat-linux/11/cc1 -quiet foo.c -quiet -dumpbase
foo.c -dumpbase-ext .c -mtune=generic -march=x86-64 -g -O1 -o /tmp/ccFglVbD.s

This causes performance issues in C code generated by the DML compiler
(https://github.com/intel/device-modeling-language)
