https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82329
Bug ID: 82329
Summary: #pragma GCC target/optimize incurs high compilation time cost
Product: gcc
Version: 7.0
Status: UNCONFIRMED
Keywords: compile-time-hog
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: amonakov at gcc dot gnu.org
Target Milestone: ---

Translation units that include "umbrella" x86 intrinsic files, i.e. x86intrin.h
or immintrin.h are noticeably slow to compile:

$ time echo '#include <x86intrin.h>' | gcc -xc - -S -o /dev/null -Os

real 0m0.162s
user 0m0.150s
sys 0m0.010s

This is because directives like '#pragma GCC target("sse3")' in included files
cause ~8600 intrinsic declarations to parse very slowly. The pragma causes a
'target' attribute to be added to each declaration in the beginning of
attribs.c:decl_attributes, and then the loop over attributes goes into
lookup_scoped_attribute_spec and later on into handle_target_attribute and
ix86_valid_target_attribute_p, all of which seem fairly inefficient. It probably
would have been better to appropriately memoize and reuse tree nodes instead of
looking up the same two items in the hash over and over again.

On a testcase below isolating just this issue, perf shows

  10.52%  cc1  cc1           [.] cl_option_hasher::hash
   9.52%  cc1  cc1           [.] cl_optimization_save
   5.17%  cc1  libc-2.24.so  [.] __strcmp_sse2_unaligned
   3.80%  cc1  cc1           [.] iterative_hash_host_wide_int
   3.43%  cc1  libc-2.24.so  [.] _int_malloc
   2.47%  cc1  libc-2.24.so  [.] _int_free
   2.31%  cc1  libc-2.24.so  [.] malloc
   2.28%  cc1  libc-2.24.so  [.] malloc_consolidate
   2.09%  cc1  cc1           [.] ggc_internal_alloc
   1.91%  cc1  cc1           [.] ix86_valid_target_attribute_tree

#define x10(x, a) \
  x(a##0) x(a##1) x(a##2) x(a##3) x(a##4) x(a##5) x(a##6) x(a##7) x(a##8) x(a##9)
#define x100(x, a) \
  x10(x, a##0) x10(x, a##1) x10(x, a##2) x10(x, a##3) x10(x, a##4) \
  x10(x, a##5) x10(x, a##6) x10(x, a##7) x10(x, a##8) x10(x, a##9)
#define x1000(x, a) \
  x100(x, a##0) x100(x, a##1) x100(x, a##2) x100(x, a##3) x100(x, a##4) \
  x100(x, a##5) x100(x, a##6) x100(x, a##7) x100(x, a##8) x100(x, a##9)
#define x10000(x, a) \
  x1000(x, a##0) x1000(x, a##1) x1000(x, a##2) x1000(x, a##3) x1000(x, a##4) \
  x1000(x, a##5) x1000(x, a##6) x1000(x, a##7) x1000(x, a##8) x1000(x, a##9)
#define x(a) void a(void);
#pragma GCC target("sse3")
x10000(x, a)
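
To illustrate the memoization suggested in the report, here is a minimal standalone sketch in C. It is not GCC code: struct target_node, parse_target_slow, lookup_target_memo and simulate_declarations are all hypothetical names invented for this example, and the "slow" path merely copies a string where the compiler would hash and build option nodes. It assumes, as in the test case, that many consecutive declarations carry the same target string, so a one-entry cache is enough to skip the repeated work.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the node built from a target string. */
struct target_node {
    char *spec;
};

/* Counts how often the expensive path runs. */
int slow_parse_calls = 0;

/* Hypothetical stand-in for the expensive parse-and-hash path. */
struct target_node *parse_target_slow(const char *spec)
{
    struct target_node *node = malloc(sizeof *node);
    assert(node != NULL);
    node->spec = malloc(strlen(spec) + 1);
    assert(node->spec != NULL);
    strcpy(node->spec, spec);
    slow_parse_calls++;
    return node;
}

/* One-entry memo: the node produced for the most recent target string. */
struct target_node *cached_node = NULL;

/* Reuse the cached node when the string matches the previous request. */
struct target_node *lookup_target_memo(const char *spec)
{
    if (cached_node != NULL && strcmp(cached_node->spec, spec) == 0)
        return cached_node;
    /* The previous node is deliberately not freed in this sketch,
       since earlier callers may still hold a pointer to it. */
    cached_node = parse_target_slow(spec);
    return cached_node;
}

/* Simulate n declarations under one pragma; return the number of slow parses. */
int simulate_declarations(const char *spec, int n)
{
    int before = slow_parse_calls;
    for (int i = 0; i < n; i++)
        (void) lookup_target_memo(spec);
    return slow_parse_calls - before;
}
```

With this sketch, simulate_declarations("sse3", 10000) takes the slow path once rather than 10000 times. A real fix inside the compiler would also have to invalidate or key the cache on the current optimization state, which this example does not attempt to model.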