https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88770

            Bug ID: 88770
           Summary: Redundant load opt. or CSE pessimizes code
           Product: gcc
           Version: 8.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: bisqwit at iki dot fi
  Target Milestone: ---

For this code (-xc -std=c99 or -xc++ -std=c++17):

    struct guu { int a; int b; float c; char d; };

    extern void test(struct guu);

    void caller()
    {
        test( (struct guu){.a = 3, .b = 5, .c = 7, .d = 9} );
        test( (struct guu){.a = 3, .b = 5, .c = 7, .d = 9} );
    }

CSE (or some other form of redundant loads optimization) pessimizes the code.
Problem occurs on optimization levels -O1 and higher, including -Os.

If the function "caller" calls test() just once, the resulting code is (-O3
-fno-optimize-sibling-calls, stack alignment/push/pops omitted for brevity):

    movabs rdi, 21474836483
    movabs rsi, 39743127552
    call   test

If "caller" calls test() twice, the code is a lot longer and not just twice as
long. (Stack alignment/push/pops omitted for brevity):

    movabs rbp, 21474836483
    mov    rdi, rbp
    movabs rbx, 38654705664
    mov    rsi, rbx
    or     rbx, 1088421888
    or     rsi, 1088421888
    call   test
    mov    rsi, rbx
    mov    rdi, rbp
    call   test

If we change caller() such that the parameters in the two calls are not
identical:

    void caller()
    {
        test( (struct guu){.a = 3, .b = 5, .c = 7, .d = 9} );
        test( (struct guu){.a = 3, .b = 6, .c = 7, .d = 10} );
    }

The generated code is optimal again as expected:

    movabs rdi, 21474836483
    movabs rsi, 39743127552
    call   test
    movabs rdi, 25769803779
    movabs rsi, 44038094848
    call   test

The problem in the first examples is that the compiler sees that the same
parameter is used twice, and it tries to save it in a callee-saves register, in
order to reuse the same values on the second call. However re-initializing the
registers from scratch would have been more efficient.

The problem occurs on GCC versions 4.8.1 and newer. It does not occur in GCC
version 4.7.4, which generated different code that is otherwise inefficient.
For reference, the problem also exists in Clang versions 3.5 and newer, but not in versions 3.4 and earlier.
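
As a reading aid for the immediates in the listings above, here is a minimal sketch of how the two 64-bit register values can be derived from the struct fields. It assumes the x86-64 System V calling convention (the struct is passed as two integer eightbytes in rdi and rsi), a little-endian layout, a 32-bit IEEE-754 float, and zeroed padding bytes, which is what the constants in the listings suggest. The helper names float_bits, first_eightbyte and second_eightbyte are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Raw bit pattern of a float. Assumes float is a 32-bit IEEE-754 value. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    assert(sizeof f == sizeof bits);
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* First eightbyte (rdi under the assumed ABI):
   a occupies bits 0..31, b occupies bits 32..63. */
uint64_t first_eightbyte(int a, int b)
{
    return (uint64_t)(uint32_t)a | ((uint64_t)(uint32_t)b << 32);
}

/* Second eightbyte (rsi under the assumed ABI):
   c occupies bits 0..31, d occupies bits 32..39, padding assumed zero. */
uint64_t second_eightbyte(float c, char d)
{
    return (uint64_t)float_bits(c) | ((uint64_t)(unsigned char)d << 32);
}
```

Under these assumptions, first_eightbyte(3, 5) gives 21474836483 and second_eightbyte(7.0f, 9) gives 39743127552, matching the single-call listing. In the two-call listing the second value is built in two steps: 38654705664 is 9 << 32 (the .d field), and 1088421888 is the bit pattern of 7.0f (0x40E00000), combined with "or".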