https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87601
Bug ID: 87601
Summary: Missed opportunity for flag reuse and macro-op fusion on x86
Product: gcc
Version: 8.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: vgatherps at gmail dot com
Target Milestone: ---

When I compile the following code with gcc 8.2 and options -O2 (or -Os) and -mtune=intel (or -mtune=broadwell):

    int sum(int *vals, int l) {
        int a = 0;
        if (l <= 0) {
            return 0;
        }
        for (int i = l; i != 0; i--) {
            a += vals[i-1];
        }
        return a;
    }

The following code is generated:

    sum(int*, int):
            xor     eax, eax
            test    esi, esi
            jle     .L1
            movsx   rsi, esi
    .L3:
            add     eax, DWORD PTR [rdi-4+rsi*4]
            sub     rsi, 1
            test    esi, esi
            jne     .L3
    .L1:
            ret

When passing -march=broadwell or -Os, sub is replaced by dec, but otherwise the output is the same.

Inside the loop, the sequence:

            sub     rsi, 1
            test    esi, esi
            jne     .L3

can be replaced with:

            sub     rsi, 1
            jne     .L3

since sub rsi, 1 already sets the same zero flag that test would. (At this point rsi holds the sign-extended value of a positive int, so its upper 32 bits are zero and rsi is zero exactly when esi is.) Dropping the test would also improve macro-op fusion on relatively recent architectures, because the sub and the jne would then be adjacent.

Anecdotally, I've seen similar decisions being made along the lines of:

            sub     index, 1
            // some more asm here not using index
            test    index, index
            jne     loop_start

but I don't have a nice clean test case for it. This suggests to me that the optimization around flag reuse and macro-op fusion could be improved in general, and I'll work on getting some clean test cases for other cases.
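
For anyone who wants to check the flag-equivalence claim without reading the assembly, here is a minimal sketch in C. It repeats the test case and adds a helper, zero_flags_agree, which is a hypothetical name introduced here and not part of GCC or of the original report. The helper models the loop counter as a 64-bit value (as after movsx rsi, esi) and compares the zero flag that sub rsi, 1 would produce with the one that test esi, esi would produce on every iteration. It is only a model of the argument under those assumptions, not a statement about how GCC reasons internally.

```c
#include <stdbool.h>
#include <stdint.h>

/* The test case from this report, unchanged in behavior. */
int sum(int *vals, int l)
{
    int a = 0;
    if (l <= 0) {
        return 0;
    }
    for (int i = l; i != 0; i--) {
        a += vals[i - 1];
    }
    return a;
}

/*
 * Hypothetical helper (not from the report): walks the same counter
 * values the loop does and checks that the zero flag from the 64-bit
 * decrement always matches the zero flag from a 32-bit test.
 */
bool zero_flags_agree(int l)
{
    if (l <= 0) {
        return true;                /* loop body is never entered */
    }
    int64_t rsi = (int64_t)l;       /* models: movsx rsi, esi */
    for (;;) {
        rsi -= 1;                   /* models: sub rsi, 1 */
        bool zf_sub = (rsi == 0);               /* ZF left by the sub */
        bool zf_test = ((uint32_t)rsi == 0);    /* ZF from test esi, esi */
        if (zf_sub != zf_test) {
            return false;           /* the two flags disagree */
        }
        if (zf_sub) {
            return true;            /* counter hit zero, loop exits */
        }
    }
}
```

To inspect the generated loop, the file can be compiled with something like gcc -O2 -mtune=intel -S -masm=intel; the -masm=intel part is an assumption about how the listing above was produced, chosen because the listing uses Intel syntax.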