https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106415
            Bug ID: 106415
           Summary: loop-ivopts prevents correct usage of dbra with 16-bit
                    loop counters on m68k
           Product: gcc
           Version: 11.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: undefinedopcode2 at gmail dot com
  Target Milestone: ---

Created attachment 53338
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53338&action=edit
C file that reproduces the problem.

When targeting m68k and compiling certain loops with 16-bit counters that should trivially generate a DBRA instruction, GCC's optimization passes end up converting the IV to 32-bit, which requires extra logic to check the upper half. More specifically, these are loops where the number of iterations is known at compile time. This additional code is completely useless since we know the loop count fits in 16 bits.

I am using GCC 11.2.0 hosted on ARM64 macOS and targeting m68k. All code snippets were compiled with `-O3 -std=c99 -march=68000 -mtune=68000`.

Consider the following function:

    void dbra_test1(short i)
    {
        do
        {
            foo(i);
        } while(--i != -1);
    }

As expected, the generated body is a tiny loop consisting solely of call setup, the call itself, call cleanup, and a DBRA:

    .L2:
            movew %d2,%a0
            movel %a0,%sp@-
            jsr %a2@
            addql #4,%sp
            dbra %d2,.L2

Now consider this function, where we change the initial value of the loop count to be a constant:

    void dbra_test2(void)
    {
        short i = 15;
        do
        {
            foo(i);
        } while(--i != -1);
    }

GCC generates the following code for the body of the loop:

    .L7:
            movel %d2,%sp@-
            jsr %a2@
            addql #4,%sp
            dbra %d2,.L7
            clrw %d2
            subql #1,%d2
            jcc .L7

Note the extraneous clr/subq/jcc.

During ivcanon, GCC transforms the second loop to run from 16 to 0 instead of 15 to -1. Later during ivopts, it transforms back into 15 to -1 form, but promotes the variable from short to int. Future transformations are no longer able to optimize around the short variable, and we end up with extraneous checks inserted during codegen.
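For reference, here is a rough source-level sketch of the ivcanon and ivopts steps described above. It is only an approximation written by hand in C, not the actual GIMPLE that GCC produces. The names `loop_original`, `loop_ivcanon`, `loop_ivopts`, `capture`, and `loops_agree` are made up for illustration, and `foo` is a hypothetical stand-in that records its arguments. The point is that all three forms call `foo` with the same 16 values, so the widened counter never needs its upper half checked.

```c
#include <assert.h>
#include <string.h>

#define MAX_CALLS 32

/* Hypothetical stand-in for the external foo() in the report:
   it records every argument it is called with. */
short seen[MAX_CALLS];
int nseen;

void foo(short i)
{
    assert(nseen < MAX_CALLS);
    seen[nseen++] = i;
}

/* Loop as written in dbra_test2: 16-bit counter, 15 down to 0,
   exiting once the counter reaches -1. */
void loop_original(void)
{
    short i = 15;
    do {
        foo(i);
    } while (--i != -1);
}

/* Assumed shape after ivcanon: a separate iteration counter
   running from 16 down to 0. */
void loop_ivcanon(void)
{
    short i = 15;
    unsigned int n = 16;
    do {
        foo(i);
        --i;
    } while (--n != 0);
}

/* Assumed shape after ivopts: back to 15 .. -1, but the
   induction variable is now a 32-bit int. */
void loop_ivopts(void)
{
    int i = 15;
    do {
        foo((short)i);
    } while (--i != -1);
}

/* Run one loop variant and copy the recorded arguments to out. */
int capture(void (*loop)(void), short *out)
{
    nseen = 0;
    loop();
    memcpy(out, seen, sizeof(short) * (size_t)nseen);
    return nseen;
}

/* Returns 1 if all three variants make the same 16 calls. */
int loops_agree(void)
{
    short a[MAX_CALLS], b[MAX_CALLS], c[MAX_CALLS];
    int na = capture(loop_original, a);
    int nb = capture(loop_ivcanon, b);
    int nc = capture(loop_ivopts, c);
    return na == 16 && nb == 16 && nc == 16
        && memcmp(a, b, sizeof(short) * 16) == 0
        && memcmp(a, c, sizeof(short) * 16) == 0;
}
```

Since the int counter in the last variant only ever takes values from 15 down to -1, a 16-bit DBRA would cover the whole range on its own.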
I've attached a simple file that reproduces the problem. GCC 2.95.3 performed the operation correctly, but it's been broken since at least 4.3.2, possibly earlier.

Thanks
--UD2