https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106415

            Bug ID: 106415
           Summary: loop-ivopts prevents correct usage of dbra with 16-bit
                    loop counters on m68k
           Product: gcc
           Version: 11.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: undefinedopcode2 at gmail dot com
  Target Milestone: ---

Created attachment 53338
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53338&action=edit
C file that reproduces the problem.

When targeting m68k, certain loops with 16-bit counters should trivially
generate a DBRA instruction. Instead, GCC's optimization passes widen the
induction variable (IV) to 32 bits, which requires extra logic to check the
upper half. More specifically, this affects loops where the number of
iterations is known at compile time.

This additional code is completely useless since we know the loop count fits in
16 bits.

I am using GCC 11.2.0 hosted on ARM64 macOS and targeting m68k. All code
snippets were compiled with `-O3 -std=c99 -march=68000 -mtune=68000`.

Consider the following function:
void dbra_test1(short i) {
    do {
        foo(i);
    } while(--i != -1);
}

As expected, the generated body is a tiny loop consisting solely of call setup,
the call itself, call cleanup, and a DBRA:
.L2:
        movew %d2,%a0
        movel %a0,%sp@-
        jsr %a2@
        addql #4,%sp
        dbra %d2,.L2

Now consider this function, where we change the initial value of the loop count
to be a constant:
void dbra_test2(void) {
    short i = 15;
    do {
        foo(i);
    } while(--i != -1);
}

GCC generates the following code for the body of the loop:
.L7:
        movel %d2,%sp@-
        jsr %a2@
        addql #4,%sp
        dbra %d2,.L7
        clrw %d2
        subql #1,%d2
        jcc .L7

Note the extraneous clrw/subql/jcc following the dbra. They exist only to
handle the upper half of a 32-bit counter.
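
For reference, the sequence above can be modeled in C. This is only a sketch,
assuming the usual 68000 semantics: DBRA decrements the low word of the
register and branches unless that word becomes -1, and subql sets carry on
borrow. The names dbra_model and run_loop32 are made up for illustration and
are not taken from the attachment.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "dbra %dN,label": decrement only the low 16 bits of the
 * register; return 1 if the branch is taken (low word did not wrap to -1). */
static int dbra_model(uint32_t *d)
{
    assert(d != 0);
    uint16_t low = (uint16_t)((*d & 0xFFFFu) - 1u);
    *d = (*d & 0xFFFF0000u) | low;   /* upper half is left untouched */
    return low != 0xFFFFu;
}

/* Model of the loop at .L7 with a 32-bit counter in d2.
 * Returns how many times the loop body (the call to foo) would run. */
static unsigned long run_loop32(uint32_t start)
{
    uint32_t d2 = start;
    unsigned long iterations = 0;

    for (;;) {
        iterations++;                 /* movel / jsr / addql: loop body */
        if (dbra_model(&d2))          /* dbra %d2,.L7 */
            continue;
        d2 &= 0xFFFF0000u;            /* clrw %d2 */
        int borrow = (d2 == 0);       /* subql #1,%d2 borrows only from 0 */
        d2 -= 1u;
        if (!borrow)                  /* jcc .L7: loop while carry is clear */
            continue;
        break;
    }
    return iterations;
}
```

With a start value of 15 the tail after the dbra is reached exactly once and
never branches back, so it does no useful work when the trip count is known to
fit in 16 bits. It only matters for start values of 0x10000 and above.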

During ivcanon, GCC transforms the second loop to run from 16 to 0 instead of
15 to -1. Later, during ivopts, it transforms the loop back into the 15 to -1
form, but promotes the variable from short to int. Subsequent passes can no
longer tell that the counter fits in a short, and we end up with extraneous
checks inserted during codegen.
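
A rough source-level picture of those three forms is sketched below. This is
my approximation of the description above, not actual GIMPLE from the dumps;
the function names, the sink_fn callback type, and the record helper are all
hypothetical and exist only so the three loops can be compared.

```c
#include <assert.h>

typedef void (*sink_fn)(short);

/* Small recorder standing in for foo() so the loops can be compared. */
static short trace[32];
static int trace_len;

static void record(short v)
{
    assert(trace_len < 32);
    trace[trace_len++] = v;
}

/* Original form: 16-bit counter running from 15 to -1. */
static void loop_original(sink_fn foo)
{
    short i = 15;
    do {
        foo(i);
    } while (--i != -1);
}

/* Approximation of the ivcanon form: an extra counter runs from 16 to 0. */
static void loop_after_ivcanon(sink_fn foo)
{
    short i = 15;
    unsigned int remaining = 16;
    do {
        foo(i);
        --i;
    } while (--remaining != 0);
}

/* Approximation of the ivopts form: back to 15 .. -1, but in an int. */
static void loop_after_ivopts(sink_fn foo)
{
    int iv = 15;
    do {
        foo((short)iv);
    } while (--iv != -1);
}
```

All three make the same 16 calls with the same arguments. The only difference
that matters here is the type of the counter that controls the exit test,
which is short in the first form and int in the last.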

I've attached a simple file that reproduces the problem. GCC 2.95.3 generated
the expected code, but this has been broken since at least 4.3.2, possibly
earlier.

Thanks
--UD2
