http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57231

            Bug ID: 57231
           Summary: Hoist zero-extend operations when possible
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: josh.m.conner at gmail dot com

Compiling this code at -O2:

  unsigned char *value;

  unsigned short foobar (int iters)
  {
    unsigned short total = 0;
    unsigned int i;

    for (i = 0; i < iters; i++)
      total += value[i];

    return total;
  }

On ARM, GCC generates a zero-extend of total on every iteration of the loop:

  .L3:
        ldrb    r1, [ip, r3]    @ zero_extendqisi2
        add     r3, r3, #1
        cmp     r3, r0
        add     r2, r2, r1
        uxth    r2, r2
        bne     .L3

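I believe we should be able to move the zero-extend (uxth) out of the loop and
perform it once after the loop exits, since the low 16 bits of the sum do not
depend on the upper bits of the intermediate values.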
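
As a rough sketch of the transformation I have in mind, here it is done by hand
at the source level. The function names are made up for illustration and are
not part of the test case above. It assumes unsigned int is at least as wide as
unsigned short, so the accumulator can wrap freely and the final truncation
still yields the same low 16 bits:

```c
/* Hypothetical names: sum_narrow_each_iter and sum_narrow_once are
   illustrations only, not part of the original test case. */

/* Mirrors the reported loop: the 16-bit accumulator requires the sum
   to be truncated (uxth on ARM) after every addition. */
unsigned short sum_narrow_each_iter (const unsigned char *value,
                                     unsigned int iters)
{
  unsigned short total = 0;
  unsigned int i;

  for (i = 0; i < iters; i++)
    total += value[i];

  return total;
}

/* Hand-transformed variant: accumulate in a full-width unsigned int
   and truncate to 16 bits once, after the loop. */
unsigned short sum_narrow_once (const unsigned char *value,
                                unsigned int iters)
{
  unsigned int wide = 0;
  unsigned int i;

  for (i = 0; i < iters; i++)
    wide += value[i];

  return (unsigned short) wide;
}
```

Both functions should return the same value for any input, because unsigned
arithmetic wraps modulo a power of two and truncation to 16 bits keeps only the
low bits. The second form is what I would hope the optimizers could produce
from the first.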

Note that although I observed this on ARM, I believe it is a general issue
that would have to be handled by the RTL optimizers.

This shows up in a hot loop of bzip2:

            for (i = gs; i <= ge; i++) {
               UInt16 icv = szptr[i];
               cost0 += len[0][icv];
               cost1 += len[1][icv];
               cost2 += len[2][icv];
               cost3 += len[3][icv];
               cost4 += len[4][icv];
               cost5 += len[5][icv];
            }
