http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57231
Bug ID: 57231
Summary: Hoist zero-extend operations when possible
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: josh.m.conner at gmail dot com

Compiling this code at -O2:

  unsigned char *value;

  unsigned short
  foobar (int iters)
  {
    unsigned short total = 0;
    unsigned int i;

    for (i = 0; i < iters; i++)
      total += value[i];

    return total;
  }

on ARM generates a zero-extend of total on every iteration of the loop:

  .L3:
          ldrb    r1, [ip, r3]    @ zero_extendqisi2
          add     r3, r3, #1
          cmp     r3, r0
          add     r2, r2, r1
          uxth    r2, r2
          bne     .L3

I believe we should be able to move the zero-extend (uxth) out of the loop and perform it once after the loop exits. Only the low 16 bits of total are observable, and the low 16 bits of a 32-bit sum do not depend on the upper bits of its operands, so truncating once at the end gives the same result.

Although I reproduced this on ARM, I believe it is a general issue that would have to be handled by the RTL optimizers.

This shows up in a hot loop of bzip2:

  for (i = gs; i <= ge; i++) {
    UInt16 icv = szptr[i];
    cost0 += len[0][icv];
    cost1 += len[1][icv];
    cost2 += len[2][icv];
    cost3 += len[3][icv];
    cost4 += len[4][icv];
    cost5 += len[5][icv];
  }
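The equivalence the requested transformation relies on can be illustrated at the C level. The sketch below is an assumption-laden illustration, not GCC's implementation: the function names sum_truncate_each and sum_truncate_once are hypothetical, a size_t count replaces the report's int iters, and it assumes unsigned short is 16 bits wide while unsigned int is at least as wide. The first function truncates the accumulator on every iteration (the per-iteration uxth); the second accumulates at full width and truncates once after the loop, which is the code shape the report asks the optimizers to produce.

```c
#include <stddef.h>

/* Hypothetical helper: mirrors the report's semantics. total is
   promoted to int for the addition and converted back to 16 bits
   on every iteration (the uxth inside the loop). */
unsigned short sum_truncate_each(const unsigned char *v, size_t n)
{
    unsigned short total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];              /* wraps modulo 2^16 each time */
    return total;
}

/* Hypothetical helper: the "sunk" form. Accumulate in a wider
   unsigned type (well-defined wraparound) and truncate once. */
unsigned short sum_truncate_once(const unsigned char *v, size_t n)
{
    unsigned int total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];              /* upper bits never reach the low 16 */
    return (unsigned short) total;  /* single truncation after the loop */
}
```

Both functions return the same value for any input, because carries only propagate toward higher bits: for example, 1000 bytes of 255 sum to 255000, and both return 255000 mod 65536 = 58392.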