Compile the following code with options -march=armv5te -O2

extern void *memcpy(void *dst, const void *src, int n);

void *memmove(void *dst, const void *src, int n)
{
  const char *p = src;
  char *q = dst;
  if (__builtin_expect(q < p, 1))
    {
      return memcpy(dst, src, n);
    }
  else
    {
      int i=0;
      for (; i<n; i++)
        q[i] = p[i];
    }
  return dst;
}

gcc generates:

memmove:
        cmp     r1, r0
        str     r4, [sp, #-4]!
        mov     r3, r0
        mov     ip, r1
        mov     r4, r2
        bls     .L8
        ldmfd   sp!, {r4}
        b       memcpy
.L8:
        cmp     r2, #0
        movgt   r2, #0
        ble     .L4
.L5:
        ldrb    r1, [ip, r2]    @ zero_extendqisi2
        strb    r1, [r3, r2]
        add     r2, r2, #1
        cmp     r2, r4
        bne     .L5
.L4:
        mov     r0, r3
        ldmfd   sp!, {r4}
        bx      lr

The if block is expected to be executed more frequently than the else block,
but the generated code is not very efficient: on that path it saves r4, does
three register moves, and restores r4 again before branching to memcpy, even
though none of that work is needed there. Better code could be:

        cmp     r1, r0
        bhi     memcpy
        str     r4, [sp, #-4]!
        mov     r3, r0
        mov     ip, r1
        mov     r4, r2
L8:
        ...

--
Summary: GCC can do less work in the frequently executed path
Product: gcc
Version: 4.5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: carrot at google dot com
GCC build triplet: i686-linux
GCC host triplet: i686-linux
GCC target triplet: arm-eabi

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42497
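
A possible source-level workaround, given only as a sketch: move the byte loop into a separate function marked noinline, so that any register saves the loop needs belong to that function and not to the frequently executed path. The names my_memmove and copy_bytes_cold are hypothetical and not part of the report. Whether GCC 4.5 then emits the shorter sequence suggested above for arm-eabi is an assumption that has not been checked.

```c
#include <assert.h>
#include <string.h>

/* Cold path (hypothetical helper): the byte loop from the test case,
   kept out of line so its register usage stays out of the caller. */
__attribute__((noinline))
static void *copy_bytes_cold(char *q, const char *p, int n)
{
    int i;

    assert(q >= p);   /* only reached when dst is not below src */
    for (i = 0; i < n; i++)
        q[i] = p[i];
    return q;
}

/* Hot path (hypothetical name): one comparison followed by a call
   that the compiler is free to turn into a tail call. */
void *my_memmove(void *dst, const void *src, int n)
{
    const char *p = src;
    char *q = dst;

    if (__builtin_expect(q < p, 1))
        return memcpy(dst, src, (size_t)n);
    return copy_bytes_cold(q, p, n);
}
```

This only sidesteps the issue in the source; the report itself is about the compiler producing the cheaper hot path without such manual splitting.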