Noticed that gcc 4.3.4 doesn't optimize "add with carry" properly:

    static u32 add32carry(u32 sum, u32 x)
    {
        u32 z = sum + x;
        if (sum + x < x)
            z++;
        return z;
    }

Becomes:

    add32carry:
        add 3,3,4
        subfc 0,4,3
        subfe 0,0,0
        subfc 0,0,3
        mr 3,0

Instead of:

        addc 3,3,4
        addze 3,3

This slows down the Internet checksum significantly.

Also, doing this in a loop can be further optimized:

    for (; len; --len)
        sum = add32carry(sum, *++buf);

Ideally this would compile to:

        addic 3, 3, 0    /* clear carry */
    .L31:
        lwzu 0,4(9)
        adde 3, 3, 0     /* add with carry */
        bdnz .L31
        addze 3, 3       /* add in final carry */
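
For anyone who wants to check the semantics independently of the code generation, here is a minimal, self-contained sketch of the end-around-carry sum. The u32 typedef and the helper names sum32_carry and sum32_wide are assumptions made for illustration, not taken from the original code. The loop also uses *buf++ rather than the pre-increment *++buf (which is what maps onto lwzu), so it starts at the first word. The 64-bit variant is only a portable cross-check that defers the carries and folds them at the end; it is not a claim about what gcc should emit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;   /* assumed definition; the report does not show it */

/* Add with end-around carry, as in the report. */
static u32 add32carry(u32 sum, u32 x)
{
    u32 z = sum + x;
    if (z < x)          /* unsigned wraparound: carry out of bit 31 */
        z++;
    return z;
}

/* Hypothetical helper: sum len 32-bit words, carrying on every add. */
static u32 sum32_carry(const u32 *buf, size_t len)
{
    u32 sum = 0;

    assert(buf != NULL || len == 0);
    for (; len; --len)
        sum = add32carry(sum, *buf++);
    return sum;
}

/* Hypothetical cross-check: accumulate in 64 bits, fold carries at the end.
 * Assumes len is small enough that the 64-bit total cannot overflow. */
static u32 sum32_wide(const u32 *buf, size_t len)
{
    uint64_t acc = 0;

    assert(buf != NULL || len == 0);
    for (; len; --len)
        acc += *buf++;
    acc = (acc & 0xffffffffu) + (acc >> 32);   /* first fold */
    acc = (acc & 0xffffffffu) + (acc >> 32);   /* fold the fold's carry */
    return (u32)acc;
}
```

Both helpers should agree on any input that fits the stated assumption, since each one adds the carries back into the low 32 bits; they differ only in when the carries are folded in.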