Noticed that gcc 4.3.4 doesn't optimize "add with carry" properly on PowerPC:

static u32
add32carry(u32 sum, u32 x)
{
  u32 z = sum + x;
  if (sum + x < x)
    z++;
  return z;
}
Becomes:
add32carry:
        add 3,3,4
        subfc 0,4,3
        subfe 0,0,0
        subfc 0,0,3
        mr 3,0
Instead of:
        addc 3,3,4
        addze 3,3
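
For reference, the same end-around-carry add can be written with a 64-bit intermediate, so the carry shows up as an ordinary bit instead of an unsigned comparison. This is only a sketch: the name add32carry_wide is made up here, uint32_t/uint64_t from <stdint.h> stand in for u32, and I have not checked what code gcc 4.3.4 generates for it.

```c
#include <stdint.h>

/* Hypothetical variant of add32carry(): same result, but the carry
 * out of bit 31 is read from bit 32 of a 64-bit sum. */
static uint32_t
add32carry_wide(uint32_t sum, uint32_t x)
{
  uint64_t t = (uint64_t)sum + x;   /* at most 33 significant bits */

  /* Low word plus carry-out. This cannot wrap: when the carry is 1
   * the low word is at most 0xfffffffe. */
  return (uint32_t)t + (uint32_t)(t >> 32);
}
```

For example, add32carry_wide(0xffffffff, 1) gives 1, the same as add32carry(0xffffffff, 1).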

This slows down the Internet checksum significantly.

Also, doing this in a loop can be further optimized:

for(;len; --len)
   sum = add32carry(sum, *++buf);

Could become:
        addic 3, 3, 0 /* clear carry */
.L31:
        lwzu 0,4(9)
        adde 3, 3, 0 /* add with carry */
        bdnz .L31

        addze 3, 3 /* add in final carry */
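
The deferred-carry idea in that loop can also be sketched in portable C: let the carries pile up in the high half of a 64-bit accumulator and fold them back in once at the end. The helper name sum32_deferred and its signature are made up for illustration. Unlike the loop above, which pre-increments buf, it takes a pointer to the first word. It assumes len is below 2^32 so the accumulator cannot overflow. Again, no claim about what gcc 4.3.4 makes of it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: end-around-carry sum of len 32-bit words,
 * starting from sum. Assumes len < 2^32 so acc cannot overflow. */
static uint32_t
sum32_deferred(uint32_t sum, const uint32_t *buf, size_t len)
{
  uint64_t acc = sum;
  size_t i;

  for (i = 0; i < len; i++)
    acc += buf[i];              /* carries collect in the high word */

  /* Fold the high word into the low word. The first fold can itself
   * carry into bit 32, so fold a second time. */
  acc = (acc & 0xffffffffu) + (acc >> 32);
  acc = (acc & 0xffffffffu) + (acc >> 32);
  return (uint32_t)acc;
}
```

This should return the same value as calling add32carry() once per word, since both compute the sum with end-around carry and only differ in when the carries are added back.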
