On Wed, Mar 9, 2016 at 2:54 AM, David Laight <david.lai...@aculab.com> wrote:
> From: Joe Perches
>> Sent: 08 March 2016 23:26
> ...
>> > +
>> > +	if (offset & 1)
>> > +		sum = (sum << 24) + (sum >> 8);
>>
>> Maybe use ror32(sum, 8);
>>
>> or maybe something like:
>>
>> {
>>	u32 sum;
>>
>>	/* rotated csum2 of odd offset will be the right checksum */
>>	if (offset & 1)
>>		sum = ror32((__force u32)csum2, 8);
>>	else
>>		sum = (__force u32)csum2;
>
> Or even:
>	sum = ror32((__force u32)csum2, (offset & 1) * 8);
> to remove the conditional.
> Assuming 'rotate by 0 bits' is valid.
> If not add 16 to rotate by 16 or 24.

The problem is that "ror %cl" can be significantly more expensive than
a plain "ror $8". On x86 the difference can be 6 cycles or more on
some of the older architectures, so it may be better to just do the
rotate by 8 followed by an "and" or "test" and a "cmovne", which is
what this compiles into right now.

- Alex
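
For reference, a minimal userspace sketch of the two variants discussed
above. ror32_sketch(), fixup_cond() and fixup_var() are hypothetical
names for illustration, not the kernel's helpers or the code in the
patch; ror32_sketch() masks both shift counts so that a rotate by 0
stays well defined in C. Which instructions either form compiles to
depends on the compiler and target, so treat the comments as the
expectation from this thread rather than a guarantee.

```c
#include <stdint.h>

/* Userspace stand-in for a 32-bit rotate right (assumption: not the
 * kernel's ror32()). Masking both counts avoids a shift by 32 when
 * shift == 0. */
static inline uint32_t ror32_sketch(uint32_t word, unsigned int shift)
{
	return (word >> (shift & 31)) | (word << ((-shift) & 31));
}

/* Conditional form: the rotate count is the constant 8, so the
 * compiler may emit a fixed-count rotate plus a test and cmov. */
static inline uint32_t fixup_cond(uint32_t csum2, int offset)
{
	return (offset & 1) ? ror32_sketch(csum2, 8) : csum2;
}

/* Branchless form: the rotate count is a variable (0 or 8), which on
 * x86 generally means a rotate by %cl. */
static inline uint32_t fixup_var(uint32_t csum2, int offset)
{
	return ror32_sketch(csum2, (offset & 1) * 8);
}
```

Both helpers return the same value for any offset; the only difference
is whether the rotate count is a compile-time constant.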