On Wed, Mar 9, 2016 at 2:54 AM, David Laight <david.lai...@aculab.com> wrote:
> From: Joe Perches
>> Sent: 08 March 2016 23:26
> ...
>> > +
>> > +   if (offset & 1)
>> > +           sum = (sum << 24) + (sum >> 8);
>>
>> Maybe use ror32(sum, 8);
>>
>> or maybe something like:
>>
>> {
>>       u32 sum;
>>
>>       /* rotated csum2 of odd offset will be the right checksum */
>>       if (offset & 1)
>>               sum = ror32((__force u32)csum2, 8);
>>       else
>>               sum = (__force u32)csum2;
>
> Or even:
>         sum = ror32((__force u32)csum2, (offset & 1) * 8);
> to remove the conditional.
> Assuming 'rotate by 0 bits' is valid.
> If not add 16 to rotate by 16 or 24.

The problem is that "ror %cl" can be significantly more expensive than
a plain "ror $8".  On x86 the difference can be 6 cycles or more on
some of the older architectures, so it may be better to just do the
rotate by 8 and then an "and" or "test" followed by a "cmovne", which
is what this compiles into right now.
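
For reference, a minimal userspace sketch of the two variants being
compared. The function names are made up for illustration, and ror32()
here is a local stand-in for the kernel helper, not the patch itself.
Which instructions actually get emitted depends on the compiler and
flags, so treat this as a way to check equivalence, not codegen.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace stand-in for the kernel's ror32(). Masking both shift
 * counts keeps a rotate by 0 well defined in C.
 */
static inline uint32_t ror32(uint32_t word, unsigned int shift)
{
	return (word >> (shift & 31)) | (word << ((-shift) & 31));
}

/* Hypothetical name: rotate by a constant 8, then select on the low bit. */
static uint32_t rotate_if_odd_select(uint32_t csum2, int offset)
{
	uint32_t rotated = ror32(csum2, 8);

	return (offset & 1) ? rotated : csum2;
}

/* Hypothetical name: variable rotate count (0 or 8), no conditional. */
static uint32_t rotate_if_odd_variable(uint32_t csum2, int offset)
{
	return ror32(csum2, (offset & 1) * 8);
}

/* Returns 1 if both variants agree for an even and an odd offset. */
static int variants_agree(uint32_t csum2)
{
	return rotate_if_odd_select(csum2, 0) == rotate_if_odd_variable(csum2, 0) &&
	       rotate_if_odd_select(csum2, 1) == rotate_if_odd_variable(csum2, 1);
}
```

Both give the same value for any offset; the only question is whether
the rotate count ends up as an immediate or in %cl.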

- Alex
