Hello all.
The current implementation produces suboptimal code when long (4-byte)
integers are shifted by large amounts that aren't a multiple of eight.
All such shifts are expanded into a slow loop that shifts one bit per
iteration.
For example, a logical shift right by 17 produces a loop that takes
around 7 cycles per iteration, or about 119 cycles in total (17 × 7).
The loop also occupies at least 7 instruction words.
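To make the cost of the current loop concrete, here is a hypothetical C model of it. It assumes the loop body is lsr %D0; ror %C0; ror %B0; ror %A0 followed by dec and brne on a counter loaded with ldi; the code the compiler actually emits may differ. The function names loop_shift_right and loop_cycles are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical model of the one-bit-per-pass loop.  b[0]..b[3] stand
   for %A..%D (least to most significant byte).  Each pass does
   lsr %D; ror %C; ror %B; ror %A, passing the carry down a byte. */
uint32_t loop_shift_right(uint32_t val, unsigned count)
{
    uint8_t b[4] = {
        (uint8_t)val, (uint8_t)(val >> 8),
        (uint8_t)(val >> 16), (uint8_t)(val >> 24)
    };

    while (count--) {
        uint8_t carry = 0;                /* lsr shifts a 0 into bit 7 */
        for (int i = 3; i >= 0; i--) {
            uint8_t out = b[i] & 1u;      /* bit 0 becomes the new carry */
            b[i] = (uint8_t)((b[i] >> 1) | (carry << 7));
            carry = out;
        }
    }
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Assumed cycle cost for count >= 1: one ldi for the counter, then
   7 cycles per pass (4 shifts + dec + taken brne), minus 1 because the
   final brne falls through. */
unsigned loop_cycles(unsigned count)
{
    return 1u + 7u * count - 1u;
}
```

Under these assumptions loop_cycles(17) comes out at 119, matching the estimate above.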
A more efficient implementation could be:
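mov %A0,%C1   ; byte 2 of the input -> byte 0 of the result
mov %B0,%D1   ; byte 3 of the input -> byte 1 of the result
clr %C0       ; the upper two bytes of the result are zero
clr %D0
lsr %B0       ; shift the remaining 16 bits right by one
ror %A0       ; carry from %B0 enters bit 7 of %A0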
This takes six cycles and six instruction words. Both drop to five on
devices that have movw, since the two mov instructions can then be
replaced by a single movw %A0,%C1.
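As a sanity check on the byte-move approach, the following C sketch mirrors the six-instruction sequence step by step so it can be compared against val >> 17. It models a long as four bytes, dst[0] for %A (least significant) through dst[3] for %D; the function name shr17_bytewise is hypothetical.

```c
#include <stdint.h>

/* Hypothetical C model of the byte-move sequence for val >> 17.
   Index 0..3 corresponds to %A..%D (least to most significant byte). */
uint32_t shr17_bytewise(uint32_t val)
{
    uint8_t src[4] = {
        (uint8_t)val, (uint8_t)(val >> 8),
        (uint8_t)(val >> 16), (uint8_t)(val >> 24)
    };
    uint8_t dst[4];

    /* With movw, these two moves would be one movw %A0,%C1. */
    dst[0] = src[2];                 /* mov %A0,%C1 */
    dst[1] = src[3];                 /* mov %B0,%D1 */
    dst[2] = 0;                      /* clr %C0 */
    dst[3] = 0;                      /* clr %D0 */

    /* lsr %B0: shift right, bit 0 goes to the carry */
    uint8_t carry = dst[1] & 1u;
    dst[1] = (uint8_t)(dst[1] >> 1);

    /* ror %A0: shift right, the carry enters bit 7 */
    dst[0] = (uint8_t)((dst[0] >> 1) | (carry << 7));

    return (uint32_t)dst[0] | ((uint32_t)dst[1] << 8) |
           ((uint32_t)dst[2] << 16) | ((uint32_t)dst[3] << 24);
}
```

The idea is that a shift by 17 is a shift by 16 (pure byte moves, no bit work) followed by a shift by 1, so only the two surviving bytes need an lsr/ror pass.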
There are several other shift cases where a more efficient sequence
could be used.
I'm just wondering why this doesn't exist already. It seems fairly easy
to implement, although a bit time-consuming. My guess is lack of
interest, or that long integers simply aren't used much.
The missing optimization wouldn't be much of a problem, since one could
simply split the shift by hand. Sadly, whenever I try to split it, the
compiler recombines the pieces. For example,
unsigned long temp = val >> 16;
return temp >> 1;
gives the same assembly as
return val >> 17;
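One possible workaround, untested on avr-gcc and offered only as an assumption: GCC's extended asm allows an empty asm statement that claims to modify temp, which hides its value from the optimizer so the two shifts can't be folded back into one. uint32_t is used instead of unsigned long so the width is 32 bits on any host; the function name shr17_split is hypothetical.

```c
#include <stdint.h>

/* Assumed workaround: the empty asm emits no instructions, but the
   "+r" constraint tells the compiler temp may have changed, so
   temp >> 1 cannot be merged with the earlier >> 16. */
uint32_t shr17_split(uint32_t val)
{
    uint32_t temp = val >> 16;
    __asm__ ("" : "+r" (temp));   /* optimization barrier on temp only */
    return temp >> 1;
}
```

Declaring temp volatile would also block the folding, but it forces a store and reload through memory, which likely costs more than it saves.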
Thanks for any info.