Hello all. The current implementation produces suboptimal code for large shifts that aren't a multiple of eight when operating on long integers (4 bytes). All such shifts are compiled into a slow shift loop. For example, a logical shift right by 17 results in a loop that takes around 7 cycles per iteration, for roughly 119 cycles in total (17 iterations × 7 cycles). Even at best, this loop occupies 7 instruction words.
A more efficient implementation could be:

    mov %B0,%D1
    mov %A0,%C1
    clr %C0
    clr %D0
    lsr %B0
    ror %A0

This takes six cycles and six instruction words. Both counts drop to five if the target has movw, because the two mov instructions can then be merged into one.

There are several other places where a more efficient sequence could be used.

I'm just wondering why this functionality doesn't already exist. It seems like it would be fairly easy to implement, although somewhat time-consuming. My guess is a lack of interest, or that shifts on long integers are rarely used.

The missing optimization wouldn't matter much if one could simply split the shift by hand. Unfortunately, my attempts to split the shift get recombined by the compiler:

    unsigned long temp = val >> 16;
    return temp >> 1;

produces the same assembly as

    return val >> 17;

Thanks for any info.