On 12/16/25 1:16 PM, Georg-Johann Lay wrote:
When a shift is performed by a shift loop, there are cases
where the runtime can be improved. For example, uint32_t R22 >> 5
(the 32-bit value in R25:R22 shifted right by 5) is currently:
      ldi  scratch, 5
  1:  lsr  r25
      ror  r24
      ror  r23
      ror  r22
      dec  scratch
      brne 1b
but can be done as:
      andi r22, -32  ; Set lower 5 bits to 0.
      ori  r22, 16   ; Set bit 4 to 1.
      ;; Now r22 = 0b***10000
  1:  lsr  r25
      ror  r24
      ror  r23
      ror  r22
      brcc 1b        ; Carry will be 0, 0, 0, 0, 1.
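To check that the sentinel loop computes the same value as the counted loop, here is a small C model of both sequences. It is only a sketch: registers r22..r25 are modeled as a byte array, the helper names are made up for illustration, and the restriction 1 <= count <= 8 (so that the sentinel bit fits into r22) is an assumption of this model, not something stated in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Model registers r22..r25 as b[0..3]; b[0] (r22) is the LSB.  */
static void split (uint32_t x, uint8_t b[4])
{
  for (int i = 0; i < 4; i++)
    b[i] = (uint8_t) (x >> (8 * i));
}

static uint32_t join (const uint8_t b[4])
{
  return (uint32_t) b[0] | (uint32_t) b[1] << 8
         | (uint32_t) b[2] << 16 | (uint32_t) b[3] << 24;
}

/* One pass of "lsr r25 / ror r24 / ror r23 / ror r22".
   Returns the final carry flag, i.e. the bit shifted out of r22.  */
static unsigned shift_right_1 (uint8_t b[4])
{
  unsigned carry = 0;                 /* lsr shifts in a 0 */
  for (int i = 3; i >= 0; i--)
    {
      unsigned out = b[i] & 1u;
      b[i] = (uint8_t) ((b[i] >> 1) | (carry << 7));
      carry = out;
    }
  return carry;
}

/* Current code: loop counted down in a scratch register.  */
static uint32_t lshr_counted (uint32_t x, unsigned count)
{
  assert (count >= 1 && count <= 255);
  uint8_t b[4];
  split (x, b);
  uint8_t scratch = (uint8_t) count;  /* ldi scratch, count */
  do
    shift_right_1 (b);
  while (--scratch != 0);             /* dec scratch; brne 1b */
  return join (b);
}

/* Sentinel variant (assumed 1 <= count <= 8).  *ITERS gets the number
   of loop passes, which should equal COUNT.  */
static uint32_t lshr_sentinel (uint32_t x, unsigned count, unsigned *iters)
{
  assert (count >= 1 && count <= 8);
  uint8_t b[4];
  split (x, b);
  b[0] &= (uint8_t) (0xffu << count);       /* andi r22, -(1 << count) */
  b[0] |= (uint8_t) (1u << (count - 1));    /* ori r22, 1 << (count-1) */
  *iters = 0;
  unsigned carry;
  do
    {
      carry = shift_right_1 (b);
      ++*iters;
    }
  while (!carry);                           /* brcc 1b */
  return join (b);
}
```

For count = 5 the two masks are 0xe0 (that is, -32 as a signed byte) and 16, matching the andi/ori above. Clobbering the low count bits of r22 is harmless because exactly those bits are shifted out.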
This is count-1 cycles faster, where count is the shift offset.
In the example, that's 4 cycles.
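The count-1 figure can be reproduced with a simple cycle model. This sketch assumes classic AVR timings (ldi, andi, ori, lsr, ror and dec take 1 cycle; brne/brcc take 2 cycles when taken and 1 when not) and a generic operand size of nbytes; the function names are illustrative only.

```c
#include <assert.h>

/* Counted loop: ldi, then COUNT passes of (NBYTES shifts + dec + branch),
   where the last branch falls through.  Equals COUNT * (NBYTES + 3).  */
static unsigned cycles_counted (unsigned nbytes, unsigned count)
{
  assert (count >= 1);
  return 1 + count * (nbytes + 1) + 2 * (count - 1) + 1;
}

/* Sentinel loop: andi + ori, then COUNT passes of (NBYTES shifts + branch),
   where the last branch falls through.  Equals COUNT * (NBYTES + 2) + 1.  */
static unsigned cycles_sentinel (unsigned nbytes, unsigned count)
{
  assert (count >= 1);
  return 2 + count * nbytes + 2 * (count - 1) + 1;
}
```

For the uint32_t example this gives 35 versus 31 cycles. Under these assumptions the difference is count-1 for any operand size: each pass saves one dec, and the setup costs one extra instruction.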
Part 1 of the patch refactors the shift output function so that
it gets a shift rtx_code instead of an asm template.
Part 2 is the actual optimization.
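One reason a shift code can replace an asm template is that the one-bit shift sequence follows mechanically from the shift kind and the operand size. The sketch below is hypothetical and is not the code from the patch: the enum stands in for the rtx codes ASHIFT, ASHIFTRT and LSHIFTRT, and emit_shift_1 is an invented name that simply prints the instructions into a buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for ASHIFT, ASHIFTRT, LSHIFTRT (not GCC's enum).  */
enum shift_kind { SK_ASHIFT, SK_ASHIFTRT, SK_LSHIFTRT };

/* Hypothetical helper: write the asm for a one-bit shift of an NBYTES-byte
   value in rREGNO .. rREGNO+NBYTES-1 into BUF.  Left shifts start at the
   LSB with lsl and continue with rol; right shifts start at the MSB with
   asr (arithmetic) or lsr (logical) and continue with ror.  */
static void emit_shift_1 (char *buf, size_t size, enum shift_kind kind,
                          int regno, int nbytes)
{
  size_t len = 0;
  if (size > 0)
    buf[0] = '\0';
  for (int i = 0; i < nbytes; i++)
    {
      int left = kind == SK_ASHIFT;
      int r = left ? regno + i : regno + nbytes - 1 - i;
      const char *insn = i > 0 ? (left ? "rol" : "ror")
                         : left ? "lsl"
                         : kind == SK_ASHIFTRT ? "asr" : "lsr";
      int n = snprintf (buf + len, size - len, "%s r%d\n", insn, r);
      if (n < 0 || (size_t) n >= size - len)
        break;                  /* Out of space: output is truncated.  */
      len += (size_t) n;
    }
}
```

For example, emit_shift_1 (buf, sizeof buf, SK_LSHIFTRT, 22, 4) produces the lsr r25 / ror r24 / ror r23 / ror r22 body used in both loops above.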
This is for trunk and passes without new regressions.
Ok to apply?
Johann
--
AVR: Refactor avr.cc::out_shift_with_cnt().
This is a no-op refactoring of out_shift_with_cnt() that passes the
shift rtx_code instead of an asm template string.
gcc/
	* config/avr/avr-protos.h (out_shift_with_cnt): Remove.
	* config/avr/avr.cc (avr_out_shift_with_cnt): New static
	function from out_shift_with_cnt.  Pass shift rtx_code instead
	of asm template.
	(avr_out_shift_1): New static helper function.
	(ashlqi3_out, ashlhi3_out, avr_out_ashlpsi3, ashlsi3_out)
	(ashrqi3_out, ashrhi3_out, avr_out_ashrpsi3, ashrsi3_out)
	(lshrqi3_out, lshrhi3_out, avr_out_lshrpsi3, lshrsi3_out):
	Adjust to new avr_out_shift_with_cnt interface.
The sentinel is a neat idea. The same thing should be possible on the
H8 with a relatively small amount of work. While there aren't many loop
cases left on the H8, there are definitely a few for the H8/300H.
Jeff