On 12/16/25 1:16 PM, Georg-Johann Lay wrote:
When a shift is performed by a shift-loop, then there are cases
where the runtime can be improved. For example, uint32_t R22 >> 5
is currently

         ldi scratch, 5
     1:  lsr r25
         ror r24
         ror r23
         ror r22
         dec scratch
         brne 1b

but can be done as:

         andi r22,-32   ; Set lower 5 bits to 0.
         ori r22,16     ; Set bit 4 to 1.
         ;; Now r22 = 0b***10000
     1:  lsr r25
         ror r24
         ror r23
         ror r22
         brcc 1b        ; Carry will be 0, 0, 0, 0, 1.

This is count-1 cycles faster, where count is the shift offset:
the DEC in each of the count iterations goes away, at the cost of
one extra setup instruction.  In the example that's 4 cycles.
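
As an illustration only (this is not code from the patch), the sentinel
loop and the cycle arithmetic can be modelled in plain C.  The function
names are made up for this sketch, the model assumes 1 <= count <= 8 so
that the sentinel fits in the low byte, and the cycle numbers assume the
classic AVR timings (1 cycle for LDI/ANDI/ORI/LSR/ROR/DEC, 2 cycles for
a taken branch, 1 for a branch that falls through).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the sentinel loop: logical right shift of a
   32-bit value by COUNT, 1 <= COUNT <= 8.  */
static uint32_t lshr32_sentinel (uint32_t x, unsigned count)
{
  unsigned carry;

  /* andi r22, -(1 << count): clear the low COUNT bits.  They are
     shifted out anyway, so the result is not affected.  */
  x &= ~(((uint32_t) 1 << count) - 1);

  /* ori r22, 1 << (count - 1): plant the sentinel bit.  */
  x |= (uint32_t) 1 << (count - 1);

  do
    {
      carry = x & 1;   /* Bit that lsr/ror.../ror moves into Carry.  */
      x >>= 1;
    }
  while (!carry);      /* brcc 1b: loop until the sentinel drops out.  */

  return x;
}

/* Cycles of the counter loop: ldi, then COUNT iterations of NBYTES
   shift insns + dec + brne, where the last brne is not taken.  */
static unsigned cycles_counter_loop (unsigned count, unsigned nbytes)
{
  return 1 + count * (nbytes + 1 + 2) - 1;
}

/* Cycles of the sentinel loop: andi + ori, then COUNT iterations of
   NBYTES shift insns + brcc, where the last brcc is not taken.  */
static unsigned cycles_sentinel_loop (unsigned count, unsigned nbytes)
{
  return 2 + count * (nbytes + 2) - 1;
}

/* Self-check: the model agrees with a plain shift, and the saving
   is count - 1 cycles for a 4-byte operand.  */
static void check_sentinel_model (void)
{
  const uint32_t samples[] = { 0u, 1u, 0x80000000u, 0xdeadbeefu, 0xffffffffu };

  for (unsigned count = 1; count <= 8; count++)
    {
      for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert (lshr32_sentinel (samples[i], count) == samples[i] >> count);

      assert (cycles_counter_loop (count, 4)
              - cycles_sentinel_loop (count, 4) == count - 1);
    }
}
```

With count = 5 and a 4-byte operand this model gives 35 cycles for the
counter loop and 31 for the sentinel loop, which matches the 4 cycles
quoted above.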

Part 1 of the patch refactors the shift output function so
it gets a shift rtx_code instead of an asm template.

Part 2 is the actual optimization.

This is for trunk and passes without new regressions.
Ok to apply?

Johann

--

     AVR: Refactor avr.cc::out_shift_with_cnt().

     This is a no-op refactoring of out_shift_with_cnt() that passes the
     shift rtx_code instead of a template asm string.

     gcc/
             * config/avr/avr-protos.h (out_shift_with_cnt): Remove.
             * config/avr/avr.cc (avr_out_shift_with_cnt): New static
             function from out_shift_with_cnt: Pass shift rtx_code instead
             of asm template.
             (avr_out_shift_1): New static helper function.
             (ashlqi3_out, ashlhi3_out, avr_out_ashlpsi3, ashlsi3_out)
             (ashrqi3_out, ashrhi3_out, avr_out_ashrpsi3, ashrsi3_out)
             (lshrqi3_out, lshrhi3_out, avr_out_lshrpsi3, lshrsi3_out):
             Adjust avr_out_shift_with_cnt to new interface.

The sentinel is a neat idea.  The same thing should be possible on the
H8 with a relatively small amount of work.  While there aren't many
loop cases left on the H8, there are definitely a few for the H8/300H.

Jeff
