https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82658
Bug ID: 82658
Summary: Suboptimal codegen on AVR when right-shifting 8-bit unsigned integers.
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: mike.k at digitalcarbide dot com
Target Milestone: ---

This issue has been confirmed as far back as at least 5.4.0, and it still occurs on trunk.

When an unsigned char/uint8_t is shifted right by fewer than 4 bits, suboptimal code is generated. This happens only when the source file is compiled as C++, not as C, even when the source is otherwise identical. The issue does not occur with left shifts or with wider types (such as uint16_t).

Trivial test:

    void test ()
    {
        volatile unsigned char val;
        unsigned char local = val;
        local >>= 1;
        val = local;
    }

Compiling as C++ (avr-g++ [-O3|-O2] -mmcu=atmega2560 test.cpp -S -c -o test.s) produces the following sequence for the load, shift, and store:

    ldd r24,Y+1
    ldi r25,0
    asr r25
    ror r24
    std Y+1,r24

The next operation performed on r25 is a clr, so the value left in r25 is never used. Because r25 is loaded with 0, asr r25 leaves it at 0 and shifts a 0 into the carry flag, and ror r24 then rotates that 0 into bit 7 of r24. The ldi/asr/ror sequence is therefore exactly equivalent to a single lsr, which is what the C front end emits.

Compiling as C (avr-gcc [-O3|-O2] -mmcu=atmega2560 test.c -S -c -o test.s) produces the following sequence for the load, shift, and store:

    ldd r24,Y+1
    lsr r24
    std Y+1,r24

This is optimal code, and it is also the sequence defined for this shift in avr.c.

The problem gets worse with larger shift counts (up to 3; at 4 the defined behavior takes over again): the C++ front end emits the same instruction sequence repeatedly, whereas the C front end simply generates 'lsr; lsr; lsr', as expected.

Interestingly, the issue does _not_ occur if an integer division is used instead of a shift: dividing the unsigned char by 2 instead of shifting it right by 1 emits 'lsr' as expected.