https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56439
Thilo Schulz <thilo at tjps dot eu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |thilo at tjps dot eu

--- Comment #9 from Thilo Schulz <thilo at tjps dot eu> ---
This behaviour is _really_ annoying, and it still exists in gcc 4.9.3 and
5.2.0. It is similar to the bug you described, Mr. Lay, but not quite the
same. This problem seems to affect all arithmetic and bitwise operations on
register variables, even on the upper registers >= r16.
It greatly diminishes the advantages of using register variables for the
purpose of optimising access to often-used global variables. Global register
variables do have an enormous program space and speed advantage over accesses
to locations in RAM, which is very significant on ATTiny MCUs with limited
flash space and, in the case of my application, for meeting my RTOS
requirements.

I did some more research on when this problem actually occurs, and I now have
a pretty good idea of when bad things happen. There seem to be two separate
cases of this occurring. Please consider this first working example:

#####
#include <avr/io.h>

register uint16_t globregvar1 asm("r8");
register uint16_t globregvar2 asm("r10");
register uint8_t globregvar3 asm("r12");

int main(void)
{
  register uint8_t locregvar1 asm("r26");
  register uint8_t locregvar2 asm("r2");
  uint8_t test = 1;

  while(1)
  {
    if(test)
      test <<= 1;
    else
      test = 1;

    locregvar1 |= test;
    locregvar2 |= test;
    globregvar1++;
    globregvar2 |= test;
    globregvar3 |= test;

    __asm__ __volatile__("":: "r" (locregvar1), "r" (locregvar2));

    if(test == 7)
      PORTB = test;
  }
}
#####

The __asm__ statement emits no instructions. Its "r" input constraints only
force the compiler to keep locregvar* alive instead of optimizing them away.
Compile with:

##
$ avr-gcc-5.2.0 -ggdb3 -mmcu=attiny13 -Os -o test.app test.c
$
##

Now the output for the relevant section:

###
    locregvar1 |= test;
  2e:   a8 2b           or      r26, r24
    locregvar2 |= test;
  30:   28 2a           or      r2, r24
    globregvar1++;
  32:   9f ef           ldi     r25, 0xFF       ; 255
  34:   89 1a           sub     r8, r25
  36:   99 0a           sbc     r9, r25
    globregvar2 |= test;
  38:   a8 2a           or      r10, r24
    globregvar3 |= test;
  3a:   c8 2a           or      r12, r24

    __asm__ __volatile__("":: "r" (locregvar1), "r" (locregvar2));

    if(test == 7)
  3c:   87 30           cpi     r24, 0x07       ; 7
[...]
###

So far so good. For globregvar1 it is even smart enough to load -1 into r25
and then operate directly on r8 and r9 for the 2-byte value, which is great.

Watch what happens with these changes, marked with a //**** at end of line:

#####
#include <avr/io.h>

register uint16_t globregvar1 asm("r8");
register uint16_t globregvar2 asm("r10");
register uint8_t globregvar3 asm("r12");

int main(void)
{
  register uint8_t locregvar1 asm("r26");
  register uint8_t locregvar2 asm("r2");
  uint8_t test = 1;

  while(1)
  {
    if(test)
      test <<= 1;
    else
      test = 1;

    locregvar1 |= _BV(4); //****
    locregvar2 |= _BV(4); //****
    globregvar1++;
    globregvar2 |= _BV(4); //****
    globregvar3 |= _BV(4); //****

    __asm__ __volatile__("":: "r" (locregvar1), "r" (locregvar2));

    if(globregvar1 & _BV(2)) //****
      PORTB = test;
  }
}
#####

and the output:

###
    locregvar1 |= _BV(4);
  2e:   a0 61           ori     r26, 0x10       ; 16
    locregvar2 |= _BV(4);
  30:   92 2d           mov     r25, r2
  32:   90 61           ori     r25, 0x10       ; 16
  34:   29 2e           mov     r2, r25
    globregvar1++;
  36:   94 01           movw    r18, r8
  38:   2f 5f           subi    r18, 0xFF       ; 255
  3a:   3f 4f           sbci    r19, 0xFF       ; 255
  3c:   49 01           movw    r8, r18
    globregvar2 |= _BV(4);
  3e:   68 94           set
  40:   a4 f8           bld     r10, 4
    globregvar3 |= _BV(4);
  42:   9c 2d           mov     r25, r12
  44:   90 61           ori     r25, 0x10       ; 16
  46:   c9 2e           mov     r12, r25

    __asm__ __volatile__("":: "r" (locregvar1), "r" (locregvar2));

    if(globregvar1 & _BV(2))
  48:   22 ff           sbrs    r18, 2
[...]
###

So what happens is this:

1. For locregvar1, nothing interesting is happening, because the immediate
instruction can operate directly on the target reg. Good.
2. However, for locregvar2 bound to r2, it turns what could be a
two-instruction operation into three instructions. Not good.
3. Interestingly, for globregvar2, which is a 16 bit type, it uses the very
efficient set/bld pair to set the 5th bit. Nice.
4. globregvar3 is 8 bit. Exactly the same operation, but here it spills to
r25. Not nice, and totally inconsistent with the behaviour displayed for
globregvar2.

And finally:
5. globregvar1 is still being incremented, with no change in the input code
except that the variable is now used later in that last if condition. Now
that variable spills to r18:r19. This is pretty much what is happening for OP
here, and in the case of OP this behaviour is at least understandable: if a
copy of the register variable is in one of the higher registers, one can use
the cpi instruction for the comparison, as opposed to having to compare
against the lower registers, so you lose neither cycles nor code space.
BUT, if an interrupt changes the register variable in the meantime, you will
have a bad race condition and you will get totally inconsistent behaviour,
because the subsequent comparison only operates on a copy in the upper
registers. Is this really wanted?
Furthermore, in my example, there is no need for immediate instructions: the
branch condition can be applied directly to the register variable, so no
spilling is required here.
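
To make the race in point 5 concrete, here is a minimal host-side sketch in plain C. It is not AVR code and not compiler output: an ordinary variable stands in for the register variable bound to r8:r9, and fake_isr() is a hypothetical interrupt handler invented for this illustration. The interrupt is assumed to arrive right after the write-back (movw r8, r18) and before the bit test.

```c
#include <stdint.h>

/* Stand-in for the global register variable bound to r8:r9. */
static uint16_t globregvar1;

/* Hypothetical interrupt handler (assumption): sets bit 2 of the variable. */
static void fake_isr(void)
{
    globregvar1 |= (uint16_t)(1u << 2);
}

/* Models the spilled sequence: the increment and the test use a scratch copy. */
static int test_via_copy(int fire_isr)
{
    uint16_t copy = globregvar1;     /* movw r18, r8          */
    copy = (uint16_t)(copy + 1);     /* subi/sbci on r18:r19  */
    globregvar1 = copy;              /* movw r8, r18          */
    if (fire_isr)
        fake_isr();                  /* interrupt arrives here */
    return (copy & (1u << 2)) != 0;  /* sbrs r18, 2: stale copy is tested */
}

/* Models operating on the register variable itself. */
static int test_direct(int fire_isr)
{
    globregvar1 = (uint16_t)(globregvar1 + 1);
    if (fire_isr)
        fake_isr();
    return (globregvar1 & (1u << 2)) != 0;  /* sbrs r8, 2 */
}
```

Starting from globregvar1 == 0 with the interrupt firing, test_via_copy() reports bit 2 as clear although the variable itself now holds 5, while test_direct() sees the bit as set. Without the interrupt, both variants agree.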