https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56439

Thilo Schulz <thilo at tjps dot eu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |thilo at tjps dot eu

--- Comment #9 from Thilo Schulz <thilo at tjps dot eu> ---
This behaviour is _really_ annoying, and it still exists in gcc 4.9.3 and
5.2.0. It is similar to the bug you described, Mr. Lay, but not quite the same.
The problem seems to affect all arithmetic and bitwise operations on register
variables, even on the upper registers >= r16.
It greatly diminishes the advantage of using register variables to optimise
access to often-used global variables. Global register variables have an
enormous program-space and speed advantage over accesses to locations in RAM,
which is very significant on ATtiny MCUs with limited flash space and, in the
case of my application, for meeting my RTOS requirements.
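
A minimal sketch of the kind of saving I mean (not taken from my application;
the register choice r5 and the exact instruction sequences are only
indicative):

#####
#include <avr/io.h>

volatile uint8_t ram_counter;            /* ordinary global: lives in SRAM */
register uint8_t reg_counter asm("r5");  /* global register variable, pinned to r5 */

void tick(void)
{
  ram_counter++;  /* typically lds / subi (or inc) / sts: two SRAM accesses per update */
  reg_counter++;  /* should ideally be a single "inc r5": one word, one cycle */
}
#####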

I did some more research on when this problem actually occurs, and I now have a
pretty good idea of when bad things happen. There seem to be two separate cases
in which it occurs.

Please consider this first working example:

#####
#include <avr/io.h>

register uint16_t globregvar1 asm("r8");
register uint16_t globregvar2 asm("r10");
register uint8_t globregvar3 asm("r12");

int main(void)
{
  register uint8_t locregvar1 asm("r26");
  register uint8_t locregvar2 asm("r2");
  uint8_t test = 1;

  while(1)
  {
    if(test)
      test <<= 1;
    else
      test = 1;

    locregvar1 |= test;
    locregvar2 |= test;
    globregvar1++;
    globregvar2 |= test;
    globregvar3 |= test;

    __asm__ __volatile__("":: "r" (locregvar1), "r" (locregvar2));

    if(test == 7)
      PORTB = test;
  }
}
#####

The __asm__ statement emits no code; its constraints merely force the compiler
to keep locregvar* alive instead of optimizing them away.
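
In generic form, the same "artificial use" idiom looks like this (the macro
name is mine, just to illustrate the trick):

#####
/* The empty asm template emits no instructions, but the "r" input constraint
   forces the value into a register and counts as a use, so the optimizer
   cannot eliminate the variable. */
#define KEEP_ALIVE(v) __asm__ __volatile__("" :: "r" (v))
#####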

Compile with:
##
$ avr-gcc-5.2.0 -ggdb3 -mmcu=attiny13 -Os -o test.app test.c
$
##

Now the output for the relevant section:

###
    locregvar1 |= test;
  2e:   a8 2b           or      r26, r24
    locregvar2 |= test;
  30:   28 2a           or      r2, r24
    globregvar1++;
  32:   9f ef           ldi     r25, 0xFF       ; 255
  34:   89 1a           sub     r8, r25
  36:   99 0a           sbc     r9, r25
    globregvar2 |= test;
  38:   a8 2a           or      r10, r24
    globregvar3 |= test;
  3a:   c8 2a           or      r12, r24

    __asm__ __volatile__("":: "r" (locregvar1), "r" (locregvar2));

    if(test == 7)
  3c:   87 30           cpi     r24, 0x07       ; 7
[...]
###

So far so good. For globregvar1 it is even intelligent enough to load -1 into
r25 and then operate directly on r8 and r9 for the 2-byte value, which is
great. (AVR has no add-with-immediate instruction, and subi/sbci only accept
r16-r31, so subtracting -1 via a scratch register is the cheapest way to
increment r9:r8 without moving it.)
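
If it is not obvious why subtracting 0xFF increments the pair, here is a quick
host-side check of the byte arithmetic (plain C, not AVR code, just to show
the borrow propagation):

#####
#include <stdint.h>
#include <stdio.h>

int main(void)
{
  uint16_t v = 0x12FF;
  uint8_t lo = v & 0xFF, hi = v >> 8;

  uint8_t k = 0xFF;                  /* ldi r25, 0xFF  (i.e. -1)            */
  uint8_t borrow = (lo < k);         /* carry flag after "sub r8, r25"      */
  lo = (uint8_t)(lo - k);            /* low byte becomes lo + 1             */
  hi = (uint8_t)(hi - k - borrow);   /* "sbc r9, r25" propagates the borrow */

  printf("%04x\n", (hi << 8) | lo);  /* prints 1300, i.e. 0x12FF + 1        */
  return 0;
}
#####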

Watch what happens with these changes, each marked with //**** at the end of the line:

#####
#include <avr/io.h>

register uint16_t globregvar1 asm("r8");
register uint16_t globregvar2 asm("r10");
register uint8_t globregvar3 asm("r12");

int main(void)
{
  register uint8_t locregvar1 asm("r26");
  register uint8_t locregvar2 asm("r2");
  uint8_t test = 1;

  while(1)
  {
    if(test)
      test <<= 1;
    else
      test = 1;

    locregvar1 |= _BV(4); //****
    locregvar2 |= _BV(4); //****
    globregvar1++;
    globregvar2 |= _BV(4); //****
    globregvar3 |= _BV(4); //****

    __asm__ __volatile__("":: "r" (locregvar1), "r" (locregvar2));

    if(globregvar1 & _BV(2)) //****
      PORTB = test;
  }
}
#####

and the output:

###
    locregvar1 |= _BV(4);
  2e:   a0 61           ori     r26, 0x10       ; 16
    locregvar2 |= _BV(4);
  30:   92 2d           mov     r25, r2
  32:   90 61           ori     r25, 0x10       ; 16
  34:   29 2e           mov     r2, r25
    globregvar1++;
  36:   94 01           movw    r18, r8
  38:   2f 5f           subi    r18, 0xFF       ; 255
  3a:   3f 4f           sbci    r19, 0xFF       ; 255
  3c:   49 01           movw    r8, r18
    globregvar2 |= _BV(4);
  3e:   68 94           set
  40:   a4 f8           bld     r10, 4
    globregvar3 |= _BV(4);
  42:   9c 2d           mov     r25, r12
  44:   90 61           ori     r25, 0x10       ; 16
  46:   c9 2e           mov     r12, r25

    __asm__ __volatile__("":: "r" (locregvar1), "r" (locregvar2));

    if(globregvar1 & _BV(2))
  48:   22 ff           sbrs    r18, 2
[...]
###

So what happens is this:

1. For locregvar1, nothing interesting happens, because the immediate
instruction (ori) can operate directly on the target register, r26 being in
the upper half r16-r31. Good.

2. However, for locregvar2, bound to r2, it turns what could be a
2-instruction operation (set/bld, as used for globregvar2 below) into three
instructions: mov to r25, ori, mov back. Not good.

3. Interestingly, for globregvar2, which is a 16-bit type, it uses the very
efficient set/bld pair to set bit 4. Nice.

4. globregvar3 is an 8-bit type. Exactly the same operation, but here it is
bounced through r25. Not nice, and totally inconsistent with the behaviour
displayed for globregvar2.

And finally:
5. globregvar1 is still being incremented; the only change to the input code is
that it is also used later in that last if condition.
Now the variable spills to r18:r19. This is pretty much what is happening for
the OP here, and in the OP's case the behaviour is at least understandable: if
a copy of the register variable is held in one of the higher registers, the
comparison can use the cpi instruction, which is not available for the lower
registers, so you lose neither cycles nor code space. BUT if an interrupt
changes the register variable in the meantime, you get a bad race condition
and totally inconsistent behaviour, because the subsequent comparison only
operates on the copy in the upper registers. Is this really wanted?
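
To make the race concrete, here is a minimal sketch of the pattern that breaks
when the test is done on a spilled copy instead of r9:r8 itself (the ISR and
the timer vector are hypothetical and not part of my test case):

#####
#include <avr/io.h>
#include <avr/interrupt.h>

register uint16_t globregvar1 asm("r8");

ISR(TIM0_OVF_vect)          /* hypothetical interrupt updating the variable */
{
  globregvar1++;
}

int main(void)
{
  sei();
  while(1)
  {
    globregvar1++;
    /* If gcc copies r9:r8 to r19:r18 for the increment and then branches on
       that copy, an interrupt firing after the copy is taken makes this test
       see a stale value instead of the live contents of r9:r8. */
    if(globregvar1 & _BV(2))
      PORTB = 0xFF;
  }
}
#####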

Furthermore, in my example there is no need for immediate instructions at all:
the branch condition can be tested directly on the register variable (sbrs
r8, 2 instead of sbrs r18, 2), so no spilling is required here.
