Hi, while looking into PR50448 I came across the following C code:

typedef struct
{
    unsigned char a,b,c,d;
} SPI_t;

#define SPIE (*(SPI_t volatile*) 0x0AC0)

void foo (void)
{
    SPIE.d = 0xAA;
    while (!(SPIE.c & 0x80));

    SPIE.d = 0xBB;
    while (!(SPIE.c & 0x80));
}

In the .optimized dump, the accesses to the .c and .d struct members are direct:

  ...
  D.1985_3 ={v} MEM[(volatile struct SPI_t *)2752B].c;
  D.1986_4 = (signed char) D.1985_3;
  ...

  MEM[(volatile struct SPI_t *)2752B].d ={v} 187;
  ...
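
The (signed char) conversion after the load of .c is presumably the & 0x80 test rewritten as a sign test. For a byte, (v & 0x80) != 0 holds exactly when the byte reinterpreted as signed char is negative, and that is also what sbrs r24,7 checks in the final assembly. Below is a small exhaustive check. It assumes GCC's modulo conversion of out-of-range values to signed char, which is implementation-defined in ISO C (hence the static_assert). The function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: out-of-range values wrap when converted to signed char (GCC behaviour). */
static_assert ((signed char) 0x80 < 0, "conversion to signed char wraps");

/* Bit 7 tested by mask, as written in the C source. */
static bool bit7_by_mask (unsigned char v)
{
    return (v & 0x80) != 0;
}

/* Bit 7 tested by sign, as in the (signed char) conversion in the dump. */
static bool bit7_by_sign (unsigned char v)
{
    return (signed char) v < 0;
}

/* Compare both forms over all 256 byte values. */
static bool bit7_forms_agree (void)
{
    for (unsigned v = 0; v <= 0xFF; v++)
        if (bit7_by_mask ((unsigned char) v) != bit7_by_sign ((unsigned char) v))
            return false;
    return true;
}
```

Because the two forms are interchangeable, the cast does not affect the addressing problem discussed here.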

Then in explow.c:memory_address_addr_space() there is:

  /* By passing constant addresses through registers
     we get a chance to cse them.  */
  if (! cse_not_expected && CONSTANT_P (x) && CONSTANT_ADDRESS_P (x))
    x = force_reg (address_mode, x);

As a result, the .expand dump contains:

...
(insn 5 4 6 3 (set (reg/f:HI 46)
        (const_int 2752 [0xac0])) pr50448.c:10 -1
     (nil))

(insn 6 5 7 3 (set (reg:QI 47)
        (const_int -86 [0xffffffaa])) pr50448.c:10 -1
     (nil))

(insn 7 6 11 3 (set (mem/s/v:QI (plus:HI (reg/f:HI 46)
                (const_int 3 [0x3])) [0 MEM[(volatile struct SPI_t *)2752B].d+0 S1 A8])
        (reg:QI 47)) pr50448.c:10 -1
     (nil))
...

This leads to poor code. AVR can access every RAM location by direct
addressing (LDS/STS), yet the resulting code is:

foo:
        ldi r24,lo8(-86)         ;  6   *movqi/2        [length = 1]
        ldi r30,lo8(-64)         ;  34  *movhi/5        [length = 2]
        ldi r31,lo8(10)
        std Z+3,r24      ;  7   *movqi/3        [length = 1]
.L2:
        lds r24,2754     ;  10  *movqi/4        [length = 2]
        sbrs r24,7       ;  43  *sbrx_branchhi  [length = 2]
        rjmp .L2
        ldi r24,lo8(-69)         ;  16  *movqi/2        [length = 1]
        ldi r30,lo8(-64)         ;  33  *movhi/5        [length = 2]
        ldi r31,lo8(10)
        std Z+3,r24      ;  17  *movqi/3        [length = 1]
.L3:
        lds r24,2754     ;  20  *movqi/4        [length = 2]
        sbrs r24,7       ;  42  *sbrx_branchhi  [length = 2]
        rjmp .L3
        ret      ;  39  return  [length = 1]

Insn 34 loads 2752 (0xAC0) into r30/r31 (Z), and insn 7 then does an indirect
access via *(Z+3), i.e. *2755. The same happens with insn 33 (load of 2752)
and insn 17 (the access).
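
The constants in the listing follow from the struct layout. Here is a minimal sketch, assuming no padding between the unsigned char members (the usual ABIs add none, and the static_asserts check it). member_address, lo8_signed and hi8 are hypothetical helpers that mirror the assembler's lo8()/hi8() operators; they are not GCC or binutils functions:

```c
#include <assert.h>
#include <stddef.h>

/* Same layout as SPI_t in the test case. */
typedef struct
{
    unsigned char a, b, c, d;
} SPI_t;

/* Base address behind the SPIE macro: 0x0AC0 == 2752. */
enum { SPIE_BASE = 0x0AC0 };

/* Offsets used in the RTL, e.g. (plus (reg/f:HI 46) (const_int 3)) for .d. */
static_assert (offsetof (SPI_t, c) == 2, "c is at offset 2");
static_assert (offsetof (SPI_t, d) == 3, "d is at offset 3");

/* Absolute address of a member, i.e. what a direct LDS/STS would encode. */
static unsigned member_address (size_t offset)
{
    return SPIE_BASE + (unsigned) offset;
}

/* Low byte as a signed value, as printed in "ldi r30,lo8(-64)". */
static int lo8_signed (unsigned v)
{
    return (signed char) (v & 0xFF);
}

/* High byte, as printed in "ldi r31,lo8(10)". */
static unsigned hi8 (unsigned v)
{
    return (v >> 8) & 0xFF;
}
```

So lds 2754 is the direct form of SPIE.c. The ldi pair materializes 0x0AC0 in Z only so that std Z+3 can reach 2755, an address a single STS could have encoded directly.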

Is there a way to avoid this? I tried -f[no-]rerun-cse-after-loop, but without
effect; the same goes for -Os vs. -O2 and for patching rtx_costs. Also,
cse_not_expected is overridden in some places in the middle-end.

What's the preferred way to avoid such "optimization" in the back-end? At least
for CONST_INT addresses?

AVR has just two pointer registers that can do such an access (and only one
besides the frame pointer), so a free register is unlikely. CSEing too much can
therefore lead to spills, or to data living in the frame, which results in bulky
loads/stores from/to the stack.

Direct access is faster than loading the constant followed by an indirect
access (3 times slower). And if not all accesses relative to the base address
0xAC0 = 2752 get turned into Z+2 or Z+3 accesses, the code is bulkier, too.
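
Here is a rough size model of that trade-off. It uses the [length = N] annotations from the listing as the size of each instruction in 16-bit words, and assumes STS is 2 words like LDS, as on the classic AVR cores (not the reduced AVRrc cores). Only the address-related instructions are counted, since the LDI of the stored value is the same either way. The function names are hypothetical:

```c
#include <assert.h>

/* Sizes in 16-bit words.  LDI pair and STD come from the [length = N]
   comments in the listing; STS is assumed to match LDS (*movqi/4). */
enum
{
    WORDS_LDI_PAIR = 2, /* *movhi/5: ldi r30 + ldi r31 load Z */
    WORDS_STD      = 1, /* *movqi/3: std Z+disp,reg */
    WORDS_STS      = 2  /* direct store to an absolute address */
};

/* A single Z-relative store (with its Z load) is bulkier than one STS. */
static_assert (WORDS_LDI_PAIR + WORDS_STD > WORDS_STS,
               "one Z-relative store is bulkier than one STS");

/* n stores, each preceded by its own Z load (as with insns 34 and 33). */
static unsigned words_indirect_reloaded (unsigned n)
{
    return n * (WORDS_LDI_PAIR + WORDS_STD);
}

/* n stores through a single Z load that stays live across all of them. */
static unsigned words_indirect_shared (unsigned n)
{
    return WORDS_LDI_PAIR + n * WORDS_STD;
}

/* n direct stores. */
static unsigned words_direct (unsigned n)
{
    return n * WORDS_STS;
}
```

For the two stores in foo, reloading Z costs 6 words against 4 for direct STS. A single shared Z load breaks even at two stores and only wins from three on, and then only if Z stays free and every access is actually rewritten relative to it.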

Johann
