Hi, looking into PR50448 there is the following C code:

    typedef struct
    {
        unsigned char a,b,c,d;
    } SPI_t;

    #define SPIE (*(SPI_t volatile*) 0x0AC0)

    void foo (void)
    {
        SPIE.d = 0xAA;
        while (!(SPIE.c & 0x80));
        SPIE.d = 0xBB;
        while (!(SPIE.c & 0x80));
    }

At .optimized, the .c and .d struct accesses are direct:

    ...
      D.1985_3 ={v} MEM[(volatile struct SPI_t *)2752B].c;
      D.1986_4 = (signed char) D.1985_3;
    ...
      MEM[(volatile struct SPI_t *)2752B].d ={v} 187;
    ...

Then in explow.c:memory_address_addr_space() there is:

      /* By passing constant addresses through registers
         we get a chance to cse them.  */
      if (! cse_not_expected && CONSTANT_P (x) && CONSTANT_ADDRESS_P (x))
        x = force_reg (address_mode, x);

So that in .expand the code is

    ...
    (insn 5 4 6 3 (set (reg/f:HI 46)
            (const_int 2752 [0xac0])) pr50448.c:10 -1
         (nil))

    (insn 6 5 7 3 (set (reg:QI 47)
            (const_int -86 [0xffffffaa])) pr50448.c:10 -1
         (nil))

    (insn 7 6 11 3 (set (mem/s/v:QI (plus:HI (reg/f:HI 46)
                    (const_int 3 [0x3])) [0 MEM[(volatile struct SPI_t *)2752B].d+0 S1 A8])
            (reg:QI 47)) pr50448.c:10 -1
         (nil))
    ...

This leads to unpleasant code. The machine can access all RAM locations by
direct addressing. However, the resulting code is:

    foo:
        ldi r24,lo8(-86)     ;  6   *movqi/2  [length = 1]
        ldi r30,lo8(-64)     ;  34  *movhi/5  [length = 2]
        ldi r31,lo8(10)
        std Z+3,r24          ;  7   *movqi/3  [length = 1]
    .L2:
        lds r24,2754         ;  10  *movqi/4  [length = 2]
        sbrs r24,7           ;  43  *sbrx_branchhi  [length = 2]
        rjmp .L2
        ldi r24,lo8(-69)     ;  16  *movqi/2  [length = 1]
        ldi r30,lo8(-64)     ;  33  *movhi/5  [length = 2]
        ldi r31,lo8(10)
        std Z+3,r24          ;  17  *movqi/3  [length = 1]
    .L3:
        lds r24,2754         ;  20  *movqi/4  [length = 2]
        sbrs r24,7           ;  42  *sbrx_branchhi  [length = 2]
        rjmp .L3
        ret                  ;  39  return  [length = 1]

Insn 34 loads 2752 (0xAC0) to r30/r31 (Z) and does an indirect access
(*(Z+3), i.e. *2755) in insn 7. The same happens in insn 33 (load 2752)
and access (insn 17).

Is there a way to avoid this?

I tried -f[no-]rerun-cse-after-loop but without effect, same for -Os/-O2
and trying to patch rtx_costs. cse_not_expected is overridden in some
places in the middle-end.

What's the preferred way to avoid such an "optimization" in the back-end,
at least for CONST_INT addresses?

AVR has just 2 pointer registers that can do such an access (and only one
besides the frame pointer), so it's not very likely that a register is
free. CSEing too much might lead to spills or to data living in the frame,
causing bulky loads/stores from/to the stack.

Direct access is faster than loading the constant and then doing an
indirect access (which is 3 times slower). And if not all uses of the base
address 0xAC0 = 2752 get factored out into Z+2 or Z+3 accesses, the code
is bulkier, too.

Johann
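
For what it's worth, the address arithmetic in the listing can be sanity-checked on a host compiler with a small sketch like the one below. It assumes C11 (for _Static_assert) and that SPI_t is laid out without padding. The names SPIE_BASE, SPIE_C_ADDR, SPIE_D_ADDR, low_byte, z_pointer and check_listing are made up for illustration; they appear neither in the PR nor in GCC, and low_byte only imitates what the assembler's lo8() does to the immediates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as in the test case: four consecutive bytes. */
typedef struct { unsigned char a, b, c, d; } SPI_t;

/* Base address taken from the SPIE macro (hypothetical name). */
#define SPIE_BASE 0x0AC0u

/* Absolute addresses a direct lds/sts would use (hypothetical names). */
#define SPIE_C_ADDR (SPIE_BASE + offsetof(SPI_t, c))
#define SPIE_D_ADDR (SPIE_BASE + offsetof(SPI_t, d))

_Static_assert(SPIE_BASE == 2752, "0x0AC0 is 2752 decimal");
_Static_assert(SPIE_C_ADDR == 2754, ".c is what 'lds r24,2754' reads");
_Static_assert(SPIE_D_ADDR == 2755, ".d is what 'std Z+3' writes");

/* Imitates lo8(): keep only the low 8 bits of an immediate. */
static uint8_t low_byte(int value)
{
    return (uint8_t)value;
}

/* Z is the register pair r31:r30, with r31 holding the high byte. */
static uint16_t z_pointer(uint8_t r30, uint8_t r31)
{
    return (uint16_t)(((unsigned)r31 << 8) | r30);
}

/* Replays the immediates from insns 6, 16 and 34 of the listing. */
void check_listing(void)
{
    uint16_t z = z_pointer(low_byte(-64), low_byte(10));

    assert(z == SPIE_BASE);                   /* Z = 0x0AC0            */
    assert(z + offsetof(SPI_t, d) == 2755);   /* Z+3 hits SPIE.d       */
    assert(low_byte(-86) == 0xAA);            /* first value stored    */
    assert(low_byte(-69) == 0xBB);            /* second value stored   */
}
```

Calling check_listing() from any test driver replays the checks at run time; the _Static_assert lines already fail the build if the struct offsets differ from the ones assumed here.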