https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100962
Bug ID: 100962
Summary: Poor optimization of AVR code when using structs in __flash
Product: gcc
Version: 5.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: mojo at world3 dot net
Target Milestone: ---
Example code here: https://godbolt.org/z/1hnPoGdTd
In this code a const __flash struct holds some data used to initialize
peripherals. Line 59 is the definition of the struct.
With the __flash qualifier the generated AVR assembly uses the X register as a
pointer to the peripheral. The X pointer does not support displacement
addressing (LDD/STD), so rather inefficient code is generated, e.g.
141 channels[ch].dma.ch->TRFCNT = BUFFER_SIZE;
142 channels[ch].dma.ch->REPCNT = 0;
ldi r18,lo8(26)
ldi r19,0
adiw r26,4
st X+,r18
st X,r19
sbiw r26,4+1
adiw r26,6
st X,__zero_reg__
sbiw r26,6
Removing the __flash qualifier produces much better code, with the Z register
used with displacement.
The issue appears to be that Y, the other pointer register that supports
displacement, is used as the frame pointer and is therefore unavailable. The
need for LPM instructions to read data from flash then seems to keep Z from
being used for the peripheral, so X is used instead and Z is used only for LPM.
The best possible optimisation here seems to be to read all values needed from
flash first, and then switch to using Z as a pointer to the peripheral.
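
As a source-level illustration of that ordering, here is a minimal sketch. It
is not the code from the godbolt link: dma_ch_t, channel_cfg_t, fake_dma, the
trigger field and the FLASH macro are hypothetical stand-ins, with TRFCNT at
offset 4 and REPCNT at offset 6 to match the assembly above. Every flash read
is copied into a local before the first store to the peripheral. Whether the
register allocator then actually picks Z for the peripheral pointer is not
guaranteed and may depend on the GCC version.

```c
#include <stdint.h>

/* Assumption: on AVR, FLASH expands to the __flash named address space;
   elsewhere it expands to nothing so the sketch builds on a host compiler. */
#ifdef __AVR__
#define FLASH __flash
#else
#define FLASH
#endif

#define BUFFER_SIZE 26u

/* Hypothetical stand-in for a DMA channel register block. */
typedef struct {
    volatile uint8_t  CTRLA;     /* offset 0 */
    volatile uint8_t  CTRLB;     /* offset 1 */
    volatile uint8_t  ADDRCTRL;  /* offset 2 */
    volatile uint8_t  TRIGSRC;   /* offset 3 */
    volatile uint16_t TRFCNT;    /* offset 4 */
    volatile uint8_t  REPCNT;    /* offset 6 */
} dma_ch_t;

/* Hypothetical per-channel configuration kept in flash. */
typedef struct {
    dma_ch_t *ch;       /* pointer to the peripheral */
    uint8_t   trigger;  /* value to write to TRIGSRC */
} channel_cfg_t;

/* RAM objects standing in for the real peripheral addresses. */
static dma_ch_t fake_dma[2];

static const FLASH channel_cfg_t channels[2] = {
    { &fake_dma[0], 0x10 },
    { &fake_dma[1], 0x20 },
};

void channel_init(uint8_t ch)
{
    /* Read everything needed from flash first (LPM via Z on AVR). */
    dma_ch_t *const dma     = channels[ch].ch;
    const uint8_t   trigger = channels[ch].trigger;

    /* No flash reads remain past this point, so Z is no longer needed
       for LPM and is free to address the peripheral with displacement. */
    dma->TRIGSRC = trigger;
    dma->TRFCNT  = BUFFER_SIZE;
    dma->REPCNT  = 0;
}
```

Calling channel_init(1) writes only to the second stand-in register block.
The same pattern could serve as a workaround in the reporter's code until the
compiler performs this reordering itself.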