https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100962
Bug ID: 100962
Summary: Poor optimization of AVR code when using structs in __flash
Product: gcc
Version: 5.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: mojo at world3 dot net
Target Milestone: ---

Example code here: https://godbolt.org/z/1hnPoGdTd

In this code a const __flash struct holds some data used to initialize peripherals. Line 59 is the definition of the struct.

With the __flash qualifier, the generated AVR assembly uses the X register as the pointer to the peripheral. X does not support displacement addressing (LDD/STD), so the pointer has to be adjusted before and after each access and rather inefficient code is generated, e.g.

    141  channels[ch].dma.ch->TRFCNT = BUFFER_SIZE;
    142  channels[ch].dma.ch->REPCNT = 0;

        ldi r18,lo8(26)
        ldi r19,0
        adiw r26,4
        st X+,r18
        st X,r19
        sbiw r26,4+1
        adiw r26,6
        st X,__zero_reg__
        sbiw r26,6

Removing the __flash qualifier produces much better code, with the Z register used with displacement.

The issue appears to arise because Y, the other pointer register that supports displacement, is used for the stack and so is unavailable. Introducing the need for LPM instructions to read data from flash seems to prevent Z from being used for the peripheral, so X is used instead. Z is used only for LPM.

The best possible optimisation here seems to be to read all the values needed from flash first, and then switch to using Z as the pointer to the peripheral.
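The ordering suggested in the last paragraph can be written out at source level. This is a minimal sketch, not the code from the Godbolt link: `dma_ch_regs_t`, `channel_cfg_t`, `fake_dma`, `dma_channel_init` and the trigger values are hypothetical stand-ins, with only `TRFCNT`, `REPCNT`, `BUFFER_SIZE` and the `channels[ch].dma.ch` access path taken from the report. The register offsets are chosen to match the listing (TRFCNT at 4, REPCNT at 6). Whether GCC then picks Z for the peripheral pointer is not guaranteed; the sketch only shows the access order the report describes.

```c
#include <stdint.h>

/* avr-gcc defines __FLASH when the __flash address space is available.
   On any other compiler the qualifier is defined away so the sketch
   still builds on a host machine. */
#ifndef __FLASH
#define __flash
#endif

#define BUFFER_SIZE 26

/* Hypothetical stand-in for a DMA channel register block.
   Offsets mirror the assembly in the report: TRFCNT at 4, REPCNT at 6. */
typedef struct {
    volatile uint8_t  CTRLA;
    volatile uint8_t  CTRLB;
    volatile uint8_t  ADDRCTRL;
    volatile uint8_t  TRIGSRC;
    volatile uint16_t TRFCNT;
    volatile uint8_t  REPCNT;
} dma_ch_regs_t;

/* Hypothetical per-channel configuration record kept in flash. */
typedef struct {
    struct {
        dma_ch_regs_t *ch;      /* pointer to the peripheral registers */
        uint8_t        trigger; /* assumed extra field read from flash */
    } dma;
} channel_cfg_t;

/* Plain RAM objects standing in for memory-mapped peripherals. */
static dma_ch_regs_t fake_dma[2];

static const __flash channel_cfg_t channels[2] = {
    { { &fake_dma[0], 0x4B } },
    { { &fake_dma[1], 0x6B } },
};

void dma_channel_init(uint8_t ch)
{
    /* Step 1: read everything needed from flash into locals. On AVR
       these are the only accesses that need LPM (and therefore Z). */
    dma_ch_regs_t *const regs    = channels[ch].dma.ch;
    const uint8_t        trigger = channels[ch].dma.trigger;

    /* Step 2: from here on only the peripheral is accessed, through a
       single pointer that no longer competes with flash reads. */
    regs->TRIGSRC = trigger;
    regs->TRFCNT  = BUFFER_SIZE;
    regs->REPCNT  = 0;
}
```

Calling `dma_channel_init(1)` writes only to the second register block. On real hardware the `fake_dma` entries would be replaced by the addresses of the actual DMA channel registers.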