https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113432
Bug ID: 113432
Summary: missing optimization: GCC UXTB zero-extends result of LDRB from volatile variables
Product: gcc
Version: 13.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: rsaxvc at gmail dot com
Target Milestone: ---

GCC for ARM 13.2.0 (-Os, -mthumb) misses a minor optimization when loading from a sub-word-sized volatile global variable when targeting Cortex-M0 and Cortex-M23, but gets it right on many other ARM cores (M3/M4/M7/A7/A9). For M0 and M23, GCC adds a UXTB (zero-extend byte) or UXTH (zero-extend halfword) to the result of an LDRB or LDRH, even though that result is already zero-extended. I checked as far back as GCC 5.4.1, which still emits the extra UXTBs/UXTHs.

Here's an example that triggers it:

    #include <stdint.h>

    uint8_t regular_u8;
    volatile uint8_t volatile_u8;

    uint32_t load_regular_u8_as_u32() { return regular_u8; }
    uint8_t  load_regular_u8_as_u8()  { return regular_u8; }
    uint32_t load_volatile_u8_as_u32(){ return volatile_u8; }
    uint8_t  load_volatile_u8_as_u8() { return volatile_u8; }

I would expect all four functions to assemble to:

1) address loading (either movw/movt at -O3 or a PC-relative LDR at -Os)
2) LDRB r0, [address register of extern variable]
3) BX LR

But here's the resulting assembly - loading the non-volatile variable is as expected, while loading the volatile variable uses an additional UXTB instruction:

    load_regular_u8_as_u32:
            ldr     r3, .L2
            ldrb    r0, [r3]
            bx      lr
    .L2:
            .word   regular_u8

    load_regular_u8_as_u8:
            ldr     r3, .L5
            ldrb    r0, [r3]
            bx      lr
    .L5:
            .word   regular_u8

    load_volatile_u8_as_u32:
            ldr     r3, .L8
            ldrb    r0, [r3]
            uxtb    r0, r0
            bx      lr
    .L8:
            .word   volatile_u8

    load_volatile_u8_as_u8:
            ldr     r3, .L11
            ldrb    r0, [r3]
            uxtb    r0, r0
            bx      lr
    .L11:
            .word   volatile_u8

In the load_volatile_* cases above, the UXTB can be omitted, as the LDRB has already zero-extended r0.
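The output above can be reproduced with an arm-none-eabi cross toolchain using something like arm-none-eabi-gcc -Os -mthumb -mcpu=cortex-m0 -S test.c (the toolchain name, file name, and exact -mcpu value are my assumptions; any Cortex-M0/M23 target should show the same behavior).

For comparison, here is a hand-written sketch (not actual compiler output) of what load_volatile_u8_as_u32 could look like with the redundant UXTB dropped; it simply mirrors the non-volatile case:

    load_volatile_u8_as_u32:
            ldr     r3, .L8
            ldrb    r0, [r3]        @ ldrb already zero-extends the byte into r0
            bx      lr
    .L8:
            .word   volatile_u8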