https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113432

            Bug ID: 113432
           Summary: missing optimization: GCC UXTB zero-extends result of
                    LDRB from volatile variables
           Product: gcc
           Version: 13.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rsaxvc at gmail dot com
  Target Milestone: ---

GCC 13.2.0 for ARM (-Os, -mthumb) misses a minor optimization when loading
from a sub-word-sized volatile global variable when targeting Cortex-M0 and
Cortex-M23, but gets it right on many other ARM cores (M3/M4/M7/A7/A9). For M0
and M23, GCC adds a UXTB (zero-extend byte) or UXTH (zero-extend halfword) to
the result of LDRB or LDRH, which is already zero-extended. I checked as far
back as GCC 5.4.1, which also emits the extra UXTBs/UXTHs. Here's an example
that triggers it:

    #include <stdint.h>
    uint8_t regular_u8;
    volatile uint8_t volatile_u8;

    uint32_t load_regular_u8_as_u32(){ return regular_u8;}
    uint8_t load_regular_u8_as_u8(){return regular_u8;}
    uint32_t load_volatile_u8_as_u32(){return volatile_u8;}
    uint8_t load_volatile_u8_as_u8(){return volatile_u8;}

I would expect all four functions to assemble to:
    1) address loading (either movw/movt (-O3) or PC-relative LDR (-Os))
    2) LDRB r0, [register holding the variable's address]
    3) BX LR

But here's the resulting assembly: loading the non-volatile variable is as
expected, but loading the volatile variable emits an additional UXTB
instruction.

load_regular_u8_as_u32:
        ldr     r3, .L2
        ldrb    r0, [r3]
        bx      lr
.L2:
        .word   regular_u8
load_regular_u8_as_u8:
        ldr     r3, .L5
        ldrb    r0, [r3]
        bx      lr
.L5:
        .word   regular_u8
load_volatile_u8_as_u32:
        ldr     r3, .L8
        ldrb    r0, [r3]
        uxtb    r0, r0
        bx      lr
.L8:
        .word   volatile_u8
load_volatile_u8_as_u8:
        ldr     r3, .L11
        ldrb    r0, [r3]
        uxtb    r0, r0
        bx      lr
.L11:
        .word   volatile_u8
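
To illustrate why the UXTB in the two volatile functions has no effect on the
loaded value, here is a minimal C sketch that models the two instructions.
model_ldrb, model_uxtb and uxtb_is_redundant_after_ldrb are hypothetical names
introduced only for this illustration; they model the architectural behaviour
of LDRB and UXTB and are not GCC internals.

```c
#include <stdint.h>

/* Hypothetical model of LDRB: load one byte and zero-extend it
   into a 32-bit register value. */
static uint32_t model_ldrb(const volatile uint8_t *addr)
{
    return (uint32_t)*addr;
}

/* Hypothetical model of UXTB: keep the low 8 bits, clear bits 8..31. */
static uint32_t model_uxtb(uint32_t reg)
{
    return reg & 0xFFu;
}

/* Returns 1 if UXTB leaves every possible LDRB result unchanged. */
int uxtb_is_redundant_after_ldrb(void)
{
    volatile uint8_t cell;
    for (uint32_t v = 0; v <= 0xFFu; ++v) {
        cell = (uint8_t)v;
        uint32_t loaded = model_ldrb(&cell);
        if (model_uxtb(loaded) != loaded)
            return 0;
    }
    return 1;
}
```

Since a byte load can only produce values 0..255, masking with 0xFF never
changes the result, regardless of whether the source is volatile.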


In the above load_volatile* cases, the UXTB can be omitted, as the LDRB already
zero-extended r0.
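
The description also mentions UXTH after LDRH, but the example only covers
uint8_t. A 16-bit analog of the same test case might look like the sketch
below. The names are hypothetical and simply mirror the 8-bit example, and its
assembly output has not been checked here; presumably the two volatile
variants would show LDRH followed by an extra UXTH on the affected targets.

```c
#include <stdint.h>

/* Hypothetical 16-bit counterparts of regular_u8 / volatile_u8. */
uint16_t regular_u16;
volatile uint16_t volatile_u16;

/* Non-volatile loads: expected to compile to LDRH + BX LR. */
uint32_t load_regular_u16_as_u32(void) { return regular_u16; }
uint16_t load_regular_u16_as_u16(void) { return regular_u16; }

/* Volatile loads: the cases where an extra UXTH would be redundant. */
uint32_t load_volatile_u16_as_u32(void) { return volatile_u16; }
uint16_t load_volatile_u16_as_u16(void) { return volatile_u16; }
```

Compiling this with the same options as above (-Os, -mthumb, and a Cortex-M0
or Cortex-M23 target) and inspecting the output with -S should show whether
the halfword case behaves the same way.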
