https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115353
Bug ID: 115353
Summary: Missed thumb2 table branch instruction optimisations
Product: gcc
Version: 14.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: gus at projectgus dot com
Target Milestone: ---
Created attachment 58351
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58351&action=edit
Minimal test case that previously generated tbb
After updating from GCC 13.2 to GCC 14.1 we noticed that many jump tables are
no longer optimised to use Thumb-2 table branch instructions (tbb/tbh).
A bisect shows the problem seems to have been introduced by 7006e5d2d7 "arm:
Use deltas for Arm switch tables".
## Versions
The 14.1 build we were using was:
> Target: arm-none-eabi
> Configured with: /build/arm-none-eabi-gcc/src/gcc-14.1.0/configure
> --target=arm-none-eabi --prefix=/usr --with-sysroot=/usr/arm-none-eabi
> --with-native-system-header-dir=/include --libexecdir=/usr/lib
> --enable-languages=c,c++ --enable-plugins --disable-decimal-float
> --disable-libffi --disable-libgomp --disable-libmudflap --disable-libquadmath
> --disable-libssp --disable-libstdcxx-pch --disable-nls --disable-shared
> --disable-threads --disable-tls --with-gnu-as --with-gnu-ld
> --with-system-zlib --with-newlib --with-headers=/usr/arm-none-eabi/include
> --with-python-dir=share/gcc-arm-none-eabi --with-gmp --with-mpfr --with-mpc
> --with-isl --with-libelf --enable-gnu-indirect-function
> --with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm'
> --with-pkgversion='Arch Repository'
> --with-bugurl=https://gitlab.archlinux.org/archlinux/packaging/packages/arm-none-eabi-gcc/-/issues
> --with-multilib-list=rmprofile
> gcc version 14.1.0 (Arch Repository)
Local bisect builds are configured slightly differently:
> Target: arm-none-eabi
> Configured with: /home/gus/dev/gcc/configure --target=arm-none-eabi
> --prefix=/home/gus/ry/george/tmp/gcc-temp-7006e5d2
> --with-sysroot=/home/gus/ry/george/tmp/gcc-temp-7006e5d2/arm-none-eabi
> --enable-languages=c --enable-plugins --disable-decimal-float
> --disable-libffi --disable-libgomp --disable-libmudflap --disable-libquadmath
> --disable-libssp --disable-libstdcxx-pch --disable-nls --disable-shared
> --disable-threads --disable-tls --with-gnu-as --with-gnu-ld
> --with-system-zlib --with-newlib
> --with-headers=/home/gus/ry/george/tmp/gcc-temp-7006e5d2/arm-none-eabi/include
> --with-python-dir=share/gcc-arm-none-eabi --with-gmp --with-mpfr --with-mpc
> --with-isl --with-libelf --enable-gnu-indirect-function
> --with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm'
> --with-multilib-list=rmprofile
> gcc version 14.0.0 20231026 (experimental) (GCC)
## Test case
I will attach two minimal test cases: one for tbb, and one with a slightly
larger jump table range for tbh.
Compiled with "arm-none-eabi-gcc -mcpu=cortex-m4 -Os -Wall -Wextra".
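For readers without access to the attachments, the following is a rough sketch of the kind of function that typically produces a jump table. It is NOT the contents of attachment 58351: the handler names and the recording variable are hypothetical, and only the name jump_around and the seven dense cases are taken from the first listing below. Whether a given compiler actually emits a table for it depends on version and options.

```c
/* Hypothetical reproducer sketch; not the attached test case. */

/* Records which handler ran, so the dispatch can be checked on a host. */
volatile int last_handler = -1;

/* noinline (a GCC attribute) keeps each case body a real call. */
#define NOINLINE __attribute__((noinline))

NOINLINE void handler0(void) { last_handler = 0; }
NOINLINE void handler1(void) { last_handler = 1; }
NOINLINE void handler2(void) { last_handler = 2; }
NOINLINE void handler3(void) { last_handler = 3; }
NOINLINE void handler4(void) { last_handler = 4; }
NOINLINE void handler5(void) { last_handler = 5; }
NOINLINE void handler6(void) { last_handler = 6; }

/* Seven dense cases (0..6); out-of-range values fall through to default. */
void jump_around(unsigned int n)
{
    switch (n) {
    case 0: handler0(); break;
    case 1: handler1(); break;
    case 2: handler2(); break;
    case 3: handler3(); break;
    case 4: handler4(); break;
    case 5: handler5(); break;
    case 6: handler6(); break;
    default: break;
    }
}
```

In a real reproducer the handlers would more likely be external functions, so the compiler cannot see their bodies at all.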
GCC release 13.2 and the parent commit of 7006e5d2d7 both emit table branch
instructions, e.g.
> jump_around:
> @ args = 0, pretend = 0, frame = 0
> @ frame_needed = 0, uses_anonymous_args = 0
> push {r3, lr}
> cmp r0, #6
> bhi .L9
> tbb [pc, r0]
>.L4:
> .byte (.L10-.L4)/2
> .byte (.L11-.L4)/2
> .byte (.L8-.L4)/2
> .byte (.L7-.L4)/2
> .byte (.L6-.L4)/2
> .byte (.L5-.L4)/2
> .byte (.L3-.L4)/2
> .p2align 1
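Each table entry in the listing above is a forward distance from the table start to the case label, halved because Thumb instructions are 2-byte aligned. The sketch below models which entry width can hold a given distance. It is only an illustration of the encoding limits; the names are hypothetical and it is not how GCC itself makes the choice.

```c
#include <stdint.h>

/* Table forms, smallest entry first (names are illustrative only). */
enum table_form { FORM_TBB, FORM_TBH, FORM_WORD };

/* distance_bytes: forward distance from the table start to the farthest
   case label, assumed to be even. */
enum table_form smallest_form(uint32_t distance_bytes)
{
    uint32_t entry = distance_bytes / 2;  /* value stored in the table */

    if (entry <= UINT8_MAX)
        return FORM_TBB;   /* fits a .byte entry: up to 510 bytes */
    if (entry <= UINT16_MAX)
        return FORM_TBH;   /* fits a half-word entry: up to 131070 bytes */
    return FORM_WORD;      /* needs a full 32-bit entry */
}
```

So a switch whose farthest case body lies within 510 bytes of the table can use tbb, and one within 131070 bytes can use tbh.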
GCC at commit 7006e5d2d7, release 14.1, and a recent master branch all instead
load an absolute address from a word table into PC, e.g.
>jump_around:
> @ args = 0, pretend = 0, frame = 0
> @ frame_needed = 0, uses_anonymous_args = 0
> push {r3, lr}
> cmp r0, #5
> bhi .L8
> adr r3, .L4
> ldr pc, [r3, r0, lsl #2]
> .p2align 2
>.L4:
> .word .L9+1
> .word .L10+1
> .word .L7+1
> .word .L6+1
> .word .L5+1
> .word .L3+1
> .p2align 1
For large jump tables the overhead adds up: each entry is now a 4-byte word
instead of a 1-byte (tbb) or 2-byte (tbh) offset, i.e. 4x or 2x the table size
per entry.
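To put the per-entry cost in numbers, here is a small sketch of the table-size arithmetic. The helper names are hypothetical, and alignment padding around the table is ignored.

```c
#include <stddef.h>

/* Bytes per table entry for each form seen in the listings above. */
enum {
    TBB_ENTRY_BYTES  = 1,  /* .byte offsets used with tbb */
    TBH_ENTRY_BYTES  = 2,  /* half-word offsets used with tbh */
    WORD_ENTRY_BYTES = 4   /* .word absolute addresses used with ldr pc */
};

/* Table size in bytes, ignoring alignment padding. */
size_t table_bytes(size_t entries, size_t entry_bytes)
{
    return entries * entry_bytes;
}

/* Extra bytes a word table costs compared with a smaller form. */
size_t extra_bytes(size_t entries, size_t small_entry_bytes)
{
    return table_bytes(entries, WORD_ENTRY_BYTES)
         - table_bytes(entries, small_entry_bytes);
}
```

For a 200-entry table this gives 800 bytes of word entries, versus 200 bytes with tbb or 400 bytes with tbh.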