Hi,
I'm cross-compiling for 32-bit bare-metal ARMs (modern ones: Cortex-M4 and
Cortex-M33) with the Arm GNU Toolchain 12.3.Rel1 (gcc 12.3.1), which is the
latest available from ARM (see gcc -v output below), and have found that
va_arg(..., double) (i.e. __builtin_va_arg()) assumes that doubles are
64-bit aligned, but the stack is not always so.
I searched the bug database but didn't see this, so I'm guessing this isn't
a GCC bug--the ARM world would be on fire if it were. And I've searched the
gcc command line options docs, and the ARM architecture docs to no avail.
I'm hoping I didn't miss something obvious...
So, does gcc assume or require that doubles on the stack be 64-bit aligned,
or is there an option we should be passing to either allow 32-bit alignment
or force 64-bit alignment, or is the MCU vendor's startup code a wee buggy
(this is what I suspect, but wanted to be damn sure before continuing)?
Here's the test code:
#include <stdarg.h>

void va_args_test(int i, ...) {
    va_list args;
    va_start(args, i);
    double d = (int)va_arg(args, double);
    va_end(args);
    // display code elided
}
Here's the generated assembly, with commentary mine:
void va_args_test(int i, ...) {
    3f60:  b40f        push   {r0, r1, r2, r3}
    3f62:  b580        push   {r7, lr}
    3f64:  b082        sub    sp, #8
    3f66:  af00        add    r7, sp, #0
  va_list args;
    3f68:  2300        movs   r3, #0
    3f6a:  607b        str    r3, [r7, #4]
  va_start(args, i);
    3f6c:  f107 0314   add.w  r3, r7, #20
    3f70:  607b        str    r3, [r7, #4]
  double d = (int)va_arg(args, double);
    3f72:  f107 031b   add.w  r3, r7, #27     ; Loads the address of the last
                                              ; byte of the low-order word
                                              ; into r3.
    3f76:  f023 0307   bic.w  r3, r3, #7      ; Clears the low 3 bits, which
                                              ; works when the double is
                                              ; 64-bit aligned. Not so much
                                              ; otherwise.
    3f7a:  f103 0208   add.w  r2, r3, #8      ; Increments args' internal
                                              ; pointer.
    3f7e:  607a        str    r2, [r7, #4]    ; Saves that pointer.
    3f80:  e9d3 0100   ldrd   r0, r1, [r3]    ; Reads the double, right or
                                              ; wrong...
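For what it's worth, here is my reading of that sequence as plain C. It is a sketch of what I think the code computes, not anything taken from GCC's sources. The offsets 16, 20 and 24 come from the prologue in the listing, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the "add #7 / bic #7" pair: round the va_list pointer up to the
   next multiple of 8 before reading a double. */
static uintptr_t va_double_slot(uintptr_t ap)
{
    return (ap + 7u) & ~(uintptr_t)7u;
}

/* Where the caller's r2 ends up relative to r7 in the listing:
   push {r0-r3} leaves r0 at r7+16, r1 at r7+20, r2 at r7+24, r3 at r7+28. */
static uintptr_t pushed_r2_slot(uintptr_t r7)
{
    return r7 + 24u;
}

/* Returns 1 if va_arg(args, double) would read the slot holding r2:r3,
   which is where the call site puts the double. */
static int va_arg_reads_r2r3(uintptr_t r7)
{
    /* Assumption: the stack pointer is always at least word aligned. */
    assert((r7 & 3u) == 0);

    uintptr_t ap = r7 + 20u;    /* what va_start(args, i) stores */
    return va_double_slot(ap) == pushed_r2_slot(r7);
}
```

With r7 = 0x20001000 (8-byte aligned) the rounded pointer is 0x20001018, which is the r2:r3 slot. With r7 = 0x20001004 it is still 0x20001018, but r2 now sits at 0x2000101c, so the ldrd would pick up r1:r2 instead.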
Here's the call site assembly:
va_args_test(0, (double)1.0);
    3fc2:  2200        movs   r2, #0
    3fc4:  4b09        ldr    r3, [pc, #36]   ; (3fec )
    3fc6:  2000        movs   r0, #0
    3fc8:  4909        ldr    r1, [pc, #36]   ; (3ff0 )
    3fca:  4788        blx    r1
This is using GCC 12.3.1 (Arm GNU Toolchain 12.3.Rel1), cross-compiling for
ARM on x86_64 (gcc -v output below sig), with a command line like
arm-none-eabi-gcc -o ../build/main/PAC5524/tmp/base/src/main.o
base/src/main.c <<<-I options elided>>> -mcpu=cortex-m4 -march=armv7e-m
-mfpu=fpv4-sp-d16 -std=gnu99 -ffunction-sections -fno-omit-frame-pointer
-fno-strict-overflow -fsingle-precision-constant
-ftrivial-auto-var-init=zero -mthumb -mlittle-endian -mlong-calls
-mfloat-abi=hard -Og -c -MD -MP
Removing any one of the -f options happens to align the stack correctly in
most cases (I've elided the -f options that don't affect this issue as far
as I can tell).
Many thanks,
Barrie
gcc -v output:
Using built-in specs.
COLLECT_GCC=arm-none-eabi-gcc
COLLECT_LTO_WRAPPER=/usr/share/arm-gnu-toolchain-12.3.rel1-x86_64-arm-none-eabi/bin/../libexec/gcc/arm-none-eabi/12.3.1/lto-wrapper
Target: arm-none-eabi
Configured with:
/data/jenkins/workspace/GNU-toolchain/arm-12/src/gcc/configure
--target=arm-none-eabi
--prefix=/data/jenkins/workspace/GNU-toolchain/arm-12/build-arm-none-eabi/install
--with-gmp=/data/jenkins/workspace/GNU-toolchain/arm-12/build-arm-none-eabi/host-tools
--with-mpfr=/data/jenkins/workspace/GNU-toolchain/arm-12/build-arm-none-eabi/host-tools
--with-mpc=/data/jenkins/workspace/GNU-toolchain/arm-12/build-arm-none-eabi/host-tools
--with-isl=/data/jenkins/workspace/GNU-toolchain/arm-12/build-arm-none-eabi/host-tools
--disable-shared --disable-nls --disable-threads --disable-tls
--enable-checking=release --enable-languages=c,c++,fortran --with-newlib
--with-gnu-as --with-gnu-ld
--with-sysroot=/data/jenkins/workspace/GNU-toolchain/arm-12/build-arm-none-eabi/install/arm-none-eabi
--with-multilib-list=aprofile,rmprofile
--with-pkgversion='Arm GNU Toolchain 12.3.Rel1 (Build arm-12.35)'
--with-bugurl=https://bugs.linaro.org/
Thread model: single
Supported LTO compression algorithms: zlib
gcc version 12.3.1 20230626 (Arm GNU Toolchain 12.3.Rel1 (Build arm-12.35))
Test code (the LED lights very prettily when va_arg() returns the correct
value):
#include <stdarg.h>
#include <stdbool.h>

void va_args_test(int i, ...) {
    va_list args;
    va_start(args, i);
    i = (int)va_arg(args, double);
    va_end(args);
    bal_init();
    bal_set_AUX_LED1(i == 1);
}

int main(void) {
    /* ...CPU initialization elided... */
    va_args_test(0, (double)1.0);
    while (true) {
    }
}