Hello,

When I built blob with the arm-iwmmxt-linux-gnueabi toolchain (GCC 4.1.1), I found that the SP value just before the call to number() in printf() may be either 0 or 4 modulo 8. When SP is 0 modulo 8, printf works. When SP is 4 modulo 8, printf fails: the long long argument cannot be stored to the stack with the strd instruction before number() is called.
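A reduced test of the kind of call involved might look like the sketch below. It is only an illustration under assumptions: sum_ll and check_sum_ll are hypothetical names, not code from blob, and the sketch assumes the failure can be triggered by any variadic call that passes long long values. On a correctly working toolchain the check passes silently.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical reduced test, not taken from blob: sums `count` long long
 * values passed through the variable argument list. */
long long sum_ll(int count, ...)
{
    va_list ap;
    long long total = 0;
    int i;

    va_start(ap, count);
    for (i = 0; i < count; i++)
        total += va_arg(ap, long long);  /* type requiring 8-byte alignment */
    va_end(ap);
    return total;
}

/* Aborts if a 64-bit value does not survive the variadic call. */
void check_sum_ll(void)
{
    assert(sum_ll(2, 0x100000000LL, 1LL) == 0x100000001LL);
}
```

Calling check_sum_ll() from call sites with different stack depths would be one way to exercise both SP alignments.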
According to the ABI for the ARM Architecture (http://www.arm.com/miscPDFs/13176.pdf), it seems GCC should address this in its implementation of va_arg. Does anybody know how to solve this issue, or how to make sure GCC always generates an 8-byte aligned SP? The relevant excerpt from that spec is below.

2.3.2.3 Repair of va_start and va_arg

To avoid injecting a fault into their users' programs in execution environments that do not correctly align SP, software development tools should offer an option (Q-o-I) to repair the C library's stdarg.h macros va_start and va_arg, as follows.

(We assume va_start expands to a call to the intrinsic function __va_start, and va_arg to a call to __va_arg. It is already very difficult (or impossible) to implement va_start and va_arg in a way that evaluates each argument only once (as required by the C standard) without the assistance of at least one intrinsic function).

__va_start should return a pointer value ap with bit[1] set if SP was 4 modulo 8 on entry to the containing function.

The function containing the call to __va_start has the variadic parameter list allocated in the stack frame.

Because arguments are guaranteed to be 4-byte aligned (by C's argument promotion rules and the AAPCS requirement that SP be 4-byte aligned at all instants), bits[1:0] of ap are otherwise 0.

Coding the SP-misaligned case as 1 produces a __va_start compatible with an ordinary (not repaired) __va_arg in conforming environments in which SP is 0 modulo 8 at function entry.

If T is a data type requiring 8-byte alignment, __va_arg(ap, T) must increment the pointer it calculates by 4 bytes (to skip a padding word inserted at compile time) if:

  (bit[1] of ap is 0 and bit[2] of ap is 1) or
  (bit[1] of ap is 1 and bit[2] of ap is 0).

Whatever the sort of T, __va_arg(ap, T) must clear bit 1 of the pointer it calculates before dereferencing it.

This implementation of __va_arg is compatible with an ordinary (not repaired) __va_start in conforming environments in which SP is 0 modulo 8 at function entry and bit 1 of ap is always 0.

--
Best regards,
-Bridge
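The repair rules quoted above can be sketched in plain C on integer addresses, which makes the bit tests easier to follow. This is one reading of the excerpt, not GCC's or ARM's implementation. The names repaired_va_start, repaired_va_arg and va_step are hypothetical, and the way next_ap advances (argument size rounded up to 4 bytes, tag bit kept) is an assumption the excerpt does not spell out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Result of one va_arg step (hypothetical helper type). */
typedef struct {
    uintptr_t addr;     /* address to read the argument from (bit 1 cleared) */
    uintptr_t next_ap;  /* ap for the following argument (tag bit preserved) */
} va_step;

/* Tag ap with bit 1 if SP was 4 modulo 8 on entry to the function. */
uintptr_t repaired_va_start(uintptr_t first_vararg_addr, uintptr_t sp_on_entry)
{
    uintptr_t ap = first_vararg_addr;
    if ((sp_on_entry & 7u) == 4u)
        ap |= 2u;
    return ap;
}

/* Skip the compile-time padding word for 8-byte-aligned types when
 * bit[1] and bit[2] of ap differ, then clear bit 1 before use. */
va_step repaired_va_arg(uintptr_t ap, size_t size, int needs_8byte_align)
{
    unsigned bit1 = (unsigned)((ap >> 1) & 1u);
    unsigned bit2 = (unsigned)((ap >> 2) & 1u);
    va_step step;

    if (needs_8byte_align && bit1 != bit2)
        ap += 4;

    step.addr = ap & ~(uintptr_t)2;
    /* Assumption: advance by the size rounded up to a 4-byte slot. */
    step.next_ap = ap + ((size + 3u) & ~(size_t)3);
    return step;
}

/* Worked cases: the first variadic slot is at offset 4 from SP on entry. */
void example(void)
{
    /* SP 0 modulo 8: the ordinary rule skips the padding word at 0x1004. */
    uintptr_t ap = repaired_va_start(0x1004, 0x1000);
    assert(repaired_va_arg(ap, 8, 1).addr == 0x1008);

    /* SP 4 modulo 8: ap is tagged, and the long long is read from 0x100C. */
    ap = repaired_va_start(0x1008, 0x1004);
    assert(ap == 0x100A);
    assert(repaired_va_arg(ap, 8, 1).addr == 0x100C);
}
```

In both cases the long long is found 8 bytes above SP on entry, which is where a compiler assuming an 8-byte aligned SP would have placed it.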