http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34311
--- Comment #4 from Iain Sandoe <iains at gcc dot gnu.org> ---
(In reply to Anthony Green from comment #3)
> (In reply to Iain Sandoe from comment #2)
> > However, there is no guarantee in the Darwin m32 ABI that the stacked
> > version of the structs will be appropriately aligned. So, either the
> > testcase is wrong code - or the process for passing structures in the
> > closure needs to be refined (significant re-write, since the structs are
> > currently passed in-place).
>
> Iain - could you please explain this a little more? Which are the stacked
> structs that aren't aligned, and where do they come from?

I hope I've reloaded/remembered state correctly; this debugging was done a
while ago, while I was doing the m64 port and everything was fresh in my
head ;)

==========

Darwin uses IBM-style long doubles (two doubles concatenated).
The psABI alignment of a long double is 16 bytes.
The psABI alignment of a structure with a long double as the first element
is also 16 bytes.

However, in this code, for example:

typedef struct A {
  long double a;
  unsigned char b;
} A;

extern struct A foo (struct A y);

struct A bar (void)
{
  struct A a = {1.5,'a'};
  return foo(a);
}

in the call to foo():
  r3 = pointer to struct return
  r4 .. r10 are used to pass a.

In a standard stack frame, the offset of r4 is not necessarily aligned to
even 8 bytes, much less 16.

----

But, when we build the closure - since we don't have any other memory to
play with - we point at the saved registers containing the structure (now,
effectively/potentially, unaligned).

If the values are picked up by a floating load or memcpy'd to a suitable
place, the worst that happens is we suffer a performance hit for the
unaligned access.

But ..
in this specific case [nested_struct5.c] codegen decides to optimize the
structure move using vector insns (which is normally OK, since the structure
is supposed to be 16-byte aligned):

static void
B_gn(ffi_cif* cif __UNUSED__, void* resp, void** args,
     void* userdata __UNUSED__)
{
  struct A b0;
  struct B b1;

  b0 = *(struct A*)(args[0]);   << line 44.
  b1 = *(struct B*)(args[1]);

  *(B*)resp = B_fn(b0, b1);
}

; /src/host-libs/libffi-3.0.13/testsuite/libffi.call/nested_struct5.c:44
LM8:
    lwz r2,0(r5)    ; *args_2(D), *args_2(D)
    lvx v1,0,r2     ; MEM[(struct A *)_3], MEM[(struct A *)_3]
    li r9,16        ; tmp128,
    lvx v0,r2,r9    ; MEM[(struct A *)_3], MEM[(struct A *)_3]
    li r2,128       ; tmp146,
    stvx v1,r1,r2   ; MEM[(struct A *)_3], b0
    li r2,144       ; tmp147,
    stvx v0,r1,r2   ; MEM[(struct A *)_3], b0
; /src/host-libs/libffi-3.0.13/testsuite/libffi.call/nested_struct5.c:45

and we're then hosed, since those insns just silently ignore the low-order
bits of the address.

====

The code works fine if we memcpy the structs from their source (i.e. the
libffi process is getting the correct stuff in the place expected) - it's
just incorrectly aligned.

====

The psABI will, presumably, never change (since the platform is EOL).

Perhaps codegen should not make the assumption that *(struct A*) of a void *
is correctly aligned; not sure if that's a C-standard question or ...

Perhaps there's a way of getting at some scratch memory when building the
FFI structures, but ISTR this was not going to be easy.

There are higher Darwin priorities than fixing this issue - however, if I've
missed some obvious workaround (quite plausible), it would be nice to have
it working, and I would welcome suggestions.

Thanks for looking into this,
Iain.
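
====

For reference, a minimal sketch of the memcpy variant mentioned above. This is
only an illustration, not a proposed testsuite patch: load_A_unaligned() and
demo_roundtrip() are hypothetical names, and the union is just a portable way
to manufacture a misaligned address on any host. Because the copy goes through
memcpy rather than a dereference of a (struct A *), the compiler has no basis
for assuming 16-byte alignment of the source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same layout as struct A in the example above. */
typedef struct A {
  long double a;
  unsigned char b;
} A;

/* Hypothetical helper: copy a struct A out of storage that may not meet
   the type's alignment.  memcpy takes a void pointer, so no alignment
   of the source is implied.  */
static A load_A_unaligned(const void *src)
{
  A tmp;
  assert(src != NULL);
  memcpy(&tmp, src, sizeof tmp);
  return tmp;
}

/* Hypothetical demo: store a struct A at a deliberately misaligned
   address and read it back.  Returns 1 if the value round-trips.  */
static int demo_roundtrip(void)
{
  /* The union makes bytes[] as aligned as A, so bytes + 1 is misaligned. */
  union { A aligned; unsigned char bytes[2 * sizeof(A)]; } store;
  A in = {1.5L, 'a'};
  A out;

  memcpy(store.bytes + 1, &in, sizeof in);
  out = load_A_unaligned(store.bytes + 1);
  return out.a == 1.5L && out.b == 'a';
}
```

In B_gn() the equivalent change would be along the lines of
b0 = load_A_unaligned(args[0]); whether the testcase or libffi is the right
place to do that copy is the open question above.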