SSE needs a 16-byte aligned stack. Our kernel only randomizes the stack to an ALIGNBYTES boundary, which for amd64 means 8-byte aligned. Therefore we explicitly align the stack in crt0, but "constructors" in shared libraries get run directly by ld.so, before the crt0 code gets run.

The diff below should fix the issues matthieu is seeing with pixman. It explicitly aligns the stack in the ld.so startup code as well.
ok?

Index: amd64/ldasm.S
===================================================================
RCS file: /cvs/src/libexec/ld.so/amd64/ldasm.S,v
retrieving revision 1.7
diff -u -p -r1.7 ldasm.S
--- amd64/ldasm.S	11 May 2010 16:27:14 -0000	1.7
+++ amd64/ldasm.S	19 Jul 2011 21:52:38 -0000
@@ -39,6 +39,11 @@
 	.type	_dl_start,@function
 _dl_start:
 	movq	%rsp, %r12		# save stack pointer for _rtld
+
+	subq	$8, %rsp		# align stack
+	andq	$~15, %rsp
+	addq	$8, %rsp
+
 	pushq	%rbx			# save ps_strings
 	subq	$DL_DATA_SIZE, %rsp	# allocate dl_data
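
For anyone who wants to check the arithmetic of the three added instructions, here is a rough C sketch of it. It only models the sub/and/add sequence on a plain integer, not the real ld.so code; the names align_like_dl_start and check_align are made up for illustration. What it shows is that the result is always congruent to 8 modulo 16 and never ends up above the incoming stack pointer, so nothing already on the stack gets overwritten.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: applies the same three steps as the added
 * instructions in _dl_start to an ordinary 64-bit value.
 */
uint64_t
align_like_dl_start(uint64_t rsp)
{
	rsp -= 8;			/* subq $8, %rsp */
	rsp &= ~(uint64_t)15;		/* andq $~15, %rsp */
	rsp += 8;			/* addq $8, %rsp */
	return rsp;
}

/*
 * Hypothetical self-check: the result is 8 modulo 16 and never
 * higher than the value we started with.
 */
void
check_align(uint64_t rsp)
{
	uint64_t out = align_like_dl_start(rsp);

	assert(out % 16 == 8);
	assert(out <= rsp);
}
```

With an incoming value that is 8-byte but not 16-byte aligned, such as 0x1008, the sequence leaves it unchanged; with a 16-byte aligned value such as 0x1000 it moves down by 8 to 0xff8. After the following pushq of 8 bytes the stack pointer is then a multiple of 16.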