generated movaps with unaligned memory
Hi,

my shared library crashes on a movaps instruction that accesses unaligned memory.

Since the shared library function is called from the dynamic linker, which
prepares the memory location, I'm not sure whose side this issue is on.

I have the following function in C:

    typedef float La_x86_64_xmm __attribute__ ((__vector_size__ (16)));

    typedef struct La_x86_64_retval
    {
      uint64_t lrv_rax;
      uint64_t lrv_rdx;
      La_x86_64_xmm lrv_xmm0;
      La_x86_64_xmm lrv_xmm1;
      long double lrv_st0;
      long double lrv_st1;
    } La_x86_64_retval;

    unsigned int la_x86_64_gnu_pltexit (Elf64_Sym *__sym,
            unsigned int __ndx, uintptr_t *__refcook, uintptr_t *__defcook,
            const La_x86_64_regs *__inregs, La_x86_64_retval *__outregs,
            const char *symname)
    {
      La_x86_64_xmm b __attribute__ ((aligned(16)));
      b = __outregs->lrv_xmm0;
      return 0;
    }

This ends up as the following assembly:

    07d7 :
     7d7:   55                      push   %rbp
     7d8:   48 89 e5                mov    %rsp,%rbp
     7db:   48 89 7d e8             mov    %rdi,-0x18(%rbp)
     7df:   89 75 e4                mov    %esi,-0x1c(%rbp)
     7e2:   48 89 55 d8             mov    %rdx,-0x28(%rbp)
     7e6:   48 89 4d d0             mov    %rcx,-0x30(%rbp)
     7ea:   4c 89 45 c8             mov    %r8,-0x38(%rbp)
     7ee:   4c 89 4d c0             mov    %r9,-0x40(%rbp)
     7f2:   48 8b 45 c0             mov    -0x40(%rbp),%rax
     7f6:   0f 28 40 10             movaps 0x10(%rax),%xmm0
     7fa:   0f 29 45 f0             movaps %xmm0,-0x10(%rbp)
     7fe:   b8 00 00 00 00          mov    $0x0,%eax
     803:   c9                      leaveq
     804:   c3                      retq

It looks like the xmm0 register is being used to transfer the data. However,
the structure is not aligned to 16 bytes, so it crashes.

Now I'm not sure who should take care of it. Should the dynamic linker, which
calls this function, ensure the structure is aligned to 16 bytes? Or should
gcc make sure it works on aligned memory when it emits movaps? (That looks
more likely to me...)

Also, is there any way to ask gcc to emit movups instead of movaps? That would
be a nice workaround :)

Thanks for any hints/ideas/help.

Here are my gcc specs:

    Using built-in specs.
    Target: x86_64-redhat-linux
    Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --with-cpu=generic --host=x86_64-redhat-linux
    Thread model: posix
    gcc version 4.1.2 20070925 (Red Hat 4.1.2-33)

regards,
jirka
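The thread does not answer the question about getting movups instead of movaps. One common way to avoid telling the compiler that the source is 16-byte aligned is to copy the bytes with memcpy through a char pointer. The sketch below assumes GCC or a compatible compiler with the vector_size extension; load_xmm_at and xmm_lane are hypothetical helper names, not part of glibc or of the original code. Which instruction the compiler finally emits is not guaranteed, so the disassembly should still be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef float La_x86_64_xmm __attribute__ ((__vector_size__ (16)));

/* Hypothetical helper: copy 16 bytes from base + offset into a vector.
   The address is computed through an unsigned char pointer, so the
   pointer type itself does not promise 16-byte alignment.  */
static La_x86_64_xmm
load_xmm_at (const void *base, size_t offset)
{
  La_x86_64_xmm v;
  memcpy (&v, (const unsigned char *) base + offset, sizeof v);
  return v;
}

/* Hypothetical helper: return lane i (0..3) of a vector.  */
static float
xmm_lane (La_x86_64_xmm v, int i)
{
  float lanes[4];
  assert (i >= 0 && i < 4);
  memcpy (lanes, &v, sizeof lanes);
  return lanes[i];
}
```

In the callback from the original message this would read, for example, b = load_xmm_at (__outregs, offsetof (La_x86_64_retval, lrv_xmm0)); instead of the direct member access. It only works around the symptom; a misaligned __outregs is still a problem for any other code that trusts the structure's alignment.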
Re: generated movaps with unaligned memory
On Mon, Feb 23, 2009 at 7:35 PM, H.J. Lu wrote:
> On Mon, Feb 23, 2009 at 10:05 AM, Jiri Olsa wrote:
>> Looks like xmm0 register is being used to transfer the data. However
>> the structure's alignment is not 16, so it will crash.
>
> Where exactly does it crash? Which structure isn't aligned at 16 bytes?

Sorry, it crashes on this instruction:

    7f6:   0f 28 40 10             movaps 0x10(%rax),%xmm0

This structure/argument is not aligned to 16 bytes:

    La_x86_64_retval *__outregs

'__outregs->lrv_xmm0' is at offset 16 (0x10) of the structure...

jirka
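The layout described above can be checked directly. The following is a minimal sketch, assuming GCC or a compatible compiler on x86-64; the structure is copied from the original message, while xmm0_offset, retval_alignment and retval_is_aligned are hypothetical helper names added for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef float La_x86_64_xmm __attribute__ ((__vector_size__ (16)));

typedef struct La_x86_64_retval
{
  uint64_t lrv_rax;
  uint64_t lrv_rdx;
  La_x86_64_xmm lrv_xmm0;
  La_x86_64_xmm lrv_xmm1;
  long double lrv_st0;
  long double lrv_st1;
} La_x86_64_retval;

/* Byte offset of lrv_xmm0 inside the structure.  */
static size_t
xmm0_offset (void)
{
  return offsetof (La_x86_64_retval, lrv_xmm0);
}

/* Alignment the compiler assumes for every La_x86_64_retval object
   (__alignof__ is a GNU extension).  */
static size_t
retval_alignment (void)
{
  return __alignof__ (La_x86_64_retval);
}

/* Nonzero if r satisfies that alignment.  */
static int
retval_is_aligned (const La_x86_64_retval *r)
{
  assert (r != NULL);
  return (uintptr_t) r % retval_alignment () == 0;
}
```

The offset of 16 matches the 0x10(%rax) displacement in the faulting movaps. Because the compiler assumes the whole structure is 16-byte aligned, it treats base + 16 as aligned too, which is why it picks movaps for the member access.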
Re: generated movaps with unaligned memory
On Mon, Feb 23, 2009 at 7:48 PM, H.J. Lu wrote:
> On Mon, Feb 23, 2009 at 10:39 AM, Jiri Olsa wrote:
>> it crashes on this one
>>
>> 7f6: 0f 28 40 10 movaps 0x10(%rax),%xmm0
>>
>> This structure/argument is not aligned at 16
>> La_x86_64_retval *__outregs
>
> Why isn't __outregs aligned at 16 bytes? According to the x86-64 psABI,
> La_x86_64_retval should be aligned at 16 bytes.

Thanks,

then the issue is in the glibc dynamic linker's _dl_runtime_profile function
(sysdeps/x86_64/dl-trampoline.S). It looks like it should ensure that the
outregs parameter is aligned to 16 bytes.

Resending to the glibc mailing lists.

Here is the gdb session showing the unaligned outregs parameter of the
_dl_call_pltexit function:

    ...
    (gdb) b _dl_call_pltexit
    Function "_dl_call_pltexit" not defined.
    Make breakpoint pending on future shared library load? (y or [n]) y
    Breakpoint 1 (_dl_call_pltexit) pending.
    (gdb) r
    Starting program: /opt/crash
    symbol __libc_start_main
    symbol printf

    Breakpoint 1, _dl_call_pltexit (l=0x77ffd000, reloc_offset=0,
    inregs=0x7fffe418, outregs=0x7fffe3c8) at dl-runtime.c:408
    408 {
    (gdb)

I'm using the latest glibc git code
(commit a839a1d3f4fbc0d75acd0dc4cad10cf6141e4963).

Let me know if you need more info.

thanks for help,
jirka
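The misalignment can be read off the gdb output: outregs=0x7fffe3c8 ends in 8, so it is 8 bytes past a 16-byte boundary, and lrv_xmm0 at outregs + 0x10 is misaligned by the same amount. The sketch below only illustrates that arithmetic; is_aligned_16 and align_down_16 are hypothetical helpers, and rounding an address down is shown as the usual way to align a downward-growing stack buffer, not as the actual glibc fix.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if addr is a multiple of 16.  */
static int
is_aligned_16 (uintptr_t addr)
{
  return (addr & 15) == 0;
}

/* Round addr down to the previous 16-byte boundary.  */
static uintptr_t
align_down_16 (uintptr_t addr)
{
  uintptr_t r = addr & ~(uintptr_t) 15;
  assert (is_aligned_16 (r));
  return r;
}
```

For the address from the gdb session, is_aligned_16 (0x7fffe3c8) is 0 and align_down_16 (0x7fffe3c8) gives 0x7fffe3c0.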
Re: generated movaps with unaligned memory
On Tue, Feb 24, 2009 at 3:53 PM, H.J. Lu wrote:
> On Mon, Feb 23, 2009 at 11:46 PM, Jiri Olsa wrote:
>> then the issue is in the glibc dynamic linker _dl_runtime_profile function
>> (sysdeps/x86_64/dl-trampoline.S). Looks like it should ensure
>> the outregs parameter is aligned to 16.
>
> Please open a glibc bug with a complete testcase so that
> people can reproduce it.

Created bug 9893:
http://sources.redhat.com/bugzilla/show_bug.cgi?id=9893

Let me know if I can help with something,
jirka