[Bug c/53362] New: gcc 4.7 generates invalid code with -O3 and -mtune=bdver2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53362

             Bug #: 53362
           Summary: gcc 4.7 generates invalid code with -O3 and -mtune=bdver2
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: major
          Priority: P3
         Component: c
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: vale...@aimale.com

Hello,

I'm compiling R 2.15.0 on an AMD FX-8150, trying to take advantage of the bdver2 platform.

The function jumpfun() in src/main/context.c is

/* jumpfun - jump to the named context */

===
static void jumpfun(RCNTXT * cptr, int mask, SEXP val)
{
    Rboolean savevis = R_Visible;

    /* run onexit/cend code for all contexts down to but not including
       the jump target */
    PROTECT(val);
    R_run_onexits(cptr);
    UNPROTECT(1);
    R_Visible = savevis;

    R_ReturnedValue = val;
    R_GlobalContext = cptr; /* this used to be set to
                               cptr->nextcontext for non-toplevel
                               jumps (with the context set back at the
                               SETJMP for restarts).  Changing this to
                               always using cptr as the new global
                               context should simplify some code and
                               perhaps allow loops to be handled with
                               fewer SETJMP's.  LT */
    R_restore_globals(R_GlobalContext);

    LONGJMP(cptr->cjmpbuf, mask);
}

with LONGJMP being

# define LONGJMP(x,i) siglongjmp(x,i)

With -O3 -mtune=bdver2 jumpfun() is compiled to:

0360 :
 360:   41 56                   push   %r14
 362:   41 55                   push   %r13
 364:   41 89 f5                mov    %esi,%r13d
 367:   41 54                   push   %r12
 369:   49 89 d4                mov    %rdx,%r12
 36c:   55                      push   %rbp
 36d:   48 8b 2d 00 00 00 00    mov    0x0(%rip),%rbp        # 374
 374:   53                      push   %rbx
 375:   48 89 fb                mov    %rdi,%rbx
 378:   48 89 d7                mov    %rdx,%rdi
 37b:   44 8b 75 00             mov    0x0(%rbp),%r14d
 37f:   e8 00 00 00 00          callq  384
 384:   48 89 df                mov    %rbx,%rdi
 387:   e8 00 00 00 00          callq  38c
 38c:   bf 01 00 00 00          mov    $0x1,%edi
 391:   e8 00 00 00 00          callq  396
 396:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # 39d
 39d:   48 89 df                mov    %rbx,%rdi
 3a0:   44 89 75 00             mov    %r14d,0x0(%rbp)
 3a4:   4c 89 20                mov    %r12,(%rax)
 3a7:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # 3ae
 3ae:   48 89 18                mov    %rbx,(%rax)
 3b1:   e8 00 00 00 00          callq  3b6
 3b6:   48 8d 7b 10             lea    0x10(%rbx),%rdi
 3ba:   44 89 ee                mov    %r13d,%esi
 3bd:   e8 00 00 00 00          callq  3c2
 3c2:   66 66 66 66 66 66 2e    data32 data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
 3c9:   0f 1f 84 00 00 00 00
 3d0:   00
 3d1:   66 66 66 66 66 66 2e    data32 data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
 3d8:   0f 1f 84 00 00 00 00
 3df:   00

with a SIGSEGV on 0x32c, while with -O -mtune=bdver2 compiles correctly to

228 :
 228:   41 56                   push   %r14
 22a:   41 55                   push   %r13
 22c:   41 54                   push   %r12
 22e:   55                      push   %rbp
 22f:   53                      push   %rbx
 230:   48 89 fb                mov    %rdi,%rbx
 233:   41 89 f5                mov    %esi,%r13d
 236:   48 89 d5                mov    %rdx,%rbp
 239:   4c 8b 25 00 00 00 00    mov    0x0(%rip),%r12        # 240
 240:   45 8b 34 24             mov    (%r12),%r14d
 244:   48 89 d7                mov    %rdx,%rdi
 247:   e8 00 00 00 00          callq  24c
 24c:   48 89 df                mov    %rbx,%rdi
 24f:   e8 00 00 00 00          callq  254
 254:   bf 01 00 00 00          mov    $0x1,%edi
 259:   e8 00 00 00 00          callq  25e
 25e:   45 89 34 24             mov    %r14d,(%r12)
 262:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # 269
 269:   48 89 28                mov    %rbp,(%rax)
 26c:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # 273
 273:   48 89 18                mov    %rbx,(%rax)
 276:   48 89 df                mov    %rbx,%rdi
 279:   e8 00 00 00 00          callq  27e
 27e:
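For readers unfamiliar with the pattern jumpfun() relies on, here is a minimal sketch of a non-local exit to a saved context using sigsetjmp/siglongjmp. It is only an illustration, not R's code: the toy_context type, the toy_* globals and the toy_* functions are all hypothetical names, and the onexit handling and PROTECT/UNPROTECT steps of the real function are left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical stand-in for R's RCNTXT: a linked context with a jump buffer. */
typedef struct toy_context {
    struct toy_context *nextcontext;
    sigjmp_buf cjmpbuf;
} toy_context;

static toy_context *toy_global_context = NULL;  /* plays the role of R_GlobalContext */
static int toy_returned_value = 0;              /* plays the role of R_ReturnedValue */

/* Record the value and the target context, then jump to it (never returns). */
static void toy_jump(toy_context *cptr, int mask, int val)
{
    toy_returned_value = val;
    toy_global_context = cptr;
    siglongjmp(cptr->cjmpbuf, mask);
}

/* Some nested call that decides to bail out to the target context. */
static void toy_deep(toy_context *target)
{
    toy_jump(target, 1, 42);
}

/* Establish a context, run code that jumps back to it, return the value. */
int toy_run(void)
{
    toy_context ctx;

    ctx.nextcontext = toy_global_context;
    toy_global_context = &ctx;

    if (sigsetjmp(ctx.cjmpbuf, 0) != 0) {
        /* Arrived here via siglongjmp: unlink the context and report. */
        toy_global_context = ctx.nextcontext;
        return toy_returned_value;
    }

    toy_deep(&ctx);
    return -1;  /* not reached: toy_deep() always jumps */
}
```

Calling toy_run() establishes the context, lets toy_deep() jump back to it, and returns the value recorded just before the jump. Since siglongjmp does not return, whatever the compiler places after that call in jumpfun() is never executed on the normal path.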
[Bug target/53362] gcc 4.7 generates invalid code with -O3 and -mtune=bdver2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53362

--- Comment #2 from Valerio Aimale 2012-05-15 18:07:01 UTC ---
Andrew,

thank you for your email. I'll extract some code from the R code base and generate a test case.

Valerio

On 5/15/12 11:43 AM, pinskia at gcc dot gnu.org wrote:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53362
>
> Andrew Pinski changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|UNCONFIRMED                 |WAITING
>    Last reconfirmed|                            |2012-05-15
>           Component|c                           |target
>      Ever Confirmed|0                           |1
>            Severity|major                       |normal
>
> --- Comment #1 from Andrew Pinski 2012-05-15 17:43:29 UTC ---
> Can you attach a testcase that can compile and run?
>
[Bug target/53362] gcc 4.7 generates invalid code with -O3 and -mtune=bdver2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53362

--- Comment #3 from Valerio Aimale 2012-05-15 22:13:47 UTC ---
First of all, I made a mistake. The FX-8150 (which is family 15h) requires -march=bdver1, not bdver2. The SIGSEGV, however, happens even with bdver1.

To reproduce, compile R with

CC=gcc-4.7 \
CXX=g++-4.7 \
OBJC=gcc-4.7 \
FC=gfortran-4.7 \
F77=gfortran-4.7 \
CFLAGS="-g -O3 -march=bdver1" \
CXXFLAGS="-g -O3 -march=bdver1" \
OBJCFLAGS="-g -O3 -march=bdver1" \
FCFLAGS="-g -O3 -march=bdver1" \
FFLAGS="-g -O3 -march=bdver1" \
./configure \
    --enable-R-shlib \
    --enable-threads=posix \
    --with-readline \
    --with-system-pcre \
    --prefix=/usr/local/pkg/R-2.15.0-k15 \
    --with-x \
    --with-system-zlib \
    --with-cairo \
    --with-jpeglib \
    --with-blas \
    --with-lapack \
    --with-tcltk \
    --with-libpng

Second, the SIGSEGV actually happens inside eval.c at bcEval(). Here's a more detailed description:

R has a "just in time" compiler that compiles R code for a virtual machine (à la Java). The SIGSEGV, which happens when optimizing with -O3 -march=bdver1, happens in the JIT interpreter.

The JIT essentially has a

switch {
case OPERAND 1: ;
case OPERAND 2: ...
}

with a program counter called pc.

This snippet
---
BEGIN_MACHINE {
  OP(BCMISMATCH, 0): error(_("byte code version mismatch"));
  OP(RETURN, 0): value = GETSTACK(-1); goto done;
  OP(GOTO, 1):
    {
      int label = GETOP();
      BC_CHECK_SIGINT();
      pc = codebase + label;
      NEXT();
    }
---
which, when preprocessed, translates to:
---
(__extension__ ({goto *(*pc++).v;}));
init:
  {
    loop:
      switch(which++) {
      case BCMISMATCH_OP:
        opinfo[BCMISMATCH_OP].addr = (__extension__ &&op_BCMISMATCH);
        opinfo[BCMISMATCH_OP].argc = (0);
        goto loop;
      op_BCMISMATCH:
        Rf_error(dcgettext (((void *)0), "byte code version mismatch", __LC_MESSAGES));
      case RETURN_OP:
        opinfo[RETURN_OP].addr = (__extension__ &&op_RETURN);
        opinfo[RETURN_OP].argc = (0);
        goto loop;
      op_RETURN:
        value = (*(R_BCNodeStackTop + (-1)));
        goto done;
      case GOTO_OP:
        opinfo[GOTO_OP].addr = (__extension__ &&op_GOTO);
        opinfo[GOTO_OP].argc = (1);
        goto loop;
      op_GOTO:
        {
          int label = (*pc++).i;
          do { if (++evalcount > 1000) { R_CheckUserInterrupt(); evalcount = 0; } } while (0);
          pc = codebase + label;
          (__extension__ ({goto *(*pc++).v;}));
        }
      case BRIFNOT_OP:
        opinfo[BRIFNOT_OP].addr = (__extension__ &&op_BRIFNOT);
        opinfo[BRIFNOT_OP].argc = (2);
        goto loop;
      op_BRIFNOT:
        {
          int callidx = (*pc++).i;
          int label = (*pc++).i;
---

Now the line

goto *(*pc++).v;

when compiled with -O3 -march=bdver1 translates to

   0x7786bb4e <+366>:   lea    0x38(%r15),%rbp
   0x7786bb52 <+370>:   data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
   0x7786bb60 <+384>:   jmpq   *%rax
   0x7786bb62 <+386>:   nopw   0x0(%rax,%rax,1)

I believe that the goto becomes jmpq *%rax, with the nopw before and after being just fillers for 64-bit alignment (not sure though; I don't understand those nopw).

When executing, the code had to run some bytecode; before executing 0x7786bb60 the return rip correctly contains 0x7787ad4d

(gdb) stepi
0x7786bb60      4033        BEGIN_MACHINE {
(gdb) info frame 0
Stack frame at 0x7ffeff20:
 rip = 0x7786bb60 in bcEval (eval.c:4033); saved rip 0x7787ad4d
 called by frame at 0x7fff0110
 source language c.
 Arglist at 0x7ffef978, args: body=body@entry=0x153ecb0, rho=rho@entry=0x1540150, useCache=TRUE
 Locals at 0x7ffef978, Previous frame's sp is 0x7ffeff20
 Saved registers:
  rbx at 0x7ffefee8, rbp at 0x7ffefef0, r12 at 0x7ffefef8, r13 at 0x7ffeff00, r14 at 0x7ffeff08, r15 at 0x7ffeff10, rip at 0x7ffeff18
(gdb) info program
        Using the running image of child Thread 0x77fde780 (LWP 25913).
Program stopped at 0x7786bb60.

Once I execute 0x7786bb60

(gdb) stepi
bcEval (useCache=FALSE, rho=0x0, body=0x0) at eval.c:4217
4217        OP(GETFUN, 1):
(gdb) info frame 0
Stack frame at 0x7ffefe90:
 rip = 0x77890f97 in bcEval (eval.c:4217); saved rip 0x7ffeff30
 called by frame at 0x7ffefe98
 source language c.
 Arglist at 0x7ffef978, args: useCache=FALSE, rho=0x0, body=0x0
 Locals at 0x7ffef978, Previous frame's sp is 0x7ffefe90
 Saved registers:
  rbx at 0x7ffefe58, rbp at 0x7ffefe60, r12 at 0x7ffefe68, r13 at 0x7ffefe70, r14 at 0x7ffefe78, r15 at 0x7ffefe80, rip at 0x7ffefe88

the return rip is 0x7ffeff30, which is outside the program virtual address space and gives the SIGSEGV when the next retq is executed.

When, instead, I compile with "-O -march=bdver1" that line, goto *(*pc++).v; , compile
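
The dispatch scheme described above (handler addresses taken with &&label, stored in the bytecode, and followed with goto *(*pc++).v) can be sketched in a few lines. This is a toy under stated assumptions, not R's bcEval: the opcode set, the cell union, toy_eval() and toy_demo() are all hypothetical names made up for illustration. It relies on GCC's "labels as values" extension, and the noinline/noclone attributes follow the GCC manual's advice for code that expects a label's address to be the same on every call.

```c
#include <stddef.h>

/* Hypothetical opcodes for illustration; not R's real instruction set. */
enum { OP_PUSH, OP_ADD, OP_GOTO, OP_RETURN, OP_COUNT };

/* One bytecode cell: a handler address or an integer operand, mirroring
   the .v and .i members visible in the preprocessed code. */
typedef union { void *v; int i; } cell;

/* With body == NULL the function only records its handler addresses in
   addrs; otherwise it runs the threaded code in body.  The attributes keep
   GCC from inlining or cloning the function, which could otherwise give
   the same label different addresses in different copies. */
__attribute__((noinline, noclone))
static int toy_eval(cell *body, void **addrs)
{
    int stack[16];
    int sp = 0;
    cell *pc;

    if (body == NULL) {
        addrs[OP_PUSH]   = &&op_PUSH;
        addrs[OP_ADD]    = &&op_ADD;
        addrs[OP_GOTO]   = &&op_GOTO;
        addrs[OP_RETURN] = &&op_RETURN;
        return 0;
    }

    pc = body;
    goto *(*pc++).v;            /* dispatch the first instruction */

op_PUSH:
    stack[sp++] = (*pc++).i;    /* operand follows the opcode cell */
    goto *(*pc++).v;
op_ADD:
    sp--;
    stack[sp - 1] += stack[sp];
    goto *(*pc++).v;
op_GOTO:
    {
        int label = (*pc++).i;
        pc = body + label;      /* absolute jump within the bytecode */
        goto *(*pc++).v;
    }
op_RETURN:
    return stack[sp - 1];
}

/* Program: push 40; goto 6; push 100 (skipped); push 2; add; return. */
int toy_demo(void)
{
    void *addrs[OP_COUNT];
    cell prog[10];
    int n = 0;

    toy_eval(NULL, addrs);      /* first call: collect handler addresses */
    prog[n++].v = addrs[OP_PUSH];   prog[n++].i = 40;   /* cells 0, 1 */
    prog[n++].v = addrs[OP_GOTO];   prog[n++].i = 6;    /* cells 2, 3 */
    prog[n++].v = addrs[OP_PUSH];   prog[n++].i = 100;  /* cells 4, 5 */
    prog[n++].v = addrs[OP_PUSH];   prog[n++].i = 2;    /* cells 6, 7 */
    prog[n++].v = addrs[OP_ADD];                        /* cell 8 */
    prog[n++].v = addrs[OP_RETURN];                     /* cell 9 */
    return toy_eval(prog, addrs);
}
```

toy_demo() returns 42: the GOTO skips the "push 100" instruction, so only 40 and 2 are added. Every handler ends in an indirect jump through a register, which is the kind of jmpq *%rax instruction seen in the disassembly above.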
[Bug target/53362] gcc 4.7 generates invalid code with -O3 and -mtune=bdver2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53362

--- Comment #4 from Valerio Aimale 2012-05-15 22:15:19 UTC ---
On 5/15/12 11:43 AM, pinskia at gcc dot gnu.org wrote:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53362
>
> Andrew Pinski changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|UNCONFIRMED                 |WAITING
>    Last reconfirmed|                            |2012-05-15
>           Component|c                           |target
>      Ever Confirmed|0                           |1
>            Severity|major                       |normal
>
> --- Comment #1 from Andrew Pinski 2012-05-15 17:43:29 UTC ---
> Can you attach a testcase that can compile and run?
>
Andrew,

I have been unable to come up with a test case, but I dug up more information.

R has a "just in time" compiler that compiles R code for a virtual machine (à la Java). The SIGSEGV, which happens when optimizing with -O3 -march=bdver1, happens in the JIT interpreter.

The assembler code I pointed to in the original bug report is not where the SIGSEGV happens.

Here's the code; I had to do some major digging with gdb to find the problem.

The JIT essentially has a

switch {
case OPERAND 1: ;
case OPERAND 2: ...
}

with a program counter called pc.

This snippet
---
BEGIN_MACHINE {
  OP(BCMISMATCH, 0): error(_("byte code version mismatch"));
  OP(RETURN, 0): value = GETSTACK(-1); goto done;
  OP(GOTO, 1):
    {
      int label = GETOP();
      BC_CHECK_SIGINT();
      pc = codebase + label;
      NEXT();
    }
---
which, when preprocessed, translates to:
---
(__extension__ ({goto *(*pc++).v;}));
init:
  {
    loop:
      switch(which++) {
      case BCMISMATCH_OP:
        opinfo[BCMISMATCH_OP].addr = (__extension__ &&op_BCMISMATCH);
        opinfo[BCMISMATCH_OP].argc = (0);
        goto loop;
      op_BCMISMATCH:
        Rf_error(dcgettext (((void *)0), "byte code version mismatch", __LC_MESSAGES));
      case RETURN_OP:
        opinfo[RETURN_OP].addr = (__extension__ &&op_RETURN);
        opinfo[RETURN_OP].argc = (0);
        goto loop;
      op_RETURN:
        value = (*(R_BCNodeStackTop + (-1)));
        goto done;
      case GOTO_OP:
        opinfo[GOTO_OP].addr = (__extension__ &&op_GOTO);
        opinfo[GOTO_OP].argc = (1);
        goto loop;
      op_GOTO:
        {
          int label = (*pc++).i;
          do { if (++evalcount > 1000) { R_CheckUserInterrupt(); evalcount = 0; } } while (0);
          pc = codebase + label;
          (__extension__ ({goto *(*pc++).v;}));
        }
      case BRIFNOT_OP:
        opinfo[BRIFNOT_OP].addr = (__extension__ &&op_BRIFNOT);
        opinfo[BRIFNOT_OP].argc = (2);
        goto loop;
      op_BRIFNOT:
        {
          int callidx = (*pc++).i;
          int label = (*pc++).i;
---

Now the line

goto *(*pc++).v;

when compiled with -O3 -march=bdver1 translates to

   0x7786bb4e <+366>:   lea    0x38(%r15),%rbp
   0x7786bb52 <+370>:   data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
   0x7786bb60 <+384>:   jmpq   *%rax
   0x7786bb62 <+386>:   nopw   0x0(%rax,%rax,1)

I believe that the goto becomes jmpq *%rax, with the nopw before and after being just fillers for 64-bit alignment (not sure though; I don't understand those nopw).

When executing, the code had to run some bytecode; before executing 0x7786bb60 the return rip correctly contains 0x7787ad4d

(gdb) stepi
0x7786bb60      4033        BEGIN_MACHINE {
(gdb) info frame 0
Stack frame at 0x7ffeff20:
 rip = 0x7786bb60 in bcEval (eval.c:4033); saved rip 0x7787ad4d
 called by frame at 0x7fff0110
 source language c.
 Arglist at 0x7ffef978, args: body=body@entry=0x153ecb0, rho=rho@entry=0x1540150, useCache=TRUE
 Locals at 0x7ffef978, Previous frame's sp is 0x7ffeff20
 Saved registers:
  rbx at 0x7ffefee8, rbp at 0x7ffefef0, r12 at 0x7ffefef8, r13 at 0x7ffeff00, r14 at 0x7ffeff08, r15 at 0x7ffeff10, rip at 0x7ffeff18
(gdb) info program
        Using the running image of child Thread 0x77fde780 (LWP 25913).
Program stopped at 0x7786bb60.

Once I execute 0x7786bb60

(gdb) stepi
bcEval (useCache=FALSE, rho=0x0, body=0x0) at eval.c:4217
4217        OP(GETFUN, 1):
(gdb) info frame 0
Stack frame at 0x7ffefe90:
 rip = 0x77890f97 in bcEval (eval.c:4217); saved rip 0x7ffeff30
 called by frame at 0x7ffefe98
 source language c.
 Arglist at 0x7ffef978, args: useCache=FALSE, rho=0x0, body=0x0
 Locals at 0x7ffef978, Previous frame's sp is 0x7ffefe90
 Saved registers:
  rbx at 0x7ffefe58, rbp at 0x7ffefe60, r12 at 0x7ffefe68, r13 at 0x7ffefe70, r14 at 0x7ffefe78, r15 at 0x7ffefe80, rip at 0x7ffefe88

the return rip is 0x7ffeff30, which is outside the program virtual address space and gives the SIGSEGV when the next retq is executed.

When, instead, I compile with "-O -march=bdver1" that line, goto *(*pc++).v; , compiles to

    209d:       48 83 c3