[Bug c/53362] New: gcc 4.7 generates invalid code with -O3 and -mtune=bdver2

2012-05-15 Thread valerio at aimale dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53362

             Bug #: 53362
           Summary: gcc 4.7 generates invalid code with -O3 and -mtune=bdver2
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: major
          Priority: P3
         Component: c
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: vale...@aimale.com


Hello,

I'm compiling R 2.15.0 on an AMD FX-8150, trying to take advantage of the
bdver2 platform.

The function jumpfun() in src/main/context.c is

/* jumpfun - jump to the named context */
===
static void jumpfun(RCNTXT * cptr, int mask, SEXP val)
{
    Rboolean savevis = R_Visible;

    /* run onexit/cend code for all contexts down to but not including
       the jump target */
    PROTECT(val);
    R_run_onexits(cptr);
    UNPROTECT(1);
    R_Visible = savevis;

    R_ReturnedValue = val;
    R_GlobalContext = cptr; /* this used to be set to
                               cptr->nextcontext for non-toplevel
                               jumps (with the context set back at the
                               SETJMP for restarts).  Changing this to
                               always using cptr as the new global
                               context should simplify some code and
                               perhaps allow loops to be handled with
                               fewer SETJMP's.  LT */
    R_restore_globals(R_GlobalContext);

    LONGJMP(cptr->cjmpbuf, mask);
}

with LONGJMP being

# define LONGJMP(x,i) siglongjmp(x,i)
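
For readers unfamiliar with the pattern, here is a minimal sketch of what jumpfun() does with siglongjmp(): it records the value being returned, then unwinds to the point where the target context called sigsetjmp(). The names context, jump_to and run_once are hypothetical stand-ins rather than R's code, and a plain int replaces the SEXP value.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical stand-in for R's RCNTXT: only the jump buffer matters here. */
typedef struct context {
    sigjmp_buf cjmpbuf;
} context;

static context *global_context = NULL;  /* plays the role of R_GlobalContext */
static int returned_value = 0;          /* plays the role of R_ReturnedValue */

/* Record the result and the target, then unwind to the sigsetjmp() site. */
static void jump_to(context *cptr, int mask, int val)
{
    returned_value = val;
    global_context = cptr;
    siglongjmp(cptr->cjmpbuf, mask);    /* does not return */
}

/* Establish a context, jump back into it once, and report what arrived. */
int run_once(int mask, int val)
{
    context ctx;
    int result;

    if (sigsetjmp(ctx.cjmpbuf, 1) == 0)
        jump_to(&ctx, mask, val);       /* first pass: jump away */

    /* Reached only through siglongjmp(), where sigsetjmp() returns nonzero. */
    result = returned_value;
    global_context = NULL;              /* ctx is about to go out of scope */
    return result;
}
```

Calling run_once(1, 42) should hand back 42 by way of the jump. Nothing after the siglongjmp() call in jump_to() is ever executed, which would explain why the -O3 listing can end in padding right after the last callq.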

With -O3 -mtune=bdver2 jumpfun() is compiled to:

0360 :
 360:   41 56                   push   %r14
 362:   41 55                   push   %r13
 364:   41 89 f5                mov    %esi,%r13d
 367:   41 54                   push   %r12
 369:   49 89 d4                mov    %rdx,%r12
 36c:   55                      push   %rbp
 36d:   48 8b 2d 00 00 00 00    mov    0x0(%rip),%rbp        # 374
 374:   53                      push   %rbx
 375:   48 89 fb                mov    %rdi,%rbx
 378:   48 89 d7                mov    %rdx,%rdi
 37b:   44 8b 75 00             mov    0x0(%rbp),%r14d
 37f:   e8 00 00 00 00          callq  384
 384:   48 89 df                mov    %rbx,%rdi
 387:   e8 00 00 00 00          callq  38c
 38c:   bf 01 00 00 00          mov    $0x1,%edi
 391:   e8 00 00 00 00          callq  396
 396:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # 39d
 39d:   48 89 df                mov    %rbx,%rdi
 3a0:   44 89 75 00             mov    %r14d,0x0(%rbp)
 3a4:   4c 89 20                mov    %r12,(%rax)
 3a7:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # 3ae
 3ae:   48 89 18                mov    %rbx,(%rax)
 3b1:   e8 00 00 00 00          callq  3b6
 3b6:   48 8d 7b 10             lea    0x10(%rbx),%rdi
 3ba:   44 89 ee                mov    %r13d,%esi
 3bd:   e8 00 00 00 00          callq  3c2
 3c2:   66 66 66 66 66 66 2e    data32 data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
 3c9:   0f 1f 84 00 00 00 00
 3d0:   00
 3d1:   66 66 66 66 66 66 2e    data32 data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
 3d8:   0f 1f 84 00 00 00 00
 3df:   00

This crashes with a SIGSEGV at 0x32c, whereas -O -mtune=bdver2 compiles it correctly to

228 :
 228:   41 56                   push   %r14
 22a:   41 55                   push   %r13
 22c:   41 54                   push   %r12
 22e:   55                      push   %rbp
 22f:   53                      push   %rbx
 230:   48 89 fb                mov    %rdi,%rbx
 233:   41 89 f5                mov    %esi,%r13d
 236:   48 89 d5                mov    %rdx,%rbp
 239:   4c 8b 25 00 00 00 00    mov    0x0(%rip),%r12        # 240
 240:   45 8b 34 24             mov    (%r12),%r14d
 244:   48 89 d7                mov    %rdx,%rdi
 247:   e8 00 00 00 00          callq  24c
 24c:   48 89 df                mov    %rbx,%rdi
 24f:   e8 00 00 00 00          callq  254
 254:   bf 01 00 00 00          mov    $0x1,%edi
 259:   e8 00 00 00 00          callq  25e
 25e:   45 89 34 24             mov    %r14d,(%r12)
 262:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # 269
 269:   48 89 28                mov    %rbp,(%rax)
 26c:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # 273
 273:   48 89 18                mov    %rbx,(%rax)
 276:   48 89 df                mov    %rbx,%rdi
 279:   e8 00 00 00 00          callq  27e
 27e:   

[Bug target/53362] gcc 4.7 generates invalid code with -O3 and -mtune=bdver2

2012-05-15 Thread valerio at aimale dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53362

--- Comment #2 from Valerio Aimale 2012-05-15 18:07:01 UTC ---
Andrew,

thank you for your email. I'll extract some code from the R code base 
and generate a test case.

Valerio

On 5/15/12 11:43 AM, pinskia at gcc dot gnu.org wrote:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53362
>
> Andrew Pinski  changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|UNCONFIRMED                 |WAITING
>    Last reconfirmed|                            |2012-05-15
>           Component|c                           |target
>      Ever Confirmed|0                           |1
>            Severity|major                       |normal
>
> --- Comment #1 from Andrew Pinski 2012-05-15 17:43:29 UTC ---
> Can you attach a testcase that can compile and run?
>


[Bug target/53362] gcc 4.7 generates invalid code with -O3 and -mtune=bdver2

2012-05-15 Thread valerio at aimale dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53362

--- Comment #3 from Valerio Aimale 2012-05-15 22:13:47 UTC ---
First of all, I made a mistake. The FX-8150 (which is family 15h) requires
-march=bdver1, not bdver2. The SIGSEGV, however, happens even with bdver1.

To reproduce, compile R with

CC=gcc-4.7 \
CXX=g++-4.7 \
OBJC=gcc-4.7 \
FC=gfortran-4.7 \
F77=gfortran-4.7 \
CFLAGS="-g -O3 -march=bdver1" \
CXXFLAGS="-g -O3 -march=bdver1" \
OBJCFLAGS="-g -O3 -march=bdver1" \
FCFLAGS="-g -O3 -march=bdver1" \
FFLAGS="-g -O3 -march=bdver1" \
./configure \
--enable-R-shlib \
--enable-threads=posix \
--with-readline \
--with-system-pcre \
--prefix=/usr/local/pkg/R-2.15.0-k15 \
--with-x \
--with-system-zlib \
--with-cairo \
--with-jpeglib \
--with-blas \
--with-lapack \
--with-tcltk \
--with-libpng

Second, the SIGSEGV actually happens inside eval.c at bcEval(). Here's a more
detailed description:

R has a "just in time" compiler that compiles R code to bytecode for a virtual
machine (à la Java). The SIGSEGV, which occurs when optimizing with
-O3 -march=bdver1, happens in the JIT interpreter.

The JIT essentially has a switch { case OPERAND 1: ; case OPERAND 2: ... } with a
program counter called pc.

This snippet

---
BEGIN_MACHINE {
OP(BCMISMATCH, 0): error(_("byte code version mismatch"));
OP(RETURN, 0): value = GETSTACK(-1); goto done;
OP(GOTO, 1):
  {
int label = GETOP();
BC_CHECK_SIGINT();
pc = codebase + label;
NEXT();
  }

---

which, when preprocessed, translates to:

--
  (__extension__ ({goto *(*pc++).v;}));
  init: { loop: switch(which++) {
    case BCMISMATCH_OP:
      opinfo[BCMISMATCH_OP].addr = (__extension__ &&op_BCMISMATCH);
      opinfo[BCMISMATCH_OP].argc = (0);
      goto loop;
    op_BCMISMATCH:
      Rf_error(dcgettext (((void *)0), "byte code version mismatch", __LC_MESSAGES));
    case RETURN_OP:
      opinfo[RETURN_OP].addr = (__extension__ &&op_RETURN);
      opinfo[RETURN_OP].argc = (0);
      goto loop;
    op_RETURN:
      value = (*(R_BCNodeStackTop + (-1)));
      goto done;
    case GOTO_OP:
      opinfo[GOTO_OP].addr = (__extension__ &&op_GOTO);
      opinfo[GOTO_OP].argc = (1);
      goto loop;
    op_GOTO:
      {
        int label = (*pc++).i;
        do { if (++evalcount > 1000) { R_CheckUserInterrupt(); evalcount = 0; } } while (0);
        pc = codebase + label;
        (__extension__ ({goto *(*pc++).v;}));
      }
    case BRIFNOT_OP:
      opinfo[BRIFNOT_OP].addr = (__extension__ &&op_BRIFNOT);
      opinfo[BRIFNOT_OP].argc = (2);
      goto loop;
    op_BRIFNOT:
      {
        int callidx = (*pc++).i;
        int label = (*pc++).i;
-
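
The dispatch scheme in the preprocessed code (an init pass that records each handler's label address, then "goto *(*pc++).v" to jump from one opcode to the next) can be sketched in isolation. This is a toy under stated assumptions, not R's bcEval(): toy_eval, toy_demo, the four opcodes and the fixed-size stack are all hypothetical. It relies on the GNU C "labels as values" extension, so it needs gcc or a compatible compiler, and the noinline/noclone attributes follow the GCC manual's advice for code that expects label addresses to stay the same across calls.

```c
#include <stddef.h>

/* One bytecode cell: a handler address or an integer operand, mirroring
   the .v and .i members seen in the preprocessed code. */
typedef union { void *v; int i; } cell;

enum { PUSH_OP, ADD_OP, GOTO_OP, RETURN_OP, OP_COUNT };

/* With code == NULL, store each handler's label address in handlers[] and
   return 0 (the init pass); otherwise run the threaded code. */
__attribute__((__noinline__, __noclone__))
int toy_eval(cell *code, void *handlers[OP_COUNT])
{
    int stack[16] = {0};
    int sp = 0;
    cell *pc = code;

    if (code == NULL) {
        handlers[PUSH_OP]   = &&op_PUSH;
        handlers[ADD_OP]    = &&op_ADD;
        handlers[GOTO_OP]   = &&op_GOTO;
        handlers[RETURN_OP] = &&op_RETURN;
        return 0;
    }

#define NEXT() goto *(*pc++).v  /* fetch the next handler address and jump */

    NEXT();

op_PUSH:
    stack[sp++] = (*pc++).i;
    NEXT();
op_ADD:
    sp--;
    stack[sp - 1] += stack[sp];
    NEXT();
op_GOTO:
    {
        int label = (*pc++).i;
        pc = code + label;      /* operand is an index into the code array */
        NEXT();
    }
op_RETURN:
    return stack[sp - 1];
#undef NEXT
}

/* push 2; push 40; goto cell 8; (skipped: push 100); add; return */
int toy_demo(void)
{
    void *h[OP_COUNT];
    toy_eval(NULL, h);          /* init pass: collect label addresses */
    cell prog[] = {
        {.v = h[PUSH_OP]}, {.i = 2},
        {.v = h[PUSH_OP]}, {.i = 40},
        {.v = h[GOTO_OP]}, {.i = 8},
        {.v = h[PUSH_OP]}, {.i = 100},
        {.v = h[ADD_OP]},
        {.v = h[RETURN_OP]},
    };
    return toy_eval(prog, h);
}
```

Here toy_demo() adds 2 and 40 and skips the push of 100. Each NEXT() is an indirect jump through a value loaded from the code array, which is why a single source line can show up as a bare indirect jmpq in the disassembly.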

now the line

goto *(*pc++).v;

when compiled as -O3 -march=bdver1

translates to

   0x7786bb4e <+366>:    lea    0x38(%r15),%rbp
   0x7786bb52 <+370>:    data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
   0x7786bb60 <+384>:    jmpq   *%rax
   0x7786bb62 <+386>:    nopw   0x0(%rax,%rax,1)

I believe that the goto becomes jmpq *%rax, with the nopw before and after being
just fillers for 64-bit alignment (not sure, though; I don't understand those
nopw).
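
On the filler question, the addresses in the listing are at least consistent with alignment padding: the nopw run starts at 0x7786bb52 and the jmpq sits at 0x7786bb60, which is a multiple of 16. A small sketch of that arithmetic, assuming a 16-byte alignment target (an assumption inferred from the addresses, not something the listing states; padding_to is a hypothetical helper):

```c
#include <stdint.h>

/* Bytes of padding needed to advance addr to the next multiple of align.
   align must be a power of two; the result is 0 if addr is already aligned. */
static inline uint64_t padding_to(uint64_t addr, uint64_t align)
{
    return (align - (addr & (align - 1))) & (align - 1);
}
```

With these inputs, padding_to(0x7786bb52, 16) gives 14, and 0x7786bb52 + 14 = 0x7786bb60, exactly where the jmpq begins. The 14 bytes in between are the single long nopw with its data32 prefixes.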

When executing, the code had to run some bytecode; before executing
0x7786bb60 the return rip correctly contains 0x7787ad4d

(gdb) stepi
0x7786bb60    4033    BEGIN_MACHINE {
(gdb) info frame 0
Stack frame at 0x7ffeff20:
 rip = 0x7786bb60 in bcEval (eval.c:4033); saved rip 0x7787ad4d
 called by frame at 0x7fff0110
 source language c.
 Arglist at 0x7ffef978, args: body=body@entry=0x153ecb0,
rho=rho@entry=0x1540150, useCache=TRUE
 Locals at 0x7ffef978, Previous frame's sp is 0x7ffeff20
 Saved registers:
  rbx at 0x7ffefee8, rbp at 0x7ffefef0, r12 at 0x7ffefef8, r13 at
0x7ffeff00, r14 at 0x7ffeff08, r15 at 0x7ffeff10, rip at
0x7ffeff18
(gdb) info program
Using the running image of child Thread 0x77fde780 (LWP 25913).
Program stopped at 0x7786bb60.

Once I execute 0x7786bb60:

(gdb) stepi
bcEval (useCache=FALSE, rho=0x0, body=0x0) at eval.c:4217
4217    OP(GETFUN, 1):
(gdb) info frame 0
Stack frame at 0x7ffefe90:
 rip = 0x77890f97 in bcEval (eval.c:4217); saved rip 0x7ffeff30
 called by frame at 0x7ffefe98
 source language c.
 Arglist at 0x7ffef978, args: useCache=FALSE, rho=0x0, body=0x0
 Locals at 0x7ffef978, Previous frame's sp is 0x7ffefe90
 Saved registers:
  rbx at 0x7ffefe58, rbp at 0x7ffefe60, r12 at 0x7ffefe68, r13 at
0x7ffefe70, r14 at 0x7ffefe78, r15 at 0x7ffefe80, rip at
0x7ffefe88


The return rip is now 0x7ffeff30, which is outside the program's virtual address
space and causes the SIGSEGV when the next retq is executed.

When, instead, I compile with "-O -march=bdver1"

that line, goto *(*pc++).v; , compile

[Bug target/53362] gcc 4.7 generates invalid code with -O3 and -mtune=bdver2

2012-05-15 Thread valerio at aimale dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53362

--- Comment #4 from Valerio Aimale 2012-05-15 22:15:19 UTC ---
On 5/15/12 11:43 AM, pinskia at gcc dot gnu.org wrote:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53362
>
> Andrew Pinski  changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|UNCONFIRMED                 |WAITING
>    Last reconfirmed|                            |2012-05-15
>           Component|c                           |target
>      Ever Confirmed|0                           |1
>            Severity|major                       |normal
>
> --- Comment #1 from Andrew Pinski 2012-05-15 17:43:29 UTC ---
> Can you attach a testcase that can compile and run?
>
Andrew,

I have been unable to come up with a test case, but I dug up more
information. R has a "just in time" compiler that compiles R code to
bytecode for a virtual machine (à la Java). The SIGSEGV, which occurs when
optimizing with -O3 -march=bdver1, happens in the JIT interpreter.

The assembler code I pointed to in the original bug-report is not where 
the SIGSEGV happens.

Here's the code; I had to do some major digging with gdb to find the
problem.

The JIT essentially has a switch { case OPERAND 1: ; case OPERAND 2: ... }
with a program counter called pc.

This snippet

---
BEGIN_MACHINE {
 OP(BCMISMATCH, 0): error(_("byte code version mismatch"));
 OP(RETURN, 0): value = GETSTACK(-1); goto done;
 OP(GOTO, 1):
   {
 int label = GETOP();
 BC_CHECK_SIGINT();
 pc = codebase + label;
 NEXT();
   }

---

which, when preprocessed, translates to:

--
  (__extension__ ({goto *(*pc++).v;}));
  init: { loop: switch(which++) {
    case BCMISMATCH_OP:
      opinfo[BCMISMATCH_OP].addr = (__extension__ &&op_BCMISMATCH);
      opinfo[BCMISMATCH_OP].argc = (0);
      goto loop;
    op_BCMISMATCH:
      Rf_error(dcgettext (((void *)0), "byte code version mismatch", __LC_MESSAGES));
    case RETURN_OP:
      opinfo[RETURN_OP].addr = (__extension__ &&op_RETURN);
      opinfo[RETURN_OP].argc = (0);
      goto loop;
    op_RETURN:
      value = (*(R_BCNodeStackTop + (-1)));
      goto done;
    case GOTO_OP:
      opinfo[GOTO_OP].addr = (__extension__ &&op_GOTO);
      opinfo[GOTO_OP].argc = (1);
      goto loop;
    op_GOTO:
      {
        int label = (*pc++).i;
        do { if (++evalcount > 1000) { R_CheckUserInterrupt(); evalcount = 0; } } while (0);
        pc = codebase + label;
        (__extension__ ({goto *(*pc++).v;}));
      }
    case BRIFNOT_OP:
      opinfo[BRIFNOT_OP].addr = (__extension__ &&op_BRIFNOT);
      opinfo[BRIFNOT_OP].argc = (2);
      goto loop;
    op_BRIFNOT:
      {
        int callidx = (*pc++).i;
        int label = (*pc++).i;
-

now the line

goto *(*pc++).v;

when compiled as -O3 -march=bdver1

translates to

   0x7786bb4e <+366>:    lea    0x38(%r15),%rbp
   0x7786bb52 <+370>:    data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
   0x7786bb60 <+384>:    jmpq   *%rax
   0x7786bb62 <+386>:    nopw   0x0(%rax,%rax,1)

I believe that the goto becomes jmpq *%rax, with the nopw before and
after being just fillers for 64-bit alignment (not sure, though; I don't
understand those nopw).

When executing, the code had to run some bytecode; before executing 
0x7786bb60 the return rip correctly contains 0x7787ad4d

(gdb) stepi
0x7786bb60    4033    BEGIN_MACHINE {
(gdb) info frame 0
Stack frame at 0x7ffeff20:
  rip = 0x7786bb60 in bcEval (eval.c:4033); saved rip 0x7787ad4d
  called by frame at 0x7fff0110
  source language c.
  Arglist at 0x7ffef978, args: body=body@entry=0x153ecb0, 
rho=rho@entry=0x1540150, useCache=TRUE
  Locals at 0x7ffef978, Previous frame's sp is 0x7ffeff20
  Saved registers:
   rbx at 0x7ffefee8, rbp at 0x7ffefef0, r12 at 0x7ffefef8, 
r13 at 0x7ffeff00, r14 at 0x7ffeff08, r15 at 0x7ffeff10, rip 
at 0x7ffeff18
(gdb) info program
 Using the running image of child Thread 0x77fde780 (LWP 25913).
Program stopped at 0x7786bb60.

Once I execute 0x7786bb60:

(gdb) stepi
bcEval (useCache=FALSE, rho=0x0, body=0x0) at eval.c:4217
4217    OP(GETFUN, 1):
(gdb) info frame 0
Stack frame at 0x7ffefe90:
  rip = 0x77890f97 in bcEval (eval.c:4217); saved rip 0x7ffeff30
  called by frame at 0x7ffefe98
  source language c.
  Arglist at 0x7ffef978, args: useCache=FALSE, rho=0x0, body=0x0
  Locals at 0x7ffef978, Previous frame's sp is 0x7ffefe90
  Saved registers:
   rbx at 0x7ffefe58, rbp at 0x7ffefe60, r12 at 0x7ffefe68, 
r13 at 0x7ffefe70, r14 at 0x7ffefe78, r15 at 0x7ffefe80, rip 
at 0x7ffefe88


The return rip is now 0x7ffeff30, which is outside the program's virtual
address space and causes the SIGSEGV when the next retq is executed.

When, instead, I compile with "-O -march=bdver1"

that line, goto *(*pc++).v; , compiles to

 209d:   48 83 c3