http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47271

           Summary: gcc-4.6 -O1 -ftree-vectorize removes a test (if), the
                    function generates invalid outputs
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: critical
          Priority: P3
         Component: c
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: victor.stin...@haypocalc.com


I tried to compile Python 3.2 (r87949) with gcc (version 4.6.0 20100908) on
AMD64: Python fails with an assertion error or another strange crash. The
problem comes from a loop in Python/peephole.c. Compiled with -O1, it works
fine. Compiled with -O1 -ftree-vectorize, the function generates strange
(invalid) output.

gcc-4.6 -O1:
-------------------------------------
0x0000000000480991 <+5041>:  mov    %eax,%edx
0x0000000000480993 <+5043>:  sub    %esi,%edx
0x0000000000480995 <+5045>:  mov    %edx,(%r12,%rax,4)
0x0000000000480999 <+5049>:  movzbl 0x0(%rbp,%rax,1),%edx
0x000000000048099e <+5054>:  cmp    $0x9,%dl
0x00000000004809a1 <+5057>:  jne    0x4809a8 <PyCode_Optimize+5064>
0x00000000004809a3 <+5059>:  add    $0x1,%esi
0x00000000004809a6 <+5062>:  jmp    0x4809b2 <PyCode_Optimize+5074>
0x00000000004809a8 <+5064>:  mov    $0x3,%ecx
0x00000000004809ad <+5069>:  cmp    $0x59,%dl
0x00000000004809b0 <+5072>:  ja     0x4809b7 <PyCode_Optimize+5079>
0x00000000004809b2 <+5074>:  mov    $0x1,%ecx
0x00000000004809b7 <+5079>:  add    %rcx,%rax
0x00000000004809ba <+5082>:  cmp    %rax,%rdi
0x00000000004809bd <+5085>:  jg     0x480991 <PyCode_Optimize+5041>
-------------------------------------

gcc-4.6 -O1 -ftree-vectorize:
-------------------------------------
0x0000000000480991 <+5041>:  mov    %eax,%ecx
0x0000000000480993 <+5043>:  sub    %edx,%ecx
0x0000000000480995 <+5045>:  mov    %ecx,(%r12,%rax,4)
0x0000000000480999 <+5049>:  movzbl 0x0(%rbp,%rax,1),%ecx
0x000000000048099e <+5054>:  lea    0x1(%rdx),%esi
0x00000000004809a1 <+5057>:  cmp    $0x9,%cl
0x00000000004809a4 <+5060>:  cmovne %edx,%esi
0x00000000004809a7 <+5063>:  cmove  %esi,%edx
0x00000000004809aa <+5066>:  setne  %cl
0x00000000004809ad <+5069>:  movzbl %cl,%ecx
0x00000000004809b0 <+5072>:  lea    0x1(%rax,%rcx,2),%rax
0x00000000004809b5 <+5077>:  cmp    %rax,%rdi
0x00000000004809b8 <+5080>:  jg     0x480991 <PyCode_Optimize+5041>
-------------------------------------
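
Reading the two listings side by side, the -O1 code picks the loop step with
two tests (cmp $0x9, then cmp $0x59 / ja), while the -ftree-vectorize code
derives the step from the first test alone (setne, then lea 0x1(%rax,%rcx,2)).
The C sketch below is reconstructed from the disassembly, not copied from
Python/peephole.c. The names NOP, HAVE_ARGUMENT, code_size, build_addrmap and
build_addrmap_miscompiled are assumptions made for illustration; only the
constants 9 and 0x59 (so "greater than 89") come from the listings.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed names; the values are inferred from cmp $0x9 and cmp $0x59 / ja. */
enum { NOP = 9, HAVE_ARGUMENT = 90 };

/* Step taken by the -O1 code: 3 bytes if the opcode is > 0x59, else 1. */
ptrdiff_t code_size(unsigned char op)
{
    return op >= HAVE_ARGUMENT ? 3 : 1;
}

/* Loop as the -O1 listing executes it. */
void build_addrmap(const unsigned char *code, ptrdiff_t codelen, int *addrmap)
{
    ptrdiff_t i;
    int nops = 0;

    assert(code != NULL && addrmap != NULL);
    for (i = 0; i < codelen; i += code_size(code[i])) {
        addrmap[i] = (int)i - nops;
        if (code[i] == NOP)
            nops++;
    }
}

/* Loop as the -O1 -ftree-vectorize listing executes it: the "> 0x59" test
 * is gone, so every opcode other than NOP advances by 3. */
void build_addrmap_miscompiled(const unsigned char *code, ptrdiff_t codelen,
                               int *addrmap)
{
    ptrdiff_t i;
    int nops = 0;

    assert(code != NULL && addrmap != NULL);
    for (i = 0; i < codelen; i += 1 + 2 * (code[i] != NOP)) {
        addrmap[i] = (int)i - nops;
        if (code[i] == NOP)
            nops++;
    }
}
```

If this reading is right, a one-byte opcode that is not NOP makes the second
loop skip ahead by 3 instead of 1, so some addrmap entries are never written
and keep whatever was in memory.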

Extract of the correct output (-O1):
----
addrmap[0]=0
addrmap[3]=3
addrmap[4]=4
addrmap[7]=7
addrmap[10]=10
addrmap[13]=13
addrmap[16]=16
addrmap[19]=19
addrmap[22]=22
addrmap[23]=22
----

With -O1 -ftree-vectorize, only addrmap[0] and addrmap[3] are correct:
----
addrmap[0]=0
addrmap[3]=3
addrmap[4]=0
addrmap[7]=32767
addrmap[10]=16777216
addrmap[13]=0
addrmap[16]=469314288
addrmap[19]=32767
addrmap[22]=469315151
addrmap[23]=32767
----

See also:
http://bugs.python.org/issue9880

My setup:
 * Intel(R) Pentium(R) 4 CPU 3.00GHz
 * Debian Sid
 * gcc (Debian 20110106-1) 4.6.0 20110106 (experimental) [trunk revision 168538]
 * Python 3.2 (r87949)
