[Bug target/36539] IRA+i386 doesn't allocate asm output being returned to eax

2008-12-05 Thread astrange at ithinksw dot com


--- Comment #8 from astrange at ithinksw dot com  2008-12-05 20:08 ---
With some recent changes IRA makes better decisions now but they don't survive
reload.

Using
> /gcc -O3 -fomit-frame-pointer -fno-pic -fdump-rtl-ira -S cabac-ret.i

I get about the same asm and this in the IRA dump:
 Allocnos coloring:


  Loop 0 (parent -1, header bb0, depth 0)
bbs: 2
all: 0r64 1r58 2r62 3r59 4r60 5r63
modified regnos: 58 59 60 62 63 64
border:
Pressure: GENERAL_REGS=6
Reg 58 of GENERAL_REGS has 2 regs less
Reg 62 of GENERAL_REGS has 2 regs less
Reg 59 of GENERAL_REGS has 2 regs less
Reg 60 of GENERAL_REGS has 2 regs less
Reg 63 of GENERAL_REGS has 2 regs less
  Pushing a0(r64,l0)
  Pushing a3(r59,l0)(potential spill: pri=2857, cost=2)
  Pushing a1(r58,l0)
  Pushing a5(r63,l0)
  Pushing a2(r62,l0)
  Pushing a4(r60,l0)
  Popping a4(r60,l0)  -- assign reg 3
  Popping a2(r62,l0)  -- assign reg 4
  Popping a5(r63,l0)  -- assign reg 0 <- "r"(state)
  Popping a1(r58,l0)  -- assign reg 0 <- "=&r"(bit)
  Popping a3(r59,l0)  -- assign reg 5
  Popping a0(r64,l0)  -- assign reg 0 <- returned bit&1

a1 and a5 should be conflicting, since a1 is an earlyclobber output and can't
share a register with any of the inputs. reload fixes this by moving it to a
worse register. 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36539



[Bug target/32593] Missed optimization of 'y = constant - x' operation

2008-12-17 Thread astrange at ithinksw dot com


--- Comment #4 from astrange at ithinksw dot com  2008-12-17 22:10 ---
Causes silly code on i386 with this:
void pred8x8l_vertical_add_c(unsigned char *pix, const short *block, int stride){
    int i;
    for(i=0; i<8; i++){
        int j;
        for (j=0; j<8; j++){
            pix[j] = pix[j-stride] + block[j];
        }
        pix+= stride;
        block+= 8;
    }
}

where it calculates and then spills each of [0-7] - stride to the stack,
instead of just being able to keep -stride in a register and incrementing it.
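
The addressing asked for here can be written out by hand. This is only a sketch: pred8x8l_vertical_add_hoisted, the top pointer and check_pred8x8l are illustrative names, not ffmpeg code, and the compiler is free to pick a different strategy.

```c
#include <assert.h>

/* Original form from the report: the row above is read as pix[j - stride]. */
void pred8x8l_vertical_add_c(unsigned char *pix, const short *block, int stride)
{
    for (int i = 0; i < 8; i++) {
        for (int j = 0; j < 8; j++)
            pix[j] = pix[j - stride] + block[j];
        pix += stride;
        block += 8;
    }
}

/* Hypothetical hand-hoisted variant: one pointer tracks the row above,
   so no per-column (j - stride) index has to be computed or spilled. */
void pred8x8l_vertical_add_hoisted(unsigned char *pix, const short *block,
                                   int stride)
{
    const unsigned char *top = pix - stride;
    for (int i = 0; i < 8; i++) {
        for (int j = 0; j < 8; j++)
            pix[j] = top[j] + block[j];
        top = pix;          /* the row just written is the next row's "above" */
        pix += stride;
        block += 8;
    }
}

/* Illustrative check: both variants write identical pixels. */
void check_pred8x8l(void)
{
    enum { STRIDE = 16, ROWS = 9 };   /* one row above plus eight rows */
    unsigned char a[ROWS * STRIDE], b[ROWS * STRIDE];
    short block[64];

    for (int i = 0; i < ROWS * STRIDE; i++)
        a[i] = b[i] = (unsigned char)(i * 7);
    for (int i = 0; i < 64; i++)
        block[i] = (short)(i - 32);

    pred8x8l_vertical_add_c(a + STRIDE, block, STRIDE);
    pred8x8l_vertical_add_hoisted(b + STRIDE, block, STRIDE);

    for (int i = 0; i < ROWS * STRIDE; i++)
        assert(a[i] == b[i]);
}
```

Calling check_pred8x8l() runs both variants on the same input and asserts that the outputs match.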


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32593



[Bug c/2803] casts in asm act as lvalues

2009-05-25 Thread astrange at ithinksw dot com


--- Comment #12 from astrange at ithinksw dot com  2009-05-25 20:26 ---
I noticed this is still accepted by gcc 4.5; one such cast snuck into ffmpeg and
broke the build with another compiler.

For instance, this only fails in c():

int as(int a)
{
asm ("" : : "m"((int)a));
}

int c(int a)
{
return *&((int)a);
}

> /usr/local/gcc45/bin/gcc -S test.c
test.c: In function 'c':
test.c:8: error: lvalue required as unary '&' operand


-- 

astrange at ithinksw dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |astrange at ithinksw dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=2803



[Bug tree-optimization/36318] SRA pessimizes struct copies without -Os

2009-05-29 Thread astrange at ithinksw dot com


--- Comment #2 from astrange at ithinksw dot com  2009-05-30 00:19 ---
Fixed with new SRA:
_foo1:
    subl    $12, %esp
    movl    20(%esp), %eax
    movl    (%eax), %edx
    movl    16(%esp), %eax
    movl    %edx, (%eax)
    addl    $12, %esp
    ret


-- 

astrange at ithinksw dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36318



[Bug tree-optimization/36318] SRA pessimizes struct copies without -Os

2009-06-04 Thread astrange at ithinksw dot com


--- Comment #4 from astrange at ithinksw dot com  2009-06-05 04:31 ---
This bug must have been weaker than I remembered it; when I used 4 char fields
instead of one char[4], 4.4 behaved properly too.

How about:
Alexander Strange http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36318



[Bug tree-optimization/36127] bad choice of loop IVs above -Os on x86

2009-08-06 Thread astrange at ithinksw dot com


--- Comment #5 from astrange at ithinksw dot com  2009-08-07 03:04 ---
Fixed with -O3 -fgraphite-identity. Why did I even bother checking that?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36127



[Bug tree-optimization/40992] New: [4.2/4.3/4.4/4.5 Regression] cunroll ignoring asm size

2009-08-06 Thread astrange at ithinksw dot com
The attached file is a loop over the same function implemented in C and inline
asm. 

When compiled with:
gcc -O3 -fno-pic -fomit-frame-pointer -fdump-tree-cunroll-details -S
cabac_unroll.i
cunroll thinks they're different sizes:

size: 55-4, last_iteration: 55-4
  Loop size: 55
  Estimated size after unrolling: 442

size: 8-4, last_iteration: 8-4
  Loop size: 8
  Estimated size after unrolling: 34

and expands the asm loop all 13 times.

This is reduced from ffmpeg decode_cabac_residual, where it apparently causes
significant decoding slowdown.

Besides that, cunroll seems to be hurting ffmpeg in general on x86-32
(http://multimedia.cx/eggs/last-performance-smackdown-for-awhile/), maybe we'll
turn it down some.


-- 
   Summary: [4.2/4.3/4.4/4.5 Regression] cunroll ignoring asm size
   Product: gcc
   Version: 4.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40992



[Bug tree-optimization/40992] [4.2/4.3/4.4/4.5 Regression] cunroll ignoring asm size

2009-08-06 Thread astrange at ithinksw dot com


--- Comment #1 from astrange at ithinksw dot com  2009-08-07 04:25 ---
Created an attachment (id=18315)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18315&action=view)
the source


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40992



[Bug tree-optimization/40992] [4.2/4.3/4.4/4.5 Regression] cunroll ignoring asm size

2009-08-08 Thread astrange at ithinksw dot com


--- Comment #3 from astrange at ithinksw dot com  2009-08-08 16:44 ---
Maybe the C version will be usable after everyone is using 4.4+; earlier
versions tend to make a mess.

Anyway, counting newlines for size estimation wouldn't pessimize anything.
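
As a rough illustration of that suggestion, a size estimate could count the lines in the asm template. This is a sketch only: asm_line_estimate is a hypothetical helper, not a function in GCC, and it assumes one instruction per template line.

```c
#include <assert.h>

/* Hypothetical helper: estimate the size of an inline-asm template as its
   number of lines, i.e. one more than the number of '\n' characters. An
   empty template still counts as one unit, and a trailing newline
   overcounts by one; both are acceptable for a heuristic. */
int asm_line_estimate(const char *tmpl)
{
    int lines = 1;
    for (; *tmpl != '\0'; tmpl++)
        if (*tmpl == '\n')
            lines++;
    return lines;
}

/* Illustrative check on a few templates. */
void check_asm_line_estimate(void)
{
    assert(asm_line_estimate("") == 1);
    assert(asm_line_estimate("mulw %w2") == 1);
    assert(asm_line_estimate("add %1, %0\n\tsub %2, %0\n\tshr $1, %0") == 3);
}
```

With an estimate like this, a multi-line asm statement would no longer be costed the same as a single cheap statement when the unroller sizes the loop body.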


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40992



[Bug tree-optimization/32572] Small C++ function fails to inline with large negative badness

2007-07-16 Thread astrange at ithinksw dot com


--- Comment #2 from astrange at ithinksw dot com  2007-07-16 19:51 ---
Seems to work now in r126689.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32572



[Bug tree-optimization/32572] Small C++ function fails to inline with large negative badness

2007-09-15 Thread astrange at ithinksw dot com


--- Comment #3 from astrange at ithinksw dot com  2007-09-15 23:47 ---
It's more like it was accidentally fixed and the underlying cause is still
there, but it is fixed.


-- 

astrange at ithinksw dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32572



[Bug target/33555] New: x86 missed opportunity for sbb

2007-09-25 Thread astrange at ithinksw dot com
> /usr/local/gcc43/bin/gcc -v
Using built-in specs.
Target: i386-apple-darwin8.10.1
Configured with: ../gcc/configure --prefix=/usr/local/gcc43 --with-arch=nocona
--with-tune=nocona --with-gmp=/sw --with-system-zlib
--enable-languages=c,c++,objc,obj-c++
Thread model: posix
gcc version 4.3.0 20070925 (experimental) (GCC) 

> /usr/local/gcc43/bin/gcc -Os -fno-pic -S sbb.c -fomit-frame-pointer
.text
.globl _cmpb_sbb
_cmpb_sbb:
    subl    $12, %esp
    movl    16(%esp), %eax
    movl    20(%esp), %ecx
    xorl    %edx, %edx
    cmpl    24(%esp), %ecx
    setb    %dl
    negl    %edx
    andl    %ecx, %edx
    subl    %edx, %eax
    addl    $12, %esp
    ret
.subsections_via_symbols

Source:
unsigned cmpb_sbb(unsigned a, unsigned b, unsigned c)
{
unsigned mask = -(b < c);

a -= b & mask;
return a;
}

"setb, negl" is the same as 0 - (0 + eflags.CF), so it can be replaced with
"sbb %edx, %edx".
This is useful for if-conversion, since it's the same as "if (b < c) a -= b;"
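
The claimed equivalence between the masked form and the branch can be checked in plain C. This is a sketch: cmpb_branch and check_cmpb are illustrative names added here, and only cmpb_sbb comes from the report.

```c
#include <assert.h>

/* Branch-free form from the report: mask is all ones when b < c, else 0. */
unsigned cmpb_sbb(unsigned a, unsigned b, unsigned c)
{
    unsigned mask = -(b < c);

    a -= b & mask;
    return a;
}

/* The if-converted source this corresponds to (illustrative). */
unsigned cmpb_branch(unsigned a, unsigned b, unsigned c)
{
    if (b < c)
        a -= b;
    return a;
}

/* Illustrative check over a grid of boundary values. */
void check_cmpb(void)
{
    static const unsigned v[] = { 0u, 1u, 2u, 0x7fffffffu, 0x80000000u,
                                  0xfffffffeu, 0xffffffffu };
    const int n = (int)(sizeof v / sizeof v[0]);

    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++)
                assert(cmpb_sbb(v[i], v[j], v[k]) ==
                       cmpb_branch(v[i], v[j], v[k]));
}
```

The mask works because (b < c) is 0 or 1, and negating 1 as unsigned gives all ones, which is the same value sbb leaves in the register when the carry flag is set.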


-- 
   Summary: x86 missed opportunity for sbb
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: minor
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
 GCC build triplet: i386-apple-darwin8.10.1
  GCC host triplet: i386-apple-darwin8.10.1
GCC target triplet: i386-apple-darwin8.10.1


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33555



[Bug tree-optimization/33705] New: gcc generates dead struct stores

2007-10-08 Thread astrange at ithinksw dot com
> /usr/local/gcc43/bin/gcc -O3 -fno-pic -fomit-frame-pointer -m64 -S 
> gcc-struct-stores.i -v
Using built-in specs.
Target: i386-apple-darwin8.10.1
Configured with: ../gcc/configure --prefix=/usr/local/gcc43 --with-arch=nocona
--with-tune=nocona --with-gmp=/sw --with-system-zlib
--enable-languages=c,c++,objc,obj-c++ --disable-bootstrap
Thread model: posix
gcc version 4.3.0 20071008 (experimental) (GCC) 

GCC updates c->low and c->range in the middle of the function:
    movl    %r8d, (%rdi)
    movl    %edx, 4(%rdi)

but they're overwritten at the end:
    movl    %edx, 4(%rdi)
    sall    %cl, (%rdi)

I don't know if there are aliasing issues, but marking it __restrict doesn't
affect it.


-- 
   Summary: gcc generates dead struct stores
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: minor
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: astrange at ithinksw dot com
 GCC build triplet: i386-apple-darwin8.10.1
  GCC host triplet: i386-apple-darwin8.10.1
GCC target triplet: i386-apple-darwin8.10.1


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33705



[Bug tree-optimization/33705] gcc generates dead struct stores

2007-10-08 Thread astrange at ithinksw dot com


--- Comment #1 from astrange at ithinksw dot com  2007-10-09 03:14 ---
Created an attachment (id=14328)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14328&action=view)
source


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33705



[Bug tree-optimization/33705] gcc generates dead struct stores

2007-10-08 Thread astrange at ithinksw dot com


--- Comment #2 from astrange at ithinksw dot com  2007-10-09 03:15 ---
Created an attachment (id=14329)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14329&action=view)
resulting x86-64 asm

/usr/local/gcc43/bin/gcc -O3 -fno-pic -fomit-frame-pointer -m64 -S
gcc-struct-stores.i


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33705



[Bug target/33791] New: x86 out of registers ICE with -fschedule-insns -march=core2

2007-10-15 Thread astrange at ithinksw dot com
> /usr/local/gcc43/bin/gcc -v
Using built-in specs.
Target: i386-apple-darwin8.10.1
Configured with: ../gcc/configure --prefix=/usr/local/gcc43 --with-arch=nocona
--with-tune=nocona --with-gmp=/sw --with-system-zlib
--enable-languages=c,c++,objc,obj-c++
Thread model: posix
gcc version 4.3.0 20071015 (experimental) (GCC) 

> /usr/local/gcc43/bin/gcc -O1 -fschedule-insns -march=core2 -S 
> gcc-sched-ice-32.i
gcc-sched-ice-32.i: In function 'decode_init':
gcc-sched-ice-32.i:177: warning: assignment from incompatible pointer type
gcc-sched-ice-32.i: In function 'decode_nal_units':
gcc-sched-ice-32.i:332: warning: assignment from incompatible pointer type
gcc-sched-ice-32.i: In function 'hl_decode_mb_internal':
gcc-sched-ice-32.i:275: error: unable to find a register to spill in class
'GENERAL_REGS'
gcc-sched-ice-32.i:275: error: this is the insn:
(insn 222 221 232 26 gcc-sched-ice-32.i:183 (set (mem:DI (plus:SI (reg:SI 170)
(reg/f:SI 169 [ .top_borders ])) [0 S8 A64])
(reg:DI 172)) 88 {*movdi_2} (expr_list:REG_DEAD (reg:DI 172)
(expr_list:REG_DEAD (reg:SI 170)
(expr_list:REG_DEAD (reg/f:SI 169 [ .top_borders ])
(nil)
gcc-sched-ice-32.i:275: internal compiler error: in spill_failure, at
reload1.c:2001
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.

Delta-reduced so warnings don't mean anything.
The original (large) source has variants on the same error (different insns)
with and without -m64/no-pic/omit-frame-pointer.


-- 
   Summary: x86 out of registers ICE with -fschedule-insns -march=core2
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
 GCC build triplet: i386-apple-darwin8.10.1
  GCC host triplet: i386-apple-darwin8.10.1
GCC target triplet: i386-apple-darwin8.10.1


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33791



[Bug target/33791] x86 out of registers ICE with -fschedule-insns -march=core2

2007-10-15 Thread astrange at ithinksw dot com


--- Comment #1 from astrange at ithinksw dot com  2007-10-16 05:46 ---
Created an attachment (id=14358)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14358&action=view)
testcase


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33791



[Bug tree-optimization/20231] New: missed optimization of loop IV modulus

2005-02-26 Thread astrange at ithinksw dot com
[zebes:~] astrange% /usr/local/bin/gcc -v
Using built-in specs.
Target: powerpc-apple-darwin7.7.0
Configured with: ../configure --enable-threads=posix --with-threads=posix
Thread model: posix
gcc version 4.1.0 20050226 (experimental)

Command line: /usr/local/bin/gcc -O3 -mcpu=7400 -mtune=7400
-fdump-tree-optimized -c mod_loop.c

Code:
void mod_loop(unsigned char *array, int len, unsigned char repeat)
{
unsigned char i;
for (i = 0; i < len; i++) array[i] = i%repeat;
}

void mod_loop2(unsigned char *array, int len, unsigned char repeat)
{
unsigned char i,i2=0;
for (i = 0; i < len; i++) {array[i] = i2++; if (i2 == repeat) i2 = 0;}
}

Although the two functions are equivalent and mod_loop2 is better (avoiding an
expensive divide), GCC doesn't transform the first into the second.
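
The equivalence can be checked mechanically. The sketch below repeats the two functions and adds a hypothetical check_mod_loops helper. It assumes repeat is nonzero (otherwise the modulus is undefined) and len is at most 255 (with an unsigned char counter, neither loop terminates for larger len).

```c
#include <assert.h>
#include <string.h>

void mod_loop(unsigned char *array, int len, unsigned char repeat)
{
    unsigned char i;
    for (i = 0; i < len; i++) array[i] = i % repeat;
}

void mod_loop2(unsigned char *array, int len, unsigned char repeat)
{
    unsigned char i, i2 = 0;
    for (i = 0; i < len; i++) { array[i] = i2++; if (i2 == repeat) i2 = 0; }
}

/* Illustrative check: compare both loops for every nonzero repeat and a
   spread of lengths up to 255. */
void check_mod_loops(void)
{
    unsigned char a[255], b[255];

    for (int repeat = 1; repeat <= 255; repeat++) {
        for (int len = 0; len <= 255; len += 51) {
            memset(a, 0xAA, sizeof a);
            memset(b, 0xAA, sizeof b);
            mod_loop(a, len, (unsigned char)repeat);
            mod_loop2(b, len, (unsigned char)repeat);
            assert(memcmp(a, b, sizeof a) == 0);
        }
    }
}
```

The restriction on repeat and len is what a compiler would have to prove, or guard with a runtime test, before making the transformation itself.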

-- 
   Summary: missed optimization of loop IV modulus
   Product: gcc
   Version: 4.1.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P2
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: powerpc-apple-darwin7.7.0
  GCC host triplet: powerpc-apple-darwin7.7.0
GCC target triplet: powerpc-apple-darwin7.7.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20231


[Bug rtl-optimization/20614] New: PowerPC - inefficient use of condition register

2005-03-23 Thread astrange at ithinksw dot com
[zebes:~] astrange% /usr/local/bin/g++fsf -v
Using built-in specs.
Target: powerpc-apple-darwin7.7.0
Configured with: ../fsfgcc/configure --program-suffix=fsf
--enable-languages=c,c++,java,treelang --enable-cpu=750
Thread model: posix
gcc version 4.1.0 20050321 (experimental)

[zebes:~] astrange% /usr/local/bin/g++fsf -fdump-tree-gimple
-fdump-tree-optimized -Os -mmultiple -S -mcpu=750 -mtune=750 -dp ccset.cpp

The source is
unsigned int ccblah(unsigned int *q, unsigned int B, unsigned int D,
                    unsigned int E, unsigned int F, unsigned int H)
{
bool cfold1, cfold2, cfold3, cfold4, cfold5, cfold6;
cfold1 = D == B;
cfold2 = B == F;
cfold3 = D == H;
cfold4 = F == H;
cfold5 = !cfold1 && !cfold4;
cfold6 = !cfold3 && !cfold2;
q[0] = cfold1 && cfold6 ? D : E;
q[1] = cfold2 && cfold5 ? F : E;
q[2] = cfold3 && cfold5 ? D : E;
q[3] = cfold4 && cfold6 ? F : E;
return ((cfold1) ^ (cfold2))?E:F;
}

Except for the return, it's a simplified testcase of 
http://scale2x.sourceforge.net/'s core.

GCC generates this for the first block:
__Z7ccblahPjj:
LFB3:
    xor r12,r5,r4       ; 30  *rs6000.md:11675/1  [length = 12]
    subfic r0,r12,0
    adde. r12,r0,r12
    xor r10,r7,r8       ; 27  *rs6000.md:11645/1  [length = 12]
    subfic r0,r10,0
    adde r10,r0,r10
    xor r4,r4,r7        ; 19  *rs6000.md:11645/1  [length = 12]
    subfic r0,r4,0
    adde r4,r0,r4
    xor r11,r5,r8       ; 23  *rs6000.md:11645/1  [length = 12]
    subfic r0,r11,0
    adde r11,r0,r11
    beq- cr0,L27        ; 31  *rs6000.md:13791    [length = 4]
    li r2,0             ; 34  *movsi_internal1/5  [length = 4]
    b L29               ; 151 jump                [length = 4]

The xor/subfic/adde patterns are unnecessary; instead, it should be using cmpw
into four different CR subregisters ("cmpw cr1,r5,r4" "cmpw cr2,r7,r8" etc).

For the return statement, GCC generates:
L47:
    cmpw cr7,r30,r4     ; 127 *cmpsi_internal1    [length = 4]
    stw r0,12(r11)      ; 125 *movsi_internal1/4  [length = 4]
    bne+ cr7,L48        ; 128 *rs6000.md:13791    [length = 4]
    mr r3,r7            ; 131 *movsi_internal1/1  [length = 4]
L48:
    lmw r30,-8(r1)      ; 169 *lmw                [length = 4]
    blr                 ; 170 *return_internal_si [length = 4]

There may be a possible savings by using crxor instead of another cmpw.

-- 
   Summary: PowerPC - inefficient use of condition register
   Product: gcc
   Version: 4.1.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P2
 Component: rtl-optimization
    AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
CC: gcc-bugs at gcc dot gnu dot org
GCC target triplet: powerpc-apple-darwin7.7.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20614


[Bug target/20614] PowerPC - inefficient use of condition register

2005-03-23 Thread astrange at ithinksw dot com

--- Additional Comments From astrange at ithinksw dot com  2005-03-24 06:39 ---
It buys two cycles per compare on a G3/G4 (as well as not clobbering cr0, which
one of the gcc patterns does). It also saves two thirds of the code size, which
is what -Os is targeting. Not much, but the real-world version of this will
take all the speed it can get.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20614


[Bug rtl-optimization/36673] IRA -O3 -fno-pic ICE in save_con_fun_n, at caller-save.c:1389

2008-08-26 Thread astrange at ithinksw dot com


--- Comment #5 from astrange at ithinksw dot com  2008-08-27 04:27 ---
Fixed.


-- 

astrange at ithinksw dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36673



[Bug rtl-optimization/36672] IRA + -fno-pic ICE in emit_swap_insn, at reg-stack.c:829

2008-08-26 Thread astrange at ithinksw dot com


--- Comment #4 from astrange at ithinksw dot com  2008-08-27 04:28 ---
Fixed.


-- 

astrange at ithinksw dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36672



[Bug rtl-optimization/36663] IRA ICE in save_call_clobbered_regs at caller-save.c:1949

2008-08-26 Thread astrange at ithinksw dot com


--- Comment #4 from astrange at ithinksw dot com  2008-08-27 04:28 ---
Fixed.


-- 

astrange at ithinksw dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36663



[Bug target/36539] [4.4 regression] IRA doesn't allocate asm output being returned to eax

2008-08-26 Thread astrange at ithinksw dot com


--- Comment #3 from astrange at ithinksw dot com  2008-08-27 04:41 ---
Now it is.


-- 

astrange at ithinksw dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|IRA doesn't allocate asm    |[4.4 regression] IRA doesn't
                   |output being returned to eax|allocate asm output being
                   |                            |returned to eax


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36539



[Bug target/36539] [4.4 regression] IRA doesn't allocate asm output being returned to eax

2008-09-03 Thread astrange at ithinksw dot com


--- Comment #5 from astrange at ithinksw dot com  2008-09-04 04:02 ---
It is fixed for me on x86-64. For i386 it's still suboptimal:
_get_cabac:
    subl    $28, %esp
    movl    %esi, 16(%esp)
    movl    %edi, 20(%esp)
    movl    %ebx, 12(%esp)
    movl    %ebp, 24(%esp)
    movl    32(%esp), %esi
    movl    36(%esp), %edi
    movl    (%esi), %eax
    movl    4(%esi), %ebx
# 16 "../cabac-ret.i" 1
#%ebp %ebx %ax 16(%esi) %edi
# 0 "" 2
    movl    %eax, (%esi)
    movl    %ebx, 4(%esi)
    movl    %ebp, %eax
    movl    12(%esp), %ebx
    andl    $1, %eax
    movl    16(%esp), %esi
    movl    20(%esp), %edi
    movl    24(%esp), %ebp
    addl    $28, %esp
    ret

but not a regression (code is worse without IRA).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36539



[Bug target/36539] IRA+i386 doesn't allocate asm output being returned to eax

2008-09-17 Thread astrange at ithinksw dot com


--- Comment #7 from astrange at ithinksw dot com  2008-09-18 01:29 ---
Updated to 32-bit only.


-- 

astrange at ithinksw dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
 GCC target triplet|x86_64-*-*                  |i?86-*-*
            Summary|IRA doesn't allocate asm    |IRA+i386 doesn't allocate
                   |output being returned to eax|asm output being returned to
                   |                            |eax


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36539



[Bug rtl-optimization/34279] ira branch/-fdump-rtl-ira ICE at at ira-call.c:525

2008-02-10 Thread astrange at ithinksw dot com


--- Comment #2 from astrange at ithinksw dot com  2008-02-10 19:14 ---
fixed


-- 

astrange at ithinksw dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34279



[Bug tree-optimization/35215] New: ICE: verify_histograms failed with -fprofile-use

2008-02-15 Thread astrange at ithinksw dot com
> /usr/local/gcc43/bin/gcc -v
Using built-in specs.
Target: i386-apple-darwin9.2.0
Configured with: ../gcc/configure --prefix=/usr/local/gcc43
--enable-threads=posix --with-arch=core2 --with-tune=core2 --with-gmp=/sw
--with-mpfr=/sw --disable-nls --disable-bootstrap --enable-checking=yes,rtl
CFLAGS=-g LDFLAGS=/usr/lib/libiconv.dylib --enable-languages=c,c++,objc,obj-c++
Thread model: posix
gcc version 4.3.0 20080215 (experimental) (GCC) 

> /usr/local/gcc43/bin/gcc -O3 -fprofile-use -c pcx.i 
pcx.c: In function 'pcx_decode_frame':
pcx.c:247: error: Dead histogram
IOR value ior:0.
memset (dst_28, 0, 0);

pcx.c:247: internal compiler error: verify_histograms failed
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.

I compiled ffmpeg profiled and didn't exercise code before doing -fprofile-use,
so the stats are all empty.


-- 
   Summary: ICE: verify_histograms failed with -fprofile-use
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: astrange at ithinksw dot com
 GCC build triplet: i386-apple-darwin9.2.0
  GCC host triplet: i386-apple-darwin9.2.0
GCC target triplet: i386-apple-darwin9.2.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35215



[Bug tree-optimization/35215] ICE: verify_histograms failed with -fprofile-use

2008-02-15 Thread astrange at ithinksw dot com


--- Comment #1 from astrange at ithinksw dot com  2008-02-16 04:56 ---
Created an attachment (id=15164)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15164&action=view)
source


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35215



[Bug tree-optimization/35215] ICE: verify_histograms failed with -fprofile-use

2008-02-15 Thread astrange at ithinksw dot com


--- Comment #2 from astrange at ithinksw dot com  2008-02-16 04:56 ---
Created an attachment (id=15165)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15165&action=view)
gcda


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35215



[Bug tree-optimization/35215] ICE: verify_histograms failed with -fprofile-use

2008-02-15 Thread astrange at ithinksw dot com


--- Comment #3 from astrange at ithinksw dot com  2008-02-16 04:57 ---
Created an attachment (id=15166)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15166&action=view)
gcno


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35215



[Bug target/32593] Missed optimization of 'y = constant - x' operation

2008-02-16 Thread astrange at ithinksw dot com


--- Comment #3 from astrange at ithinksw dot com  2008-02-16 18:16 ---
Also, 'x >> 32 - y' can be transformed into 'x >> -y', since x86 only uses the
lowest 5 bits. I'm not sure about other targets.
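
A small model shows why the two shift counts agree on x86. This is a sketch: shr_x86 and check_shift_counts are illustrative names, and shr_x86 models only the count masking of 32-bit x86 shifts, not any other target. In portable C, shifting a 32-bit value by 32 is undefined, so the identity holds at the instruction level rather than in the source language.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a 32-bit x86 logical right shift: the hardware uses only the
   low 5 bits of the count. */
uint32_t shr_x86(uint32_t x, uint32_t count)
{
    return x >> (count & 31);
}

/* Illustrative check: (32 - y) and (-y) are congruent modulo 32, so the
   masked shift gives the same result for both counts. */
void check_shift_counts(void)
{
    const uint32_t x = 0xDEADBEEFu;

    for (uint32_t y = 0; y < 32; y++)
        assert(shr_x86(x, 32u - y) == shr_x86(x, 0u - y));

    /* For y in 1..31 the plain C shift is defined and agrees as well. */
    for (uint32_t y = 1; y < 32; y++)
        assert(shr_x86(x, 0u - y) == x >> (32u - y));
}
```

The saving is the constant load and subtract: a single neg of the count replaces computing 32 - y.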

Messing with the backend doesn't seem very popular these days. I guess I should
figure out how those parts work.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32593



[Bug target/32593] Missed optimization of 'y = constant - x' operation

2008-02-16 Thread astrange at ithinksw dot com


-- 

astrange at ithinksw dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32593



[Bug target/33555] x86 missed opportunity for sbb

2008-02-16 Thread astrange at ithinksw dot com


-- 

astrange at ithinksw dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|minor                       |enhancement


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33555



[Bug rtl-optimization/35281] New: [4.3 regression] multiply with 0 generated for 64*32->64

2008-02-21 Thread astrange at ithinksw dot com
> /usr/local/gcc43/bin/gcc -v   
Using built-in specs.
Target: i386-apple-darwin9.2.0
Configured with: ../gcc/configure --prefix=/usr/local/gcc43
--enable-threads=posix --with-arch=core2 --with-tune=core2 --with-gmp=/sw
--with-mpfr=/sw --disable-nls --disable-bootstrap --enable-checking=yes,rtl
CFLAGS=-g LDFLAGS=/usr/lib/libiconv.dylib --enable-languages=c,c++,objc,obj-c++
Thread model: posix
gcc version 4.4.0 20080219 (experimental) (GCC) 
> /usr/local/gcc43/bin/gcc -Os -fno-PIC -S u64mul.c -fomit-frame-pointer

gcc generates:
_mul32:
    pushl   %ebx
    xorl    %edx, %edx
    subl    $8, %esp
    movl    _b, %eax
    movl    _a, %ecx
    movl    _a+4, %ebx
    imull   %edx, %ecx
    imull   %eax, %ebx
    mull    _a
    addl    %ebx, %ecx
    leal    (%ecx,%edx), %edx
    popl    %ecx
    popl    %ebx
    popl    %ebx
    ret

and somehow leaves all the stuff with edx in.
4.1.3 generates:
mul32:
    movl    _b, %eax
    movl    _a+4, %ecx
    imull   %eax, %ecx
    mull    _a
    addl    %edx, %ecx
    movl    %ecx, %edx
    ret

They both generate bad code for mul16.


-- 
   Summary: [4.3 regression] multiply with 0 generated for 64*32->64
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
 GCC build triplet: i386-apple-darwin9.2.0
  GCC host triplet: i386-apple-darwin9.2.0
GCC target triplet: i386-apple-darwin9.2.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35281



[Bug rtl-optimization/35281] [4.3 regression] multiply with 0 generated for 64*32->64

2008-02-21 Thread astrange at ithinksw dot com


--- Comment #2 from astrange at ithinksw dot com  2008-02-21 21:58 ---
Created an attachment (id=15199)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15199&action=view)
source

Oh, I forgot.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35281



[Bug target/14552] compiled trivial vector intrinsic code is inefficient

2008-03-19 Thread astrange at ithinksw dot com


--- Comment #24 from astrange at ithinksw dot com  2008-03-19 19:21 ---
For
typedef short mmxw  __attribute__ ((mode(V4HI)));
typedef int   mmxdw __attribute__ ((mode(V2SI)));

mmxdw dw;
mmxw w;

void test(){
w+=w;
dw= (mmxdw)w;
}

void test2(){
w= __builtin_ia32_paddw(w,w);
dw= (mmxdw)w;
}

gcc SVN generates the expected code for test2(), but not test(). I don't think
using += on an MMX variable should count as autovectorization - if you're doing
either you should know where to put emms yourself.

For test() we get:
    subl    $28, %esp
    movq    _w, %mm0
    movq    %mm0, 8(%esp)
    movzwl  8(%esp), %eax
    movzwl  10(%esp), %edx
    movzwl  12(%esp), %ecx
    addl    %eax, %eax
    addl    %edx, %edx
    movw    %ax, _w
    movw    %dx, _w+2
    movzwl  14(%esp), %eax
    addl    %ecx, %ecx
    addl    %eax, %eax
    movw    %cx, _w+4
    movw    %ax, _w+6
    movq    _w, %mm0
    movq    %mm0, _dw
    addl    $28, %esp
    ret

which touches mm0 (requiring emms, I think) but doesn't use paddw (so it is slow
and silly-looking).
LLVM generates expected code for both of them.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14552



[Bug target/14552] compiled trivial vector intrinsic code is inefficient

2008-03-19 Thread astrange at ithinksw dot com


--- Comment #25 from astrange at ithinksw dot com  2008-03-19 19:39 ---
Actually the first generates-
    subl    $12, %esp
    movq    _w, %mm0
    paddw   %mm0, %mm0
    movq    %mm0, _w
    movq    _w, %mm0
    movq    %mm0, _dw
    addl    $12, %esp
    ret

which is better than the code in the original report but still has a useless
store/reload.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14552



[Bug target/14552] compiled trivial vector intrinsic code is inefficient

2008-03-19 Thread astrange at ithinksw dot com


--- Comment #32 from astrange at ithinksw dot com  2008-03-20 00:39 ---
This is missed on trees:
mmxdw dw;
mmxw w;

void test2(){
w= __builtin_ia32_paddw(w,w); dw= (mmxdw)w;
}

void test3(){
mmxw w2= __builtin_ia32_paddw(w,w); dw= (mmxdw)w2;
}

test2 ()
{
  vector short int w.4;
  vector short int w.3;

:
  w.3 = w;
  w.4 = __builtin_ia32_paddw (w.3, w.3);
  w = w.4;
  dw = VIEW_CONVERT_EXPR(w);
  return;
}

test3 ()
{
  mmxw w2;
  vector short int w.6;

:
  w.6 = w;
  w2 = __builtin_ia32_paddw (w.6, w.6);
  dw = VIEW_CONVERT_EXPR(w2);
  return;
}


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14552



[Bug other/31043] duplicated data in .rodata / .rodata.cst sections.

2008-03-21 Thread astrange at ithinksw dot com


--- Comment #1 from astrange at ithinksw dot com  2008-03-22 04:28 ---
I encountered this myself with 4.4.0 20080321.

If the data is static, gcc generates LC0 but not the copy with the original
name, which impedes debugging.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31043



[Bug target/35714] New: x86 poor code with pmaddwd

2008-03-26 Thread astrange at ithinksw dot com
> /usr/local/gcc44/bin/gcc -v
Using built-in specs.
Target: i386-apple-darwin9.2.0
Configured with: ../gcc/configure --prefix=/usr/local/gcc44
--enable-threads=posix --with-arch=core2 --with-tune=core2 --with-gmp=/sw
--with-mpfr=/sw --disable-nls --disable-bootstrap --enable-checking=yes,rtl
CFLAGS=-g LDFLAGS=/usr/lib/libiconv.dylib --enable-languages=c,c++,objc
Thread model: posix
gcc version 4.4.0 20080326 (experimental) (GCC)
> /usr/local/gcc44/bin/gcc -Os -march=core2 -fno-pic -fomit-frame-pointer 
> -flax-vector-conversions -S pmaddwd.c

generates:
_madd_swapped:
    subl    $12, %esp
    movaps  LC0, %xmm1
    addl    $12, %esp
    pmaddwd %xmm1, %xmm0
    ret
.globl _madd
_madd:
    subl    $12, %esp
    movaps  LC0, %xmm1
    addl    $12, %esp
    pmaddwd %xmm0, %xmm1
    movaps  %xmm1, %xmm0
    ret

Both of these should be:
_madd:
    pmaddwd LC0, %xmm0
    ret

since the stack isn't referenced and pmaddwd is commutative. (the variable
being renamed LC0 is PR 31043)
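
The commutativity claim can be illustrated with a scalar model of the instruction. This is a sketch: pmaddwd_model and check_pmaddwd_commutes are illustrative names, and the model covers only the 128-bit form's arithmetic (eight signed 16-bit lanes in, four 32-bit lanes out), not its encoding or operand constraints.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of pmaddwd: multiply signed 16-bit lanes pairwise and add
   adjacent products into 32-bit lanes. The sum is formed in unsigned
   arithmetic so the one wrapping case (all four inputs equal to -32768)
   is well defined in C; results are returned as 32-bit patterns. */
void pmaddwd_model(const int16_t a[8], const int16_t b[8], uint32_t out[4])
{
    for (int i = 0; i < 4; i++) {
        int32_t lo = (int32_t)a[2 * i] * b[2 * i];
        int32_t hi = (int32_t)a[2 * i + 1] * b[2 * i + 1];
        out[i] = (uint32_t)lo + (uint32_t)hi;
    }
}

/* Illustrative check: swapping the operands does not change the result. */
void check_pmaddwd_commutes(void)
{
    const int16_t x[8] = { 1, -2, 300, -400, 32767, -32768, -32768, -32768 };
    const int16_t y[8] = { 5, 6, -7, 8, 32767, -32768, -32768, -32768 };
    uint32_t xy[4], yx[4];

    pmaddwd_model(x, y, xy);
    pmaddwd_model(y, x, yx);
    for (int i = 0; i < 4; i++)
        assert(xy[i] == yx[i]);
}
```

Because each lane is a sum of products, either operand can be the one folded into the memory operand, which is what makes the single-instruction form valid for both functions.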


-- 
   Summary: x86 poor code with pmaddwd
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
    ReportedBy: astrange at ithinksw dot com
 GCC build triplet: i386-apple-darwin9.2.0
  GCC host triplet: i386-apple-darwin9.2.0
GCC target triplet: i386-apple-darwin9.2.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35714



[Bug target/35714] x86 poor code with pmaddwd

2008-03-26 Thread astrange at ithinksw dot com


--- Comment #1 from astrange at ithinksw dot com  2008-03-27 01:02 ---
Created an attachment (id=15384)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15384&action=view)
source


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35714



[Bug target/39123] New: x86 asm *(a+b) input causes out of registers above -O0

2009-02-06 Thread astrange at ithinksw dot com
Using gcc version 4.4.0 20090207 (experimental) (GCC) 

> /usr/local/gcc44/bin/gcc -O0 -fno-pic -fomit-frame-pointer -S cabac-ret.i
> /usr/local/gcc44/bin/gcc -O1 -fno-pic -fomit-frame-pointer -S cabac-ret.i
cabac-ret.i: In function 'get_cabac_minput':
cabac-ret.i:24: error: can't find a register in class 'GENERAL_REGS' while
reloading 'asm'
cabac-ret.i:24: error: 'asm' operand has impossible constraints

This is an asm using 7 registers; above -O0 one of the inputs in the second
version is combined into a complex memory operand, which uses 8 registers in
one statement and fails to compile. It would be nice if it could fall back to a
separate add for x86-32, since the memory clobber in the first version might
cause suboptimal code.


-- 
   Summary: x86 asm *(a+b) input causes out of registers above -O0
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
 GCC build triplet: i?86-*-*
  GCC host triplet: i?86-*-*
GCC target triplet: i?86-*-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39123



[Bug target/39123] x86 asm *(a+b) input causes out of registers above -O0

2009-02-06 Thread astrange at ithinksw dot com


--- Comment #1 from astrange at ithinksw dot com  2009-02-07 06:13 ---
Created an attachment (id=17265)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17265&action=view)
testcase


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39123



[Bug target/39329] New: x86 -Os could use mulw for (uint16 * uint16)>>16

2009-02-28 Thread astrange at ithinksw dot com
Using 'gcc -Os -fomit-frame-pointer -march=core2 -mtune=core2' for

unsigned short mul_high_c(unsigned short a, unsigned short b)
{
return (unsigned)(a * b) >> 16;
}

unsigned short mul_high_asm(unsigned short a, unsigned short b)
{
unsigned short res;
asm("mulw %w2" : "=d"(res),"+a"(a) : "rm"(b));
return res;
}

I get

_mul_high_c:
subl    $12, %esp
movzwl  20(%esp), %eax
movzwl  16(%esp), %edx
addl    $12, %esp
imull   %edx, %eax
shrl    $16, %eax
ret
_mul_high_asm:
subl    $12, %esp
movl    16(%esp), %eax
mulw    20(%esp)
addl    $12, %esp
movl    %edx, %eax
ret

mulw puts its outputs in dx:ax, and dx contains (dx:ax)>>16, so the shift is
avoided.

Ignoring the weird Darwin stack adjustment code, the version with mulw is
somewhat shorter and avoids a movzwl. I'm not sure what the performance
difference is; mulw is listed in Agner's tables as fairly low latency, but
requires a length changing prefix for memory.

This type of operation is useful in fixed-point math, such as embedded audio
codecs or arithmetic coders.
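
As a rough illustration of the fixed-point use case, here is a minimal sketch of a 0.16 gain multiply built on the same high-half product. The names q16_mul_high and q16_scale are made up for this example. The uint32_t cast is also an addition of this sketch: with 32-bit int, a * b on two unsigned shorts is computed in int and can overflow for large inputs.

```c
#include <stdint.h>

/* High 16 bits of a 16x16 -> 32 bit unsigned product. */
uint16_t q16_mul_high(uint16_t a, uint16_t b)
{
    return (uint16_t)(((uint32_t)a * b) >> 16);
}

/* Scale a sample by a 0.16 fixed-point gain in [0, 1). */
uint16_t q16_scale(uint16_t sample, uint16_t gain_q16)
{
    return q16_mul_high(sample, gain_q16);
}
```

A gain of 0x8000 is one half in this format, so q16_scale(1000, 0x8000) gives 500.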


-- 
   Summary: x86 -Os could use mulw for (uint16 * uint16)>>16
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
    AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
 GCC build triplet: i?86-*-*
  GCC host triplet: i?86-*-*
GCC target triplet: i?86-*-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39329



[Bug target/39337] New: x86 use of VLA disables -fomit-frame-pointer

2009-03-01 Thread astrange at ithinksw dot com
Using gcc 4.4.0 20090226 with -Os on:

int f(int a)
{
if (!a) {
return 0;
} else {
volatile int vla[a];
vla[0] = 0;
return vla[0];
}
}

gives:
_f:
pushl   %ebp
xorl    %eax, %eax
movl    %esp, %ebp
subl    $8, %esp
movl    8(%ebp), %edx
testl   %edx, %edx
je      L3
movl    %esp, %ecx
leal    30(,%edx,4), %eax
andl    $-16, %eax
subl    %eax, %esp
leal    15(%esp), %eax
andl    $-16, %eax
movl    $0, (%eax)
movl    (%eax), %eax
movl    %ecx, %esp
L3:
leave
ret

Adding -fomit-frame-pointer gives the exact same result. ebp shouldn't be saved
here, since esp is saved to and restored from ecx anyway, so it's not actually
used for anything.

This isn't just a problem for crazy asm (gcc errors if an asm clobbers ebp in
a function with VLAs); it also means that inlining a function with VLAs makes
generated code worse, since the entire function loses one register.


-- 
   Summary: x86 use of VLA disables -fomit-frame-pointer
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
    ReportedBy: astrange at ithinksw dot com
 GCC build triplet: i?86-*-*
  GCC host triplet: i?86-*-*
GCC target triplet: i?86-*-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39337



[Bug target/39337] x86 use of VLA disables -fomit-frame-pointer

2009-03-01 Thread astrange at ithinksw dot com


--- Comment #3 from astrange at ithinksw dot com  2009-03-02 02:39 ---
> This is correct, vla and alloca always uses a frame pointer because there is 
> no way to get back to the original offsets so the compiler needs a frame 
> pointer.

It's not restoring from the frame pointer here, it's restoring from ecx. 'addl
$8, %esp' would work just as well in the function epilogue, like it would if
this function had no VLA.

Disabling inlining does fix that problem, though.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39337



[Bug inline-asm/11203] source doesn't compile with -O0 but they compile with -O3

2009-10-18 Thread astrange at ithinksw dot com


--- Comment #40 from astrange at ithinksw dot com  2009-10-18 19:56 ---
Linked from http://x264dev.multimedia.cx/?p=185, I'd forgotten all about the
ridiculous flamewar in this one.

Just as a note, the actual definitions of the four variables (from liba52):
  x2k = x + 2 * k;
  x3k = x2k + 2 * k;
  x4k = x3k + 2 * k;
  wB = wTB + 2 * k;

Also, the asm inputs are silly - output 0 is the same as input 6 for no reason,
and the same with output 2 and input 7. So change those to "+m" and change
%6/%7 to %0/%2.

That doesn't actually change anything, even though it should free two
registers. It works with gcc 4.5 -O0 -fno-pic -fomit-frame-pointer, but not
without one of those flags. Looks like that's because it's allocating 2 more
registers for the unused fake inputs for the "+m" - change it to "=m" and it
works with one flag removed, but still not both. So there's a specific bug.

And of course it all works at -O1 because it doesn't have to use registers
there. So maybe it should just do that.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11203



[Bug tree-optimization/36646] [4.3/4.4/4.5 Regression] Unnecessary moves generated on loop boundaries

2009-10-20 Thread astrange at ithinksw dot com


--- Comment #7 from astrange at ithinksw dot com  2009-10-20 21:10 ---
Tried with SVN today and it's fixed:

L6:
incb    (%ebx)
jmp     L12
.align 4,0x90

Close if you want; I don't think it's worth finding when this happened.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36646



[Bug tree-optimization/36646] [4.3/4.4/4.5 Regression] Unnecessary moves generated on loop boundaries

2009-11-07 Thread astrange at ithinksw dot com


--- Comment #8 from astrange at ithinksw dot com  2009-11-07 09:03 ---
Closing.


-- 

astrange at ithinksw dot com changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution||FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36646



[Bug tree-optimization/32015] New: pointer-plus ICE in extract_range_from_binary_expr, at tree-vrp.c:1755

2007-05-20 Thread astrange at ithinksw dot com
../gcc/configure --prefix=/usr/local/gcc42 --enable-threads --with-arch=nocona
--with-tune=nocona --disable-bootstrap --disable-nls --with-gmp=/sw
--with-mpfr=/sw --with-system-zlib --enable-languages=c,c++,objc,obj-c++
--disable-multilib
Thread model: posix
gcc version 4.3.0 20070513 (experimental)

using pointer-plus r124878

> /usr/local/gcc42/bin/gcc -O3 -c pp-vrp-ice.c
pp-vrp-ice.c: In function 'ff_pred8x8_plane_c':
pp-vrp-ice.c:4: internal compiler error: in extract_range_from_binary_expr, at
tree-vrp.c:1755
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.

typedef unsigned char uint8_t;
extern uint8_t ff_cropTbl[256 + 2 * 1024];

void ff_pred8x8_plane_c(uint8_t *src, int stride){
  int j, k;
  int a;
  uint8_t *cm = ff_cropTbl + 1024;
  const uint8_t * const src0 = src+3-stride;
  const uint8_t *src1 = src+4*stride-1;
  const uint8_t *src2 = src1-2*stride;
  int H = src0[1] - src0[-1];
  int V = src1[0] - src2[ 0];
  for(k=2; k<=4; ++k) {
    src1 += stride; src2 -= stride;
    H += k*(src0[k] - src0[-k]);
    V += k*(src1[0] - src2[ 0]);
  }
  H = ( 17*H+16 ) >> 5;
  V = ( 17*V+16 ) >> 5;

  a = 16*(src1[0] + src2[8]+1) - 3*(V+H);
  for(j=8; j>0; --j) {
    int b = a;
    a += V;
    src[0] = cm[ (b ) >> 5 ];
    src[1] = cm[ (b+ H) >> 5 ];
    src[2] = cm[ (b+2*H) >> 5 ];
    src[3] = cm[ (b+3*H) >> 5 ];
    src[4] = cm[ (b+4*H) >> 5 ];
    src[5] = cm[ (b+5*H) >> 5 ];
    src[6] = cm[ (b+6*H) >> 5 ];
    src[7] = cm[ (b+7*H) >> 5 ];
    src += stride;
  }
}


-- 
   Summary: pointer-plus ICE in extract_range_from_binary_expr, at
tree-vrp.c:1755
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
    AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
 GCC build triplet: i386-apple-darwin8.9.1
  GCC host triplet: i386-apple-darwin8.9.1
GCC target triplet: i386-apple-darwin8.9.1


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32015



[Bug tree-optimization/32572] New: Small C++ function fails to inline with large negative badness

2007-07-01 Thread astrange at ithinksw dot com
The code below doesn't get image::set inlined into set_test at -O3, though it's
trivial.
Is it an integer overflow problem?

> /usr/local/gcc43/bin/g++ -v
Using built-in specs.
Target: i386-apple-darwin8.10.1
Configured with: ../gcc/configure --prefix=/usr/local/gcc43 --disable-multilib
--with-arch=pentium-m --with-tune=nocona --enable-target-optspace
--disable-bootstrap --with-gmp=/sw --with-system-zlib
--enable-languages=c,c++,objc,obj-c++
Thread model: posix
gcc version 4.3.0 20070701 (experimental)
> /usr/local/gcc43/bin/g++ -O3 -c -fdump-ipa-all -fdump-tree-final_cleanup 
> rt-inline-overflow.cpp

From the .inline dump:
Deciding on inlining.  Starting with 39 insns.

Inlining always_inline functions:

Deciding on smaller functions:
Considering inline candidate void image::set(size_t, size_t, f_pixel, f_real).

Considering void image::set(size_t, size_t, f_pixel, f_real) with 15 insns
 to be inlined into void set_test(image*, int, int, f_pixel&, double)
 Estimated growth after inlined into all callees is -21 insns.
 Estimated badness is -2147483642, frequency 1.00.
 Not inlining into void set_test(image*, int, int, f_pixel&, double):function
not considered for inlining.

Deciding on functions called once:

Inlined 3 calls, eliminated 2 functions, 39 insns turned to 39 insns.

typedef unsigned int size_t;
typedef float f_real;

template <class T> struct triple
{
 union {
  T val[3];
  struct {T x,y,z;};
  struct {T r,g,b;};
 };
 T pad;

 triple(const T v[3]) {for (int i = 0; i < 3; i++) val[i] = v[i];}
 triple(T a_, T b_=0., T c_=0.) {x = a_; y = b_; z = c_;}
 triple() {}

 triple(const triple &t) {x=t.x;y=t.y;z=t.z;}
 triple(const triple *t) {x=t->x;y=t->y;z=t->z;}

 triple operator=(const triple &t) {x=t.x;y=t.y;z=t.z; return *this;}
} __attribute__((aligned));

typedef triple f_pixel;

struct image
{
 f_pixel *buf;
 f_real *depth_buf;
 f_pixel minv, maxv;
 size_t w, h;

 image(size_t w, size_t h);
 ~image();
 void write_to_bmp(const char *path);

 inline void set(size_t x, size_t y, const f_pixel p, f_real depth) {
  buf[y*w + x] = p;
  depth_buf[y*w + x] = depth;
 }

};

void set_test(image *target, int x, int y, f_pixel &c, double dist)
{
 target->set(x,y,c,dist);
}


-- 
   Summary: Small C++ function fails to inline with large negative
badness
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
 GCC build triplet: i386-apple-darwin8.10.1
  GCC host triplet: i386-apple-darwin8.10.1
GCC target triplet: i386-apple-darwin8.10.1


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32572



[Bug tree-optimization/32572] Small C++ function fails to inline with large negative badness

2007-07-01 Thread astrange at ithinksw dot com


--- Comment #1 from astrange at ithinksw dot com  2007-07-02 06:18 ---
This is a regression from Apple gcc 4.0:

Considering void image::set(size_t, size_t, f_pixel, f_real) with 29 insns
 Estimated growth is -21 insns.
 Inlined into void set_test(image*, int, int, f_pixel&, double) which now has
32
 insns.
 Inlined for a net change of -21 insns.

and probably from earlier 4.3.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32572



[Bug tree-optimization/32590] New: [4.3 regression] Duplicate code generated on both branches of if/else

2007-07-02 Thread astrange at ithinksw dot com
> /usr/local/gcc43/bin/g++ -v
Using built-in specs.
Target: i386-apple-darwin8.10.1
Configured with: ../gcc/configure --prefix=/usr/local/gcc43 --disable-multilib
--with-arch=pentium-m --with-tune=nocona --enable-target-optspace
--disable-bootstrap --with-gmp=/sw --with-system-zlib
--enable-languages=c,c++,objc,obj-c++
Thread model: posix
gcc version 4.3.0 20070702 (experimental)

> /usr/local/gcc43/bin/g++ -O3 -fdump-tree-final_cleanup -ffast-math 
> -fno-unswitch-loops -g -c rt-no-load-motion.cpp

Here:
:
  if (closest == 0)
goto ;
  else
goto ;

:
  res = p;
  goto ;

:
  prephitmp.47 = this->sc.primcount;
  res = p;

:

'res = p;' is duplicated on both sides of the branch in bb 15, and should be
lifted above it.

Furthermore, if that was done, these three blocks:
:
  prephitmp.47 = this->sc.primcount;
  goto ;

:
  prephitmp.47 = this->sc.primcount;
  goto ;

:
  prephitmp.47 = this->sc.primcount;
  res = p;

:

would be identical. Two of them already are, but aren't merged.
(I think those prephitmp.47 statements can all be merged too, but I'm not sure
about that.)

g++ 4.1 doesn't generate so many copies; I don't remember seeing this in
earlier 4.3s either.


-- 
   Summary: [4.3 regression] Duplicate code generated on both
branches of if/else
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
 GCC build triplet: i386-apple-darwin8.10.1
  GCC host triplet: i386-apple-darwin8.10.1
GCC target triplet: i386-apple-darwin8.10.1


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32590



[Bug tree-optimization/32590] [4.3 regression] Duplicate code generated on both branches of if/else

2007-07-02 Thread astrange at ithinksw dot com


--- Comment #1 from astrange at ithinksw dot com  2007-07-02 18:04 ---
Created an attachment (id=13826)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13826&action=view)
testcase


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32590



[Bug tree-optimization/32593] New: Missed optimization of 'y = constant - x' operation

2007-07-02 Thread astrange at ithinksw dot com
> /usr/local/gcc43/bin/g++ -v
Using built-in specs.
Target: i386-apple-darwin8.10.1
Configured with: ../gcc/configure --prefix=/usr/local/gcc43 --disable-multilib
--with-arch=pentium-m --with-tune=nocona --enable-target-optspace
--disable-bootstrap --with-gmp=/sw --with-system-zlib
--enable-languages=c,c++,objc,obj-c++
Thread model: posix
gcc version 4.3.0 20070702 (experimental)

> /usr/local/gcc43/bin/gcc -Os -fno-pic -fomit-frame-pointer -S sub.c

"i= 7 - ff_h264_norm_shift[x>>(CABAC_BITS-1)];" generates:

movl    $7, %ecx
subl    %eax, %ecx
sall    %cl, %edx

It would be better if it generated:

negl    %eax
addl    $7, %eax
sall    %al, %edx

which would leave a register free (which helps if this function is inlined);
this is safe since eax isn't used again later.

You can do this by transforming 'y = constant - x' into 'y = -x + constant'.
It looks like gcc actually does the reverse; if I change the source to match
the best output, it generates the same thing.
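
To make the two source forms concrete, here is a small sketch of both spellings. The function names are invented for illustration and the table lookup from the original line is replaced by a plain int argument. The sketch only shows that the two forms compute the same value; it does not show what any particular gcc emits for them.

```c
/* 'constant - x' form, as written in the report. */
unsigned shift_sub(unsigned v, int x)
{
    int i = 7 - x;
    return v << i;
}

/* '-x + constant' form: same value, written as negate then add. */
unsigned shift_neg_add(unsigned v, int x)
{
    int i = -x + 7;
    return v << i;
}
```

Both assume x is in the range 0..7 so that the shift count stays valid.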


-- 
   Summary: Missed optimization of 'y = constant - x' operation
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
 GCC build triplet: i386-apple-darwin8.10.1
  GCC host triplet: i386-apple-darwin8.10.1
GCC target triplet: i386-apple-darwin8.10.1


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32593



[Bug tree-optimization/32593] Missed optimization of 'y = constant - x' operation

2007-07-02 Thread astrange at ithinksw dot com


--- Comment #1 from astrange at ithinksw dot com  2007-07-02 18:47 ---
Created an attachment (id=13827)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13827&action=view)
testcase


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32593



[Bug tree-optimization/61515] New: Extremely long compile time for generated code

2014-06-15 Thread astrange at ithinksw dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515

Bug ID: 61515
   Summary: Extremely long compile time for generated code
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: astrange at ithinksw dot com

> /usr/local/gcc49/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc49/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc49/libexec/gcc/x86_64-apple-darwin13.2.0/4.10.0/lto-wrapper
Target: x86_64-apple-darwin13.2.0
Configured with: ../../cc/gcc/configure --prefix=/usr/local/gcc49
--with-arch=native --with-tune=native --disable-nls --with-gmp=/sw
--disable-bootstrap --with-isl=/sw --enable-languages=c,c++,lto,objc,obj-c++
--no-create --no-recursion
Thread model: posix
gcc version 4.10.0 20140615 (experimental) (GCC) 

For the attached source (C translation from a large BF program):
- gcc -O0 takes 9 minutes
- gcc -Os does not finish after 40 minutes


[Bug tree-optimization/61515] Extremely long compile time for generated code

2014-06-15 Thread astrange at ithinksw dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515

--- Comment #1 from Alexander Strange  ---
Created attachment 32944
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=32944&action=edit
Preprocessed source


[Bug tree-optimization/61515] Extremely long compile time for generated code

2014-06-15 Thread astrange at ithinksw dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515

--- Comment #3 from Alexander Strange  ---
Without checking, -O0 went from 8 -> 5 minutes.

I stopped the -Os compile at 29 minutes.


[Bug tree-optimization/43224] New: Constant load not raised out of loop

2010-03-01 Thread astrange at ithinksw dot com
Source:
#include <string.h>

void dequant_lsps(double *lsps, int num,
                  const unsigned short *values,
                  int n_stages, const unsigned char * __restrict table,
                  const double * __restrict mul_q,
                  const double * __restrict base_q)
{
    const unsigned char *t_off = &table[values[0] * num];
    int m;

    memset(lsps, 0, num * sizeof(*lsps));

    for (m = 0; m < num; m++)
        lsps[m] += base_q[0] + mul_q[0] * t_off[m];
}

> /usr/local/gcc45/bin/gcc -O3 -S base_lsp.c

The inner loop:
L3:
movzbl  (%r15), %edx
incq    %r15
cvtsi2sd %edx, %xmm0
mulsd   0(%r13), %xmm0 <- constant (and "0" prefix)
addsd   (%r14), %xmm0 <- constant
addsd   (%rbx,%rax), %xmm0
movsd   %xmm0, (%rbx,%rax)
addq    $8, %rax
cmpq    %rcx, %rax
jne     L3

Rest of the output attached.
base_q and mul_q should be loaded outside of the loop but aren't. I added
__restrict to base_q/mul_q/table, but it made no difference.
The code is reduced from the FFmpeg WMA Voice decoder.
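
For comparison, this is roughly what hoisting the loads by hand looks like. It is only a sketch: dequant_lsps_hoisted is a made-up name, the unused n_stages parameter is dropped, and it assumes lsps does not alias base_q or mul_q (if it did, reading them before the memset would change the result).

```c
#include <string.h>

/* Same computation as dequant_lsps, with the two loop-invariant loads
 * copied into locals before the loop. Assumes no aliasing with lsps. */
void dequant_lsps_hoisted(double *lsps, int num,
                          const unsigned short *values,
                          const unsigned char *table,
                          const double *mul_q, const double *base_q)
{
    const unsigned char *t_off = &table[values[0] * num];
    const double base = base_q[0];  /* loaded once */
    const double mul  = mul_q[0];   /* loaded once */
    int m;

    memset(lsps, 0, num * sizeof(*lsps));

    for (m = 0; m < num; m++)
        lsps[m] += base + mul * t_off[m];
}
```

With the loads in locals the compiler no longer has to prove anything about the stores to lsps, which is the step the report expected __restrict to cover.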


-- 
   Summary: Constant load not raised out of loop
   Product: gcc
   Version: 4.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
    ReportedBy: astrange at ithinksw dot com
  GCC host triplet: x86_64-apple-darwin10.2.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43224



[Bug tree-optimization/43224] Constant load not raised out of loop

2010-03-01 Thread astrange at ithinksw dot com


--- Comment #1 from astrange at ithinksw dot com  2010-03-02 03:45 ---
Created an attachment (id=20002)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20002&action=view)
x86-64 asm output


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43224



[Bug tree-optimization/43224] Constant load not raised out of loop

2010-03-01 Thread astrange at ithinksw dot com


--- Comment #4 from astrange at ithinksw dot com  2010-03-02 04:00 ---
Is it possible for aliased writes to affect a const pointer? I was assuming
that it wasn't.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43224



[Bug target/43225] New: Structure copies not vectorized

2010-03-01 Thread astrange at ithinksw dot com
Source:

#include 

struct a1 { char l[16];};
struct a2 { __m128i l; };

void f1(struct a1 *a, struct a1 *b)
{
*a = *b;
}

void f2(struct a2 *a, struct a2 *b)
{
*a = *b;
}

> /usr/local/gcc45/bin/gcc -O3 -fomit-frame-pointer -S copy_gcc.c
_f1:
movq    (%rsi), %rax
movq    %rax, (%rdi)
movq    8(%rsi), %rax
movq    %rax, 8(%rdi)
ret

_f2:
movdqa  (%rsi), %xmm0
movdqa  %xmm0, (%rdi)
ret

Both are appropriately aligned and should use movdqa. This might not show up in
generic code, but I could have used it in an ffmpeg optimization.


-- 
   Summary: Structure copies not vectorized
   Product: gcc
   Version: 4.5.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
  GCC host triplet: x86_64-*-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43225



[Bug target/43225] Structure copies not vectorized

2010-03-01 Thread astrange at ithinksw dot com


--- Comment #2 from astrange at ithinksw dot com  2010-03-02 05:31 ---
-fdump-tree-slp-details:
copy_gcc.c:8: note: ===vect_slp_analyze_bb===

copy_gcc.c:8: note: === vect_analyze_data_refs ===
Creating dr for *b_2(D)
analyze_innermost: success.
base_address: b_2(D)
offset from base address: 0
constant offset from base address: 0
step: 0
aligned to: 128
base_object: *b_2(D)
Creating dr for *a_1(D)
analyze_innermost: success.
base_address: a_1(D)
offset from base address: 0
constant offset from base address: 0
step: 0
aligned to: 128
base_object: *a_1(D)

copy_gcc.c:8: note: not vectorized: no vectype for stmt: *a_1(D) = *b_2(D);
 scalar_type: struct a1
copy_gcc.c:8: note: not vectorized: unhandled data-ref in basic block.
f1 (struct a1 * a, struct a1 * b)
{
:
  *a_1(D) = *b_2(D);
  return;

}

Though I tried it with __attribute__((aligned)) too.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43225



[Bug target/43233] New: x86 flags not combined across blocks

2010-03-02 Thread astrange at ithinksw dot com
Source:
int g1,g2,g3;

int f1(int a, int b)
{
a &= 1;

if (a) return g1;
return g2;
}

int f2(int a, int b)
{
a &= 1;

if (b)
g3++;

if (a) return g1;
return g2;
}

Compiled with:
> gcc -O3 -fomit-frame-pointer -S and_flags.c

f1 is ok but f2 generates this:
_f2:
andl    $1, %edi <-- #1
testl   %esi, %esi
je      L7
movq    _...@gotpcrel(%rip), %rax
incl    (%rax)
L7:
testl   %edi, %edi <-- #2
jne     L10
movq    _...@gotpcrel(%rip), %rax
movl    (%rax), %eax
ret
.align 4,0x90
L10:
movq    _...@gotpcrel(%rip), %rax
movl    (%rax), %eax
ret

The andl and testl should be folded into one andl.

Code is reduced from ffmpeg h264 decoder. It's easy to work around by
reordering source lines, so not too important.
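
One possible shape of the reordering workaround, as a sketch: the mask is moved down so that it sits directly before the branch that tests it. f2_reordered is a hypothetical name, and whether a given gcc then folds the and/test pair has not been checked here.

```c
int g1, g2, g3;

/* Same behaviour as f2 in the report, but 'a &= 1' is applied right
 * before the branch that consumes it. */
int f2_reordered(int a, int b)
{
    if (b)
        g3++;

    a &= 1;
    if (a) return g1;
    return g2;
}
```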


-- 
   Summary: x86 flags not combined across blocks
   Product: gcc
   Version: 4.5.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
    ReportedBy: astrange at ithinksw dot com
  GCC host triplet: x86_64-*-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43233



[Bug lto/43318] New: LTO ICE with minimal C++ program

2010-03-09 Thread astrange at ithinksw dot com
Using svn r157325 on Ubuntu.

> /usr/local/gcc45/bin/g++ -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc45/bin/g++
COLLECT_LTO_WRAPPER=/usr/local/gcc45/libexec/gcc/i686-pc-linux-gnu/4.5.0/lto-wrapper
Target: i686-pc-linux-gnu
Configured with: ../gcc/configure --enable-threads=posix --with-arch=native
--with-tune=native --disable-nls --disable-bootstrap --prefix=/usr/local/gcc45
--with-mpc=/usr/local --enable-languages=c,c++,objc,lto --enable-lto
--enable-gold
Thread model: posix
gcc version 4.5.0 20100309 (experimental) (GCC) 

Source:
void a()
{
}

> /usr/local/gcc45/bin/g++ -flto -c a.cpp
> /usr/local/gcc45/bin/g++ -flto -O -r -nostdlib a.o
a/0(-1) @0xb769b398 availability:available needed reachable body
externally_visible finalized
  called by: 
  calls: 
callgraph:

a/0(-1) @0xb769b398 availability:available needed reachable body
externally_visible finalized
  called by: 
  calls: 
lto1: internal compiler error: in propagate, at ipa-reference.c:1244
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.
lto-wrapper: /usr/local/gcc45/bin/g++ returned 1 exit status
collect2: lto-wrapper returned 1 exit status

> /usr/local/gcc45/bin/g++ -flto -O -fno-ipa-reference -r -nostdlib a.o
lto1: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.
lto-wrapper: /usr/local/gcc45/bin/g++ returned 1 exit status
collect2: lto-wrapper returned 1 exit status


-- 
   Summary: LTO ICE with minimal C++ program
   Product: gcc
   Version: 4.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
AssignedTo: unassigned at gcc dot gnu dot org
    ReportedBy: astrange at ithinksw dot com
  GCC host triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43318



[Bug lto/43318] LTO ICE with minimal C++ program

2010-03-09 Thread astrange at ithinksw dot com


--- Comment #1 from astrange at ithinksw dot com  2010-03-10 00:32 ---
Actually, it doesn't work in C either. I find that unlikely, time to make sure
I didn't build it wrong somehow...


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43318



[Bug lto/43318] LTO ICE with minimal C++ program

2010-03-09 Thread astrange at ithinksw dot com


--- Comment #3 from astrange at ithinksw dot com  2010-03-10 00:37 ---


*** This bug has been marked as a duplicate of 42402 ***


-- 

astrange at ithinksw dot com changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution||DUPLICATE


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43318



[Bug lto/42402] ICE in propagate, at ipa-reference.c:1244

2010-03-09 Thread astrange at ithinksw dot com


--- Comment #2 from astrange at ithinksw dot com  2010-03-10 00:37 ---
*** Bug 43318 has been marked as a duplicate of this bug. ***


-- 

astrange at ithinksw dot com changed:

   What|Removed |Added

 CC||astrange at ithinksw dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42402



[Bug lto/43342] lto1: internal compiler error: failed to reclaim unneeded function

2010-03-14 Thread astrange at ithinksw dot com


--- Comment #4 from astrange at ithinksw dot com  2010-03-14 23:33 ---
This happens building ffmpeg --enable-shared with -fwhopr. I can make a
testcase out of that if needed.


-- 

astrange at ithinksw dot com changed:

   What|Removed |Added

 CC||astrange at ithinksw dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43342



[Bug lto/43372] New: lto ICE in strip_extension with linker plugin

2010-03-14 Thread astrange at ithinksw dot com
Source:
a.c:
int a()
{
return 0;
}

b.c:
extern int a();
int b()
{
a();
}

> gcc -fwhopr -c a.c b.c
> ar r liba.a a.o
> gcc -fwhopr -fuse-linker-plugin -shared -o libb.so b.o liba.a
lto1: internal compiler error: in strip_extension, at lto/lto.c:910
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.
lto-wrapper: /usr/local/gcc45/bin/gcc returned 1 exit status
/usr/bin/ld: fatal error: lto-wrapper failed
collect2: ld returned 1 exit status

It fails trying to strip ".o" from "liba.a". (I added an extra line to print
that, so the ICE line number is off by 1.)
Using gcc 20100314 and gold from Ubuntu binutils-gold 2.20-0ubuntu2.


-- 
   Summary: lto ICE in strip_extension with linker plugin
   Product: gcc
   Version: 4.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
AssignedTo: unassigned at gcc dot gnu dot org
    ReportedBy: astrange at ithinksw dot com
  GCC host triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43372



[Bug lto/43373] New: whopr+linker plugin ICE compressed stream data error

2010-03-14 Thread astrange at ithinksw dot com
> gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc45/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc45/libexec/gcc/i686-pc-linux-gnu/4.5.0/lto-wrapper
Target: i686-pc-linux-gnu
Configured with: ../gcc/configure --with-arch=native --with-tune=native
--disable-bootstrap --with-mpc=/usr/local --enable-languages=c,c++,objc,lto
--enable-gold --enable-lto --prefix=/usr/local/gcc45
Thread model: posix
gcc version 4.5.0 20100314 (experimental) (GCC) 
> ld --version
GNU gold (GNU Binutils for Ubuntu 2.20) 1.9
Copyright 2008 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty.
> cat a.c
int main(void) {return 0;}
> gcc -fwhopr -fuse-linker-plugin -o a a.c -save-temps
lto1: internal compiler error: compressed stream: data error
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.
lto1: fatal error: /usr/local/gcc45/bin/gcc terminated with status 256
compilation terminated.
lto-wrapper: /usr/local/gcc45/bin/gcc returned 1 exit status
/usr/bin/ld: fatal error: lto-wrapper failed
collect2: ld returned 1 exit status

Works without -fuse-linker-plugin. This prevents ffmpeg and x264 from
configuring for me if I put -fwhopr -fuse-linker-plugin in the CFLAGS/LDFLAGS.


-- 
   Summary: whopr+linker plugin ICE compressed stream data error
   Product: gcc
   Version: 4.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
AssignedTo: unassigned at gcc dot gnu dot org
    ReportedBy: astrange at ithinksw dot com
  GCC host triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43373



[Bug lto/43373] whopr+linker plugin ICE compressed stream data error

2010-03-15 Thread astrange at ithinksw dot com


--- Comment #2 from astrange at ithinksw dot com  2010-03-15 11:10 ---
The last two commands were the source and testcase. Should have spaced it out
more.

I don't have enough memory allocated to this VM to build ffmpeg without whopr,
so I thought I'd try the more experimental path first.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43373



[Bug target/43550] New: arm missing rev16

2010-03-26 Thread astrange at ithinksw dot com
typedef unsigned short uint16_t;
typedef unsigned int   uint32_t;

uint16_t s16(uint16_t v)
{
return v>>8|v<<8;
}

uint32_t s32(uint32_t v)
{
return __builtin_bswap32(v);
}

> gcc -O3 -mcpu=cortex-a8 -S bswap.c
s16:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
mov r3, r0, lsr #8
orr r0, r3, r0, asl #8
uxth    r0, r0
bx  lr

s32:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
rev r0, r0
bx  lr

It generates 32-bit bswap using rev but not 16-bit using rev16. x86 can do
both.
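
For reference, two ways to spell the 16-bit swap in C, as a sketch. The function names are made up. The second form assumes a compiler that provides __builtin_bswap16 (newer GCC releases and Clang do, as far as I know); whether either form ends up as rev16 on ARM is not something this sketch verifies.

```c
#include <stdint.h>

/* Shift-and-or form from the report, with the truncation made explicit. */
uint16_t swap16_shift(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Builtin form; requires a compiler that has __builtin_bswap16. */
uint16_t swap16_builtin(uint16_t v)
{
    return __builtin_bswap16(v);
}
```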


-- 
   Summary: arm missing rev16
   Product: gcc
   Version: 4.5.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
    ReportedBy: astrange at ithinksw dot com
GCC target triplet: arm-*-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43550



[Bug rtl-optimization/43721] Failure to optimise (a/b) and (a%b) into single __aeabi_idivmod call

2010-04-11 Thread astrange at ithinksw dot com


--- Comment #1 from astrange at ithinksw dot com  2010-04-12 03:54 ---
Still the case with 4.5.

> arm-none-linux-gnueabi-gcc -Os -S divmod.c
> cat divmod.s
.cpu arm10tdmi
.fpu softvfp
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 2
.eabi_attribute 30, 4
.eabi_attribute 18, 4
.file   "divmod.c"
.global __aeabi_idivmod
.global __aeabi_idiv
.text
.align  2
.global divmod
.type   divmod, %function
divmod:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
stmfd   sp!, {r4, r5, r6, lr}
mov r6, r0
mov r5, r1
bl  __aeabi_idivmod
mov r0, r6
mov r4, r1
mov r1, r5
bl  __aeabi_idiv
add r0, r4, r0
ldmfd   sp!, {r4, r5, r6, pc}
.size   divmod, .-divmod
.ident  "GCC: (GNU) 4.5.0 20100325 (experimental)"
.section        .note.GNU-stack,"",%progbits
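
divmod.c itself is not quoted in this comment. Going by the asm (one call to __aeabi_idivmod, one to __aeabi_idiv, then an add of the remainder and the quotient), it presumably looks like the first function below; that is an assumption. The second function is a sketch of the combined form, using the standard div() call to get both results at once. Both names are made up.

```c
#include <stdlib.h>

/* Assumed shape of the test function: quotient plus remainder. */
int divmod_separate(int a, int b)
{
    return a / b + a % b;
}

/* Same result from one library call that returns both values. */
int divmod_combined(int a, int b)
{
    div_t r = div(a, b);
    return r.quot + r.rem;
}
```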


-- 

astrange at ithinksw dot com changed:

   What|Removed |Added

 CC|                |astrange at ithinksw dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43721



[Bug target/43723] New: Some ARMs support unaligned

2010-04-11 Thread astrange at ithinksw dot com
Source:
struct s { int i; } __attribute__((packed));

int a(struct s *s)
{
return s->i;
}

Using 4.5:
> /usr/local/gcc-arm/bin/arm-none-linux-gnueabi-gcc -Os -mcpu=cortex-a8 -S unaligned.c
> cat unaligned.s
.cpu cortex-a8
.fpu softvfp
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 2
.eabi_attribute 30, 4
.eabi_attribute 18, 4
.file   "unaligned.c"
.text
.align  2
.global a
.type   a, %function
a:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
ldrb r2, [r0, #1] @ zero_extendqisi2
ldrb r3, [r0, #0] @ zero_extendqisi2
orr r3, r3, r2, asl #8
ldrb r2, [r0, #2] @ zero_extendqisi2
ldrb r0, [r0, #3] @ zero_extendqisi2
orr r3, r3, r2, asl #16
orr r0, r3, r0, asl #24
bx  lr
.size   a, .-a
.ident  "GCC: (GNU) 4.5.0 20100325 (experimental)"
.section .note.GNU-stack,"",%progbits

At least some configurations of cortex-a8 support unaligned access just fine,
so gcc should be able to use it there. It doesn't look like it can: there is
no -mno-strict-align for arm. Using unaligned loads would be a major code size
reduction for FFmpeg.
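
For comparison, a minimal sketch of the two load shapes in plain C. The function names are hypothetical. The memcpy form leaves it to the compiler to pick a single load where the target allows it; whether a given gcc does so on ARM is exactly what this report is about, so no particular output is claimed.

```c
#include <stdint.h>
#include <string.h>

/* Native-endian load from a possibly unaligned address. */
uint32_t load32_unaligned(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Byte-by-byte little-endian load, the shape of the ldrb/orr sequence above. */
uint32_t load32_le_bytes(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}
```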


-- 
   Summary: Some ARMs support unaligned
   Product: gcc
   Version: 4.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
    ReportedBy: astrange at ithinksw dot com
GCC target triplet: arm-unknown-linux-gnueabi


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43723



[Bug target/43766] New: x86 prefetch doesn't use complex memory addressing

2010-04-16 Thread astrange at ithinksw dot com
Source:
void p(int *a, int i)
{
__builtin_prefetch(&a[i]);
}

> gcc -O3 -fomit-frame-pointer -S prefetch.c
_p:
movslq  %esi, %rsi
leaq    (%rdi,%rsi,4), %rax
prefetcht0  (%rax)
ret

leaq and prefetch should be merged.


-- 
   Summary: x86 prefetch doesn't use complex memory addressing
   Product: gcc
   Version: 4.6.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43766



[Bug target/43766] x86 prefetch doesn't use complex memory addressing

2010-04-16 Thread astrange at ithinksw dot com


--- Comment #3 from astrange at ithinksw dot com  2010-04-16 21:19 ---
Works with x86-64.

Checking -m32, the same thing happens with or without the patch:
_p:
subl    $12, %esp
movl    20(%esp), %eax
sall    $2, %eax
addl    16(%esp), %eax
addl    $12, %esp
prefetcht0  (%eax)
ret


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43766



[Bug c/42136] New: Inconsistent strict-aliasing warning with cast from char[]

2009-11-21 Thread astrange at ithinksw dot com
Source:
typedef union u { unsigned i; unsigned short s[2]; unsigned char c[4]; } u;

char c[4] __attribute__((aligned));
short s[2] __attribute__((aligned));

int f1()
{
return ((union u*)s)->i;
}

int f2()
{
return ((union u*)c)->i;
}

Using gcc 4.5:

> gcc -O3 -fstrict-aliasing -Wall -S wstrict_aliasing_char.c

wstrict_aliasing_char.c: In function 'f2':
wstrict_aliasing_char.c:13:17: warning: dereferencing type-punned pointer will
break strict-aliasing rules

I would expect either both or neither of the functions to warn, since pointer
casting to unions is given in the manual as something that violates
strict-aliasing, although gcc doesn't seem to actually take advantage of this.

Instead, it looks like the warning is hardcoded to apply to a cast from char
(c-common.c:1746 in r1554411):
  alias_set_type set1 =
    get_alias_set (TREE_TYPE (TREE_OPERAND (expr, 0)));
  alias_set_type set2 = get_alias_set (TREE_TYPE (type));

  if (set1 != set2 && set2 != 0
      && (set1 == 0 || !alias_sets_conflict_p (set1, set2)))
    {
      warning (OPT_Wstrict_aliasing, "dereferencing type-punned "
               "pointer will break strict-aliasing rules");
      return true;
    }

This came up during some x264 work, but it's taken care of now with some
__attribute__((may_alias)).
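
A minimal sketch of that kind of workaround, not the actual x264 change. The typedef and function names are hypothetical; may_alias is a GCC extension, and the memcpy variant is the portable alternative.

```c
#include <string.h>

/* GCC extension: accesses through this type may alias any other type. */
typedef unsigned __attribute__((may_alias)) unsigned_alias;

static char raw[sizeof(unsigned)] __attribute__((aligned));

/* Store v into the char buffer and return it, so calls can be chained. */
unsigned fill_raw(unsigned v)
{
    memcpy(raw, &v, sizeof v);
    return v;
}

/* Read the buffer through the may_alias type instead of a plain unsigned *. */
unsigned read_may_alias(void)
{
    return *(unsigned_alias *)raw;
}

/* Portable alternative: copy the bytes out. */
unsigned read_memcpy(void)
{
    unsigned v;
    memcpy(&v, raw, sizeof v);
    return v;
}
```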


-- 
   Summary: Inconsistent strict-aliasing warning with cast from
char[]
   Product: gcc
   Version: 4.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42136



[Bug tree-optimization/42211] New: Segmentation fault with graphite -floop-interchange

2009-11-29 Thread astrange at ithinksw dot com
> gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc45/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc45/libexec/gcc/x86_64-apple-darwin10.2.0/4.5.0/lto-wrapper
Target: x86_64-apple-darwin10.2.0
Configured with: ../gcc/configure --prefix=/usr/local/gcc45
--enable-threads=posix --with-arch=core2 --with-tune=core2 --with-gmp=/sw
--with-mpfr=/sw --with-ppl=/sw --with-cloog=/sw --with-libelf=/sw --disable-nls
--disable-bootstrap LDFLAGS=/usr/lib/libiconv.dylib
--enable-languages=c,c++,lto,objc,obj-c++
Thread model: posix
gcc version 4.5.0 20091129 (experimental) (GCC) 

Using r154734.

With attached source:
> gcc -O3 -floop-interchange -S graphite_crash.i
graphite_crash.i: In function 'border_mirror_480':
graphite_crash.i:17:6: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.

It doesn't happen reliably to me with -v -Q, so I can't check with gdb.
Valgrind gives:
==12758== Invalid read of size 8
==12758==at 0x1004AE4A3: lst_do_interchange_1 (graphite-interchange.c:709)
==12758==by 0x1004AE525: lst_do_interchange (graphite-interchange.c:730)
==12758==by 0x1004AE58A: lst_do_interchange (graphite-interchange.c:734)
==12758==by 0x1004AE5CA: scop_do_interchange (graphite-interchange.c:748)
==12758==by 0x1004AF4C7: apply_poly_transforms (graphite-poly.c:260)
==12758==by 0x1004A01A1: graphite_transform_loops (graphite.c:276)
==12758==by 0x100736B09: graphite_transforms (tree-ssa-loop.c:300)
==12758==by 0x10057D522: execute_one_pass (passes.c:1522)
==12758==by 0x10057D7CC: execute_pass_list (passes.c:1577)
==12758==by 0x10057D7DE: execute_pass_list (passes.c:1578)
==12758==by 0x10057D7DE: execute_pass_list (passes.c:1578)
==12758==by 0x1006AAA80: tree_rest_of_compilation (tree-optimize.c:408)
==12758==  Address 0x141c25210 is 16 bytes inside a block of size 24 free'd
==12758==at 0x140EB88DC: free (vg_replace_malloc.c:325)
==12758==by 0x1004AE00C: lst_try_interchange (graphite-poly.h:704)
==12758==by 0x1004AE49F: lst_do_interchange_1 (graphite-interchange.c:710)
==12758==by 0x1004AE525: lst_do_interchange (graphite-interchange.c:730)
==12758==by 0x1004AE58A: lst_do_interchange (graphite-interchange.c:734)
==12758==by 0x1004AE5CA: scop_do_interchange (graphite-interchange.c:748)
==12758==by 0x1004AF4C7: apply_poly_transforms (graphite-poly.c:260)
==12758==by 0x1004A01A1: graphite_transform_loops (graphite.c:276)
==12758==by 0x100736B09: graphite_transforms (tree-ssa-loop.c:300)
==12758==by 0x10057D522: execute_one_pass (passes.c:1522)
==12758==by 0x10057D7CC: execute_pass_list (passes.c:1577)
==12758==by 0x10057D7DE: execute_pass_list (passes.c:1578)
==12758== 
==12758== Invalid read of size 8
==12758==at 0x1004AE534: lst_do_interchange (graphite-interchange.c:732)
==12758==by 0x1004AE58A: lst_do_interchange (graphite-interchange.c:734)
==12758==by 0x1004AE5CA: scop_do_interchange (graphite-interchange.c:748)
==12758==by 0x1004AF4C7: apply_poly_transforms (graphite-poly.c:260)
==12758==by 0x1004A01A1: graphite_transform_loops (graphite.c:276)
==12758==by 0x100736B09: graphite_transforms (tree-ssa-loop.c:300)
==12758==by 0x10057D522: execute_one_pass (passes.c:1522)
==12758==by 0x10057D7CC: execute_pass_list (passes.c:1577)
==12758==by 0x10057D7DE: execute_pass_list (passes.c:1578)
==12758==by 0x10057D7DE: execute_pass_list (passes.c:1578)
==12758==by 0x1006AAA80: tree_rest_of_compilation (tree-optimize.c:408)
==12758==by 0x100866F56: cgraph_expand_function (cgraphunit.c:1178)
==12758==  Address 0x141c25210 is 16 bytes inside a block of size 24 free'd
==12758==at 0x140EB88DC: free (vg_replace_malloc.c:325)
==12758==by 0x1004AE00C: lst_try_interchange (graphite-poly.h:704)
==12758==by 0x1004AE49F: lst_do_interchange_1 (graphite-interchange.c:710)
==12758==by 0x1004AE525: lst_do_interchange (graphite-interchange.c:730)
==12758==by 0x1004AE58A: lst_do_interchange (graphite-interchange.c:734)
==12758==by 0x1004AE5CA: scop_do_interchange (graphite-interchange.c:748)
==12758==by 0x1004AF4C7: apply_poly_transforms (graphite-poly.c:260)
==12758==by 0x1004A01A1: graphite_transform_loops (graphite.c:276)
==12758==by 0x100736B09: graphite_transforms (tree-ssa-loop.c:300)
==12758==by 0x10057D522: execute_one_pass (passes.c:1522)
==12758==by 0x10057D7CC: execute_pass_list (passes.c:1577)
==12758==by 0x10057D7DE: execute_pass_list (passes.c:1578)


-- 
   Summary: Segmentation fault with graphite -floop-interchange
   Product: gcc
   Version: 4.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: astrange at ithinks

[Bug tree-optimization/42211] Segmentation fault with graphite -floop-interchange

2009-11-29 Thread astrange at ithinksw dot com


--- Comment #1 from astrange at ithinksw dot com  2009-11-29 09:38 ---
Created an attachment (id=19175)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19175&action=view)
somewhat-reduced source


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42211



[Bug target/43225] Structure copies not vectorized

2011-03-29 Thread astrange at ithinksw dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43225

--- Comment #4 from Alexander Strange 2011-03-29 20:39:28 UTC ---
Better source:

#include 

struct a1 { char l[16];} __attribute__((aligned));
struct a2 { __m128i l; } __attribute__((aligned));

void f1(struct a1 *a, struct a1 *b)
{
*a = *b;
}

void f2(struct a2 *a, struct a2 *b)
{
*a = *b;
}

void f3(__m128i *a, __m128i *b)
{
*a = *b;
}

Code is the same as above in svn. LLVM uses movaps for all three functions.


[Bug target/44073] x86 constants could be unduplicated

2010-08-07 Thread astrange at ithinksw dot com


--- Comment #5 from astrange at ithinksw dot com  2010-08-08 06:39 ---
That commit doesn't reverse cleanly anymore, and I'm not sure how to update it.
I don't have any pre-2005 gccs at the moment to test with.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44073



[Bug target/44474] GCC inserts redundant "test" instruction due to incorrect clobber

2010-08-28 Thread astrange at ithinksw dot com


--- Comment #2 from astrange at ithinksw dot com  2010-08-29 06:39 ---
Still happens with the new combine work (not that I really expected it to
change).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44474



[Bug rtl-optimization/45788] New: -fwhole-program causes ICE error: BB 3 can not throw but has an EH edge

2010-09-24 Thread astrange at ithinksw dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45788

   Summary: -fwhole-program causes ICE error: BB 3 can not throw
but has an EH edge
   Product: gcc
   Version: 4.6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: astra...@ithinksw.com


> gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc46/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc46/libexec/gcc/x86_64-apple-darwin10.4.0/4.6.0/lto-wrapper
Target: x86_64-apple-darwin10.4.0
Configured with: ../../src/gcc/configure --prefix=/usr/local/gcc46
--with-arch=native --with-tune=native --disable-nls --with-gmp=/sw
--disable-bootstrap --enable-languages=c,c++,lto,objc,obj-c++
Thread model: posix
gcc version 4.6.0 20100924 (experimental) (GCC) 

> gcc -O3 -fwhole-program -S eh_ice.ii
eh_ice.ii: In function 'void
_ZL9set_colorP9primitive7vectorXIfLi4EE.isra.3.constprop.5(texture**, color4)':
eh_ice.ii:93:15: error: BB 3 can not throw but has an EH edge
eh_ice.ii:93:15: internal compiler error: verify_flow_info failed
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.

Removing -fwhole-program fixes it.

-- 
Configure bugmail: http://gcc.gnu.org/bugzilla/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are on the CC list for the bug.


[Bug rtl-optimization/45788] -fwhole-program causes ICE error: BB 3 can not throw but has an EH edge

2010-09-24 Thread astrange at ithinksw dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45788

--- Comment #1 from Alexander Strange 2010-09-25 06:51:33 UTC ---
BTW, I think the error would be a lot clearer if it printed the pre-cloning/etc
function name.



[Bug rtl-optimization/45788] -fwhole-program causes ICE error: BB 3 can not throw but has an EH edge

2010-09-25 Thread astrange at ithinksw dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45788

--- Comment #4 from Alexander Strange 2010-09-25 19:50:29 UTC ---
I (probably) definitely attached it, is the attachment form in the new bugs
page not working?


[Bug target/36503] x86 can use x >> -y for x >> 32-y

2010-10-20 Thread astrange at ithinksw dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36503

--- Comment #8 from Alexander Strange 2010-10-21 04:39:36 UTC ---
I built ffmpeg for x86-64 with --disable-asm with the attached patch and the
regression tests failed. Reverting the patch fixes them. I saved the binaries
but haven't investigated yet.
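
For context, a minimal sketch of the identity in the bug title. It says nothing about why the regression tests failed. The function names are hypothetical, and the "& 31" only models the fact that 32-bit x86 shifts use the low 5 bits of the count.

```c
#include <stdint.h>

/* Valid for y in 1..31; y == 0 would shift by 32, which C leaves undefined. */
uint32_t shr_32_minus_y(uint32_t x, unsigned y)
{
    return x >> (32 - y);
}

/* Model of the hardware behaviour: only the low 5 bits of -y are used. */
uint32_t shr_neg_y(uint32_t x, unsigned y)
{
    return x >> (-y & 31);
}

/* Returns 1 if both forms agree for every y in 1..31. */
int shift_forms_agree(uint32_t x)
{
    for (unsigned y = 1; y < 32; y++)
        if (shr_32_minus_y(x, y) != shr_neg_y(x, y))
            return 0;
    return 1;
}
```

The two forms differ at y == 0, where the masked version shifts by 0 and the C expression is undefined.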


[Bug rtl-optimization/46248] New: 4.6 regression: crash+infinite recursion in combine

2010-10-31 Thread astrange at ithinksw dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46248

   Summary: 4.6 regression: crash+infinite recursion in combine
   Product: gcc
   Version: 4.6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: astra...@ithinksw.com


Created attachment 22210
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22210
source

gcc r166084 crashes compiling ffmpeg libpostproc on x86-64-apple-darwin10.

Minimized-ish source attached.

> gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc46/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc46/libexec/gcc/x86_64-apple-darwin10.4.0/4.6.0/lto-wrapper
Target: x86_64-apple-darwin10.4.0
Configured with: ../../src/gcc/configure --prefix=/usr/local/gcc46
--with-arch=native --with-tune=native --disable-nls --with-gmp=/sw
--disable-bootstrap --enable-languages=c,c++,lto,objc,obj-c++
Thread model: posix
gcc version 4.6.0 20101030 (experimental) (GCC) 

> gcc -O3 -S postprocess.i 
gcc: internal compiler error: Segmentation fault (program cc1)
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.

Backtrace:

#0  0x00010031fc34 in if_then_else_cond (x=0x1425e14b0,
ptrue=0x7fff5f400078, pfalse=0x7fff5f400068) at
../../../src/gcc/gcc/combine.c:8471
#1  0x00010031fd82 in if_then_else_cond (x=0x1425e1498,
ptrue=0x7fff5f400118, pfalse=0x7fff5f400108) at
../../../src/gcc/gcc/combine.c:8507
#2  0x00010031fd82 in if_then_else_cond (x=0x1425e14b0,
ptrue=0x7fff5f4001b8, pfalse=0x7fff5f4001a8) at
../../../src/gcc/gcc/combine.c:8507
#3  0x00010031fd82 in if_then_else_cond (x=0x1425e1498,
ptrue=0x7fff5f400258, pfalse=0x7fff5f400248) at
../../../src/gcc/gcc/combine.c:8507
#4  0x00010031fd82 in if_then_else_cond (x=0x1425e14b0,
ptrue=0x7fff5f4002f8, pfalse=0x7fff5f4002e8) at
../../../src/gcc/gcc/combine.c:8507
#5  0x00010031fd82 in if_then_else_cond (x=0x1425e1498,
ptrue=0x7fff5f400398, pfalse=0x7fff5f400388) at
../../../src/gcc/gcc/combine.c:8507
#6  0x00010031fd82 in if_then_else_cond (x=0x1425e14b0,
ptrue=0x7fff5f400438, pfalse=0x7fff5f400428) at
../../../src/gcc/gcc/combine.c:8507
...


[Bug inline-asm/46615] New: [4.6 regression] possibly-invalid x86-64 inline asm miscompilation

2010-11-22 Thread astrange at ithinksw dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46615

   Summary: [4.6 regression] possibly-invalid x86-64 inline asm
miscompilation
   Product: gcc
   Version: 4.6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: inline-asm
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: astra...@ithinksw.com


gcc 4.6 miscompiles this source from ffmpeg on x86-64-apple-darwin10, whereas
previous compilers worked. I'm not sure if the asm is legal, but it's existed
in the wild for a long time.

const unsigned long long __attribute__((aligned(8))) ff_bgr24toUV[2][4] =
{
{0x3838DAC83838ULL, 0xECFFDAC8ECFFULL, 0xF6E4D0E3F6E4ULL,
0x3838D0E33838ULL},
{0xECFFDAC8ECFFULL, 0x3838DAC83838ULL , 0x3838D0E33838ULL,
0xF6E4D0E3F6E4ULL},
};

static void 
bgr24ToUV_mmx_MMX2(int f)
{
__asm__ volatile(
"movq 24+%0, %%mm6 \n\t"
:: "m"(ff_bgr24toUV[f == 0][0]));
}

void 
rgb24ToUV_MMX2()
{
bgr24ToUV_mmx_MMX2(1);
}

> gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc46/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc46/libexec/gcc/x86_64-apple-darwin10.5.0/4.6.0/lto-wrapper
Target: x86_64-apple-darwin10.5.0
Configured with: ../../src/gcc/configure --prefix=/usr/local/gcc46
--with-arch=native --with-tune=native --disable-nls --with-gmp=/sw
--disable-bootstrap --enable-checking --enable-languages=c,c++,lto,objc,obj-c++
Thread model: posix
gcc version 4.6.0 20101122 (experimental) (GCC) 
> gcc -O -o swscale-fails.s -S swscale.i 
swscale.i: In function 'rgb24ToUV_MMX2':
swscale.i:10:2: warning: use of memory input without lvalue in asm operand 0 is
deprecated [enabled by default]

Working asm (4.2):
_rgb24ToUV_MMX2:
pushq %rbp
movq %rsp, %rbp
movq 24+_ff_bgr24toUV(%rip), %mm6 
leave
ret
.globl _ff_bgr24toUV
.const
.align 3
_ff_bgr24toUV:
.quad 4050987868490315832
.quad -1369135209168966401
.quad -656399642184648988
.quad 4051217538195929144
.quad -1369375758026740481
.quad 4051228417348089912
.quad 4050987868324313144
.quad -656169972313032988
.section __TEXT,__eh_frame,coalesced,no_toc+strip_static_syms+live_support

Non-working asm (4.6):
_rgb24ToUV_MMX2:
movq 24+LC0(%rip), %mm6 
ret
.globl _ff_bgr24toUV
.const
.align 3
_ff_bgr24toUV:
.quad 4050987868490315832
.quad -1369135209168966401
.quad -656399642184648988
.quad 4051217538195929144
.quad -1369375758026740481
.quad 4051228417348089912
.quad 4050987868324313144
.quad -656169972313032988
.literal8
.align 3
LC0:
.quad 4050987868490315832
.section __TEXT,__eh_frame,coalesced,no_toc+strip_static_syms+live_support

24+_ff_bgr24toUV(%rip) is fine, but 24+LC0(%rip) is a pointer to nothing, and
ld breaks:

ld: in /var/folders/MY/MYkVh2TwHgKZhNFIG8M3wU+++TI/-Tmp-//cc9dJIWa.o, in
section __TEXT,__text reloc 0: local relocation for address 0x000C in
section __text does not target section __literal8

I'm going to fix the asm since it looks fragile anyway, but that won't fix
existing releases of ffmpeg.

Note that creating LC0 is not even an optimization since it doesn't save any
space (because the array is __attribute__((used))).
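
To spell out the address arithmetic: the template adds 24 bytes to an operand that names element [0] of a row, which is element [3] of that row when the elements are 8 bytes each. A minimal sketch with a hypothetical table and function names follows; naming the element directly in the operand is only an assumption about how the asm could be made less fragile, not the actual ffmpeg fix.

```c
/* The offset below assumes 8-byte elements, as on x86-64. */
_Static_assert(sizeof(unsigned long long) == 8, "8-byte elements assumed");

/* Hypothetical stand-in for ff_bgr24toUV; the values are irrelevant here. */
static const unsigned long long demo_table[2][4] = {
    {1, 2, 3, 4},
    {5, 6, 7, 8},
};

/* Address the asm template computes: 24 bytes past element [row][0]. */
const void *addr_via_offset(int f)
{
    return (const char *)&demo_table[f == 0][0] + 24;
}

/* The same address, obtained by naming element [row][3] directly. */
const void *addr_via_index(int f)
{
    return &demo_table[f == 0][3];
}
```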


[Bug tree-optimization/44063] [4.6 Regression]: build broken for libgcc cris-elf, ICE in cgraph_estimate_size_after_inlining, at ipa-inline

2010-05-10 Thread astrange at ithinksw dot com


--- Comment #2 from astrange at ithinksw dot com  2010-05-11 03:38 ---
Created an attachment (id=20623)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20623&action=view)
testcase

This happens building ffmpeg on x86-64 now. Minimal-ish testcase attached.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44063



[Bug target/44073] New: x86 constants could be unduplicated

2010-05-11 Thread astrange at ithinksw dot com
void f1(int *a, int *b, int *c)
{
int d = 0xE0E0E0E0;

*a = *b = *c = d;
}

produces
_f1:
LFB0:
movl    $-522133280, (%rdx)
movl    $-522133280, (%rsi)
movl    $-522133280, (%rdi)
ret

on x86-64 at -Os. It would save instruction space and probably not be any
slower to actually assign d to a register, but this is only done for 64-bit
constants.
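
As a side note on the listing, $-522133280 is just 0xE0E0E0E0 printed as a signed 32-bit immediate: 3772834016 - 4294967296 = -522133280. A minimal check, with a hypothetical function name, relying on GCC's modulo-2^32 behaviour for the unsigned-to-signed conversion:

```c
#include <stdint.h>

/* Reinterpret a 32-bit constant the way the assembler listing prints it.
   Assumption: out-of-range conversion to int32_t wraps, as it does on GCC. */
int32_t as_signed_imm(uint32_t v)
{
    return (int32_t)v;
}
```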


-- 
   Summary: x86 constants could be unduplicated
   Product: gcc
   Version: 4.6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
  GCC host triplet: x86_64-*-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44073



[Bug target/44073] x86 constants could be unduplicated

2010-05-11 Thread astrange at ithinksw dot com


--- Comment #3 from astrange at ithinksw dot com  2010-05-11 10:36 ---
It's propagated by vrp1, and then nothing removes it again. tree-uncprop
doesn't change it - it looks like it doesn't have anything to handle this,
actually.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44073



[Bug lto/44090] New: lto ice in verify_stmts

2010-05-11 Thread astrange at ithinksw dot com
> /usr/local/gcc46/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc46/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc46/libexec/gcc/x86_64-apple-darwin10.3.1/4.6.0/lto-wrapper
Target: x86_64-apple-darwin10.3.1
Configured with: ../../src/gcc/configure --prefix=/usr/local/gcc46
--with-arch=native --with-tune=native --disable-nls --enable-lto
--disable-bootstrap LDFLAGS=-L/sw/lib CPPFLAGS=-I/sw/include
--enable-languages=c,c++,objc,obj-c++,lto
Thread model: posix
gcc version 4.6.0 20100511 (experimental) (GCC) 

The attached files have two different definitions of MpegEncContext. -flto with
checking gives an ice on it instead of a readable warning/error:

> /usr/local/gcc46/bin/gcc -O3 -flto -c h263dec.i
> /usr/local/gcc46/bin/gcc -O3 -flto -c ituh263dec.i
> echo "h263dec.o ituh263dec.o" > test
> /usr/local/gcc46/libexec/gcc/x86_64-apple-darwin10.3.1/4.6.0/lto1 -O3 @test
Reading object files: h263dec.o ituh263dec.o
Reading the callgraph
Merging declarations
Reading summaries
Reading function bodies: ff_h263_decode_mb ff_h263_decode_init
Performing interprocedural optimizations
 
In function 'ff_h263_decode_init':
lto1: error: type mismatch in address expression
 (*) (struct MpegEncContext *, [64]
*)

  (struct MpegEncContext *, [64] *)

# .MEM_5 = VDEF <.MEM_4(D)>
s_3->decode_mb = ff_h263_decode_mb;

lto1: internal compiler error: verify_stmts failed
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.

It looks obviously invalid here, but building ffmpeg with -O3 -flto gives the
same ice, and I can't see any bugs that would cause that. It's hard to debug
it, though, since it doesn't print the origin files of the mismatched
definitions or anything.

The original, completely unreduced version:
> svn co -r23100 svn://svn.mplayerhq.hu/ffmpeg/trunk ffmpeg
> cd ffmpeg
> ./configure --cc=/usr/local/gcc46/bin/gcc --extra-cflags="-flto -O3" --extra-ldflags="-flto -O3" --enable-shared; make
...

...
s_4->decode_mb = ff_h263_decode_mb;

lto1: internal compiler error: verify_stmts failed


-- 
   Summary: lto ice in verify_stmts
   Product: gcc
   Version: 4.6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
     Component: lto
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
  GCC host triplet: x86_64-apple-darwin10.3.1


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44090



[Bug lto/44090] lto ice in verify_stmts

2010-05-11 Thread astrange at ithinksw dot com


--- Comment #1 from astrange at ithinksw dot com  2010-05-12 05:27 ---
Created an attachment (id=20638)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20638&action=view)
test file 1


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44090



[Bug lto/44090] lto ice in verify_stmts

2010-05-11 Thread astrange at ithinksw dot com


--- Comment #2 from astrange at ithinksw dot com  2010-05-12 05:27 ---
Created an attachment (id=20639)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20639&action=view)
test file 2


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44090



[Bug rtl-optimization/44223] New: segmentation fault with -g -fsched-pressure

2010-05-20 Thread astrange at ithinksw dot com
> gcc -O3 -g -fsched-pressure -fschedule-insns -S crash1m.i
crash1m.i: In function 'ff_adts_write_frame_header':
crash1m.i:35:2: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.

Backtrace:
(gdb) run
Starting program:
/usr/local/gcc46/libexec/gcc/x86_64-apple-darwin10.3.1/4.6.0/cc1 -fpreprocessed
crash1m.i -march=core2 -mcx16 -msahf -maes -mpclmul -mpopcnt -msse4.2 --param
l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=3072
-mtune=core2 -fPIC -feliminate-unused-debug-symbols -quiet -dumpbase crash1m.i
-mmacosx-version-min=10.6.3 -auxbase crash1m -g -O3 -version -fsched-pressure
-fschedule-insns -o crash1m.s
Reading symbols for shared libraries .++. done
GNU C (GCC) version 4.6.0 20100521 (experimental) (x86_64-apple-darwin10.3.1)
compiled by GNU C version 4.2.1 (Apple Inc. build 5659), GMP version
4.3.1, MPFR version 2.4.2-p3, MPC version 0.8
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
GNU C (GCC) version 4.6.0 20100521 (experimental) (x86_64-apple-darwin10.3.1)
compiled by GNU C version 4.2.1 (Apple Inc. build 5659), GMP version
4.3.1, MPFR version 2.4.2-p3, MPC version 0.8
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
Compiler executable checksum: 5c588719ada4c17718f398d6d2dbd7a3

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x
0x0001004edc54 in dying_use_p (use=0x141720070) at
../../../src/gcc/gcc/haifa-sched.c:769
769 if (NONDEBUG_INSN_P (next->insn)
(gdb) bt
#0  0x0001004edc54 in dying_use_p (use=0x141720070) at
../../../src/gcc/gcc/haifa-sched.c:769
#1  0x0001004f055d in setup_insn_reg_pressure_info [inlined] () at
/Users/astrange/Projects/src/gcc/gcc/haifa-sched.c:1130
#2  0x0001004f055d in ready_sort (ready=0x100b0b5e0) at
../../../src/gcc/gcc/haifa-sched.c:1502
#3  0x0001004f5e4b in schedule_block (target_bb=0x7fff5fbfe4e8) at
../../../src/gcc/gcc/haifa-sched.c:3203
#4  0x00010060c8bd in schedule_insns () at
../../../src/gcc/gcc/sched-rgn.c:3001
#5  0x00010060cd4f in rest_of_handle_sched () at
../../../src/gcc/gcc/sched-rgn.c:3512
#6  0x00010059cb3f in execute_one_pass (pass=0x100b99d40) at
../../../src/gcc/gcc/passes.c:1589
#7  0x00010059ce1d in execute_pass_list (pass=0x100b99d40) at
../../../src/gcc/gcc/passes.c:1644
#8  0x00010059ce2f in execute_pass_list (pass=0x100b98ec0) at
../../../src/gcc/gcc/passes.c:1645
#9  0x0001006cd1d0 in invoke_plugin_callbacks [inlined] () at
/Users/astrange/Projects/src/gcc/gcc/plugin.h:413
#10 0x0001006cd1d0 in tree_rest_of_compilation (fndecl=0x14252f300) at
../../../src/gcc/gcc/tree-optimize.c:416
#11 0x000100898ef6 in cgraph_expand_function (node=0x14240cd20) at
../../../src/gcc/gcc/cgraphunit.c:1622
#12 0x00010089c07d in cgraph_expand_all_functions [inlined] () at
/Users/astrange/Projects/src/gcc/gcc/cgraphunit.c:1701
#13 0x00010089c07d in cgraph_optimize () at
../../../src/gcc/gcc/cgraphunit.c:1957
#14 0x00010089c676 in cgraph_finalize_compilation_unit () at
../../../src/gcc/gcc/cgraphunit.c:1161
#15 0x0001f0f2 in c_write_global_declarations () at
../../../src/gcc/gcc/c-decl.c:9578
#16 0x0001006623c5 in do_compile () at ../../../src/gcc/gcc/toplev.c:1059
#17 0x000100662b1d in toplev_main (argc=32, argv=0x7fff5fbfe828) at
../../../src/gcc/gcc/toplev.c:2433
#18 0x00010f64 in start ()


-- 
   Summary: segmentation fault with -g -fsched-pressure
   Product: gcc
   Version: 4.6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
    ReportedBy: astrange at ithinksw dot com
  GCC host triplet: x86_64-apple-darwin10.3.1


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44223



[Bug rtl-optimization/44223] segmentation fault with -g -fsched-pressure

2010-05-20 Thread astrange at ithinksw dot com


--- Comment #1 from astrange at ithinksw dot com  2010-05-21 02:02 ---
Created an attachment (id=20715)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20715&action=view)
file


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44223



[Bug lto/44090] lto ice in verify_stmts

2010-05-24 Thread astrange at ithinksw dot com


--- Comment #3 from astrange at ithinksw dot com  2010-05-24 20:01 ---
Fixed itself. Though lto still doesn't build ffmpeg, it's just a different bug
now.


-- 

astrange at ithinksw dot com changed:

           What    |Removed                     |Added

             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44090


