[Bug treelang/47197] New: ICE in gimplify_expr, at gimplify.c:7153 on AltiVec code (vec_dst)

2011-01-06 Thread kaffeemonster at googlemail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47197

   Summary: ICE in gimplify_expr, at gimplify.c:7153 on AltiVec code (vec_dst)
   Product: gcc
   Version: 4.5.2
    Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: treelang
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: kaffeemons...@googlemail.com


$ cat alti_crash.c
#include <altivec.h>

void func(unsigned char *buf, unsigned len)
{
	vec_dst(buf, (len >= 256 ? 0 : len) | 512, 2);
}

$ powerpc-linux-gnu-gcc -maltivec -c alti_crash.c 
alti_crash.c: In function 'func':
alti_crash.c:5:20: internal compiler error: in gimplify_expr, at gimplify.c:7153
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugs.gentoo.org/> for instructions.

$ powerpc-linux-gnu-gcc --version -v
Using built-in specs.
COLLECT_GCC=/usr/i686-pc-linux-gnu/powerpc-linux-gnu/gcc-bin/4.5.2/powerpc-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/powerpc-linux-gnu/4.5.2/lto-wrapper
powerpc-linux-gnu-gcc (Gentoo 4.5.2 p1.0, pie-0.4.5) 4.5.2
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Target: powerpc-linux-gnu
Configured with:
/var/tmp/portage/cross-powerpc-linux-gnu/gcc-4.5.2/work/gcc-4.5.2/configure
--prefix=/usr --bindir=/usr/i686-pc-linux-gnu/powerpc-linux-gnu/gcc-bin/4.5.2
--includedir=/usr/lib/gcc/powerpc-linux-gnu/4.5.2/include
--datadir=/usr/share/gcc-data/powerpc-linux-gnu/4.5.2
--mandir=/usr/share/gcc-data/powerpc-linux-gnu/4.5.2/man
--infodir=/usr/share/gcc-data/powerpc-linux-gnu/4.5.2/info
--with-gxx-include-dir=/usr/lib/gcc/powerpc-linux-gnu/4.5.2/include/g++-v4
--host=i686-pc-linux-gnu --target=powerpc-linux-gnu --build=i686-pc-linux-gnu
--disable-altivec --disable-fixed-point --without-ppl --without-cloog
--disable-ppl-version-check --disable-lto --enable-nls
--without-included-gettext --with-system-zlib --disable-werror
--enable-secureplt --disable-multilib --enable-libmudflap --disable-libssp
--enable-libgomp --enable-cld
--with-python-dir=/share/gcc-data/powerpc-linux-gnu/4.5.2/python
--enable-checking=release --disable-libgcj --enable-languages=c
--with-sysroot=/usr/powerpc-linux-gnu --disable-bootstrap --enable-__cxa_atexit
--enable-clocale=gnu --with-bugurl=http://bugs.gentoo.org/
--with-pkgversion='Gentoo 4.5.2 p1.0, pie-0.4.5'
Thread model: posix
gcc version 4.5.2 (Gentoo 4.5.2 p1.0, pie-0.4.5) 
COLLECT_GCC_OPTIONS='-fversion' '-v'
 /usr/libexec/gcc/powerpc-linux-gnu/4.5.2/cc1 -quiet -v -D__unix__
-D__gnu_linux__ -D__linux__ -Dunix -D__unix -Dlinux -D__linux -Asystem=linux
-Asystem=unix -Asystem=posix help-dummy -D_FORTIFY_SOURCE=2 -msecure-plt -quiet
-dumpbase help-dummy -auxbase help-dummy -version -fversion -o /tmp/ccujbTX3.s
GNU C (Gentoo 4.5.2 p1.0, pie-0.4.5) version 4.5.2 (powerpc-linux-gnu)
compiled by GNU C version 4.4.4, GMP version 4.3.2, MPFR version
2.4.2-p3, MPC version 0.8.2
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
COLLECT_GCC_OPTIONS='-fversion' '-v'
 /usr/libexec/gcc/powerpc-linux-gnu/as -mppc -many -V -Qy --version -o
/tmp/cccIfFZe.o /tmp/ccujbTX3.s
GNU assembler version 2.21 (powerpc-linux-gnu) using BFD version (GNU Binutils)
2.21
GNU assembler (GNU Binutils) 2.21
Copyright 2010 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or later.
This program has absolutely no warranty.
This assembler was configured for a target of `powerpc-linux-gnu'.
COMPILER_PATH=/usr/libexec/gcc/powerpc-linux-gnu/4.5.2/:/usr/libexec/gcc/powerpc-linux-gnu/4.5.2/:/usr/libexec/gcc/powerpc-linux-gnu/:/usr/lib/gcc/powerpc-linux-gnu/4.5.2/:/usr/lib/gcc/powerpc-linux-gnu/:/usr/lib/gcc/powerpc-linux-gnu/4.5.2/../../../../powerpc-linux-gnu/bin/
LIBRARY_PATH=/usr/lib/gcc/powerpc-linux-gnu/4.5.2/:/usr/lib/gcc/powerpc-linux-gnu/4.5.2/../../../../powerpc-linux-gnu/lib/:/usr/powerpc-linux-gnu/lib/:/usr/powerpc-linux-gnu/usr/lib/
COLLECT_GCC_OPTIONS='-fversion' '-v'
 /usr/libexec/gcc/powerpc-linux-gnu/4.5.2/collect2
--sysroot=/usr/powerpc-linux-gnu --eh-frame-hdr -V -Qy -m elf32ppclinux
-dynamic-linker /lib/ld.so.1 --version
/usr/lib/gcc/powerpc-linux-gnu/4.5.2/../../../../powerpc-linux-gnu/lib/crt1.o
/usr/lib/gcc/powerpc-linux-gnu/4.5.2/../../../../powerpc-linux-gnu/lib/crti.o
/usr/lib/gcc/powerpc-linux-gnu/4.5.2/crtbegin.o
-L/usr/lib/gcc/powerpc-linux-gnu/4.5.2
-L/usr/lib/gcc/powerpc-linux-gnu/4.5.2/../../../../powerpc-linux-gnu/lib
-L/usr/powerpc-linux-gnu/lib -L/usr/powerpc-linux-gnu/usr/lib /tmp/cccIfFZe.o
-lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s
--no-as-needed /usr/lib/gcc/powerpc-linux-gnu/4.5.2/crtend.o
/usr/lib/gcc/powerpc-linux-gnu/4.5.2/../../../../powerpc-linux-gnu/lib/crtn.o
GNU ld (GNU Binutils) 2.21
  Suppor

[Bug target/52323] New: i386: gcse runs amok with pic-addresses

2012-02-21 Thread kaffeemonster at googlemail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52323

         Bug #: 52323
       Summary: i386: gcse runs amok with pic-addresses
Classification: Unclassified
       Product: gcc
       Version: 4.6.2
        Status: UNCONFIRMED
      Severity: normal
      Priority: P3
     Component: target
    AssignedTo: unassig...@gcc.gnu.org
    ReportedBy: kaffeemons...@googlemail.com


I have a very bad interaction here between gcse and PIC addresses on i386.

The attached testcase (yes, I know, it's not a beauty and could probably be
reduced some more), compiled with:
gcc-4.6.2 -Wall -O1 -fpic -S gcse_amok.c -o gcse_amok.s
creates roughly this code:
to_base32_BMI2:
.LFB1:
	.cfi_startproc
	pushl	%ebp
	.cfi_def_cfa_offset 8
	.cfi_offset 5, -8
	pushl	%edi
	.cfi_def_cfa_offset 12
	.cfi_offset 7, -12
	pushl	%esi
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	pushl	%ebx
	.cfi_def_cfa_offset 20
	.cfi_offset 3, -20
	call	__i686.get_pc_thunk.bx
	addl	$_GLOBAL_OFFSET_TABLE_, %ebx
	movl	20(%esp), %eax
	movl	24(%esp), %edx
	movl	28(%esp), %ecx
	cmpl	$4, %ecx
	jbe	.L2
.L4:
	movl	(%edx), %esi
	bswap	%esi
	movl	%esi, %edi
	shrl	$12, %edi
#APP
# 19 "gcse_amok.c" 1
	pdep 64+vals@GOTOFF(%ebx), %edi, %edi
# 0 "" 2
#NO_APP
	movzbl	4(%edx), %ebp
	sall	$8, %esi
	orl	%ebp, %esi
#APP
# 20 "gcse_amok.c" 1
	pdep 64+vals@GOTOFF(%ebx), %esi, %esi
# 0 "" 2
#NO_APP
	bswap	%edi
	bswap	%esi
#APP
# 25 "gcse_amok.c" 1
	movd	%edi, %xmm0
	pinsrd	$1, %esi, %xmm0
	paddb	80+vals@GOTOFF(%ebx), %xmm0
	movdqa	%xmm0, %xmm1
	pcmpgtb	96+vals@GOTOFF(%ebx), %xmm1
	pand	112+vals@GOTOFF(%ebx), %xmm1
	psubb	%xmm1, %xmm0
	movq	%xmm0, (%eax)
# 0 "" 2
#NO_APP
	addl	$8, %eax
	addl	$5, %edx
	subl	$5, %ecx
	cmpl	$4, %ecx
	ja	.L4
...

If -fgcse (like in -O2) gets added to the command line, things get ugly:
gcc-4.6.2 -Wall -O1 -fpic -fgcse -S gcse_amok.c -o gcse_amok.s
results in:
to_base32_BMI2:
.LFB1:
	.cfi_startproc
	pushl	%ebp
	.cfi_def_cfa_offset 8
	.cfi_offset 5, -8
	pushl	%edi
	.cfi_def_cfa_offset 12
	.cfi_offset 7, -12
	pushl	%esi
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	pushl	%ebx
	.cfi_def_cfa_offset 20
	.cfi_offset 3, -20
	subl	$36, %esp
	.cfi_def_cfa_offset 56
	call	__i686.get_pc_thunk.bx
	addl	$_GLOBAL_OFFSET_TABLE_, %ebx
	movl	56(%esp), %eax
	movl	60(%esp), %edx
	movl	64(%esp), %ecx
	cmpl	$4, %ecx
	jbe	.L2
	leal	64+vals@GOTOFF, %edi
	leal	80+vals@GOTOFF, %esi
	movl	%esi, 32(%esp)
	leal	96+vals@GOTOFF, %ebp
	movl	%ebp, 28(%esp)
	leal	112+vals@GOTOFF, %esi
	movl	%esi, 24(%esp)
	movl	%eax, 8(%esp)
	movl	%edx, (%esp)
	movl	%ecx, 4(%esp)
	movl	%edi, 12(%esp)
.L3:
	movl	(%esp), %edi
	movl	(%edi), %esi
	bswap	%esi
	movl	%esi, %edi
	shrl	$12, %edi
	movl	12(%esp), %eax
#APP
# 19 "gcse_amok.c" 1
	pdep (%eax,%ebx), %edi, %edi
# 0 "" 2
#NO_APP
	movl	(%esp), %edx
	movzbl	4(%edx), %eax
	sall	$8, %esi
	orl	%eax, %esi
	movl	12(%esp), %ecx
#APP
# 20 "gcse_amok.c" 1
	pdep (%ecx,%ebx), %esi, %esi
# 0 "" 2
#NO_APP
	bswap	%edi
	bswap	%esi
	movl	8(%esp), %eax
	movl	32(%esp), %edx
	movl	28(%esp), %ecx
	movl	24(%esp), %ebp
#APP
# 25 "gcse_amok.c" 1
	movd	%edi, %xmm0
	pinsrd	$1, %esi, %xmm0
	paddb	(%edx,%ebx), %xmm0
	movdqa	%xmm0, %xmm1
	pcmpgtb	(%ecx,%ebx), %xmm1
	pand	0(%ebp,%ebx), %xmm1
	psubb	%xmm1, %xmm0
	movq	%xmm0, (%eax)
# 0 "" 2
#NO_APP
	addl	$8, %eax
	movl	%eax, 8(%esp)
	addl	$5, (%esp)
	subl	$5, 4(%esp)
	cmpl	$4, 4(%esp)
	ja	.L3
...

Later passes (-O2, -O3) only make things worse or cannot recover from this.
For some reason gcse tries to hoist the constant address offsets out of the
loop. This needs a bunch of registers, which i386 does not have, and so the
spilling begins...
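
To make the pattern concrete, here is a minimal sketch of the kind of loop
that is affected. It is not the attached testcase: bias, struct block16 and
add_bias are made-up names, and the plain C fallback is only there so the
snippet also builds on non-x86 targets. The point is that the "m" operand for
the constant has a loop-invariant address, which on i386 with -fpic is formed
relative to the PIC register, and that makes it a candidate for hoisting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16 bytes handed to the asm as one memory operand. */
struct block16 { uint8_t b[16]; };

/* Hypothetical constant table, a stand-in for `vals` in the listings above.
 * Aligned to 16 bytes because paddb with a memory operand requires it. */
static const struct block16 bias __attribute__((aligned(16))) = {
	{ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 }
};

/* Add bias bytewise to each 16-byte block of src and store it to dst. */
static void add_bias(uint8_t *dst, const uint8_t *src, size_t nblocks)
{
	assert(dst != NULL && src != NULL);

	for (size_t i = 0; i < nblocks; i++) {
		uint8_t *d = dst + 16 * i;
		const uint8_t *s = src + 16 * i;
#if defined(__SSE2__) && (defined(__i386__) || defined(__x86_64__))
		/* %2 is the constant: its address does not change inside the
		 * loop, so the compiler is free to compute it up front. */
		__asm__ ("movdqu %1, %%xmm0\n\t"
		         "paddb  %2, %%xmm0\n\t"
		         "movdqu %%xmm0, %0"
		         : "=m" (*(struct block16 *)d)
		         : "m" (*(const struct block16 *)s),
		           "m" (bias)
		         : "xmm0");
#else
		/* Portable fallback with the same result. */
		for (size_t j = 0; j < 16; j++)
			d[j] = (uint8_t)(s[j] + bias.b[j]);
#endif
	}
}
```

Building something like this with -m32 -O1 -fpic -S, with and without -fgcse,
is one way to compare how the address of the constant ends up in the loop;
whether this small sketch reproduces the spilling depends on the GCC version.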

GCC 4.5.3 does not even compile it:
$ gcc-4.5.3 -Wall -O1 -fpic -fgcse -S -o gcse_amok.s gcse_amok.c
gcse_amok.c: In function 'to_base32_BMI2':
gcse_amok.c:25:2: error: can't find a register in class 'GENERAL_REGS' while reloading 'asm'
gcse_amok.c:19:2: error: 'asm' operand has impossible constraints
gcse_amok.c:20:2: error: 'asm' operand has impossible constraints
gcse_amok.c:25:2: error: 'asm' operand has impossible constraints

Without gcse, no problem:
$ gcc -Wall -O1 -fpic -S -o gcse_amok.s gcse_amok.c
$ echo $?
0

I do not really know which component this is, but I guess it is a target
problem: the low costs for "more complicated addressing modes" on x86 (so the
compil

[Bug target/52323] i386: gcse runs amok with pic-addresses

2012-02-21 Thread kaffeemonster at googlemail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52323

--- Comment #1 from Jan Seiffert  
2012-02-21 08:15:19 UTC ---
Created attachment 26709
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26709
Testcase exposing gcse hyperactivity with pic on i386

The testcase.

*mumble mumble*
That the description field is mandatory could be made more visible...


[Bug target/52323] i386: gcse runs amok with pic-addresses

2012-02-21 Thread kaffeemonster at googlemail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52323

--- Comment #3 from Jan Seiffert  
2012-02-22 00:03:53 UTC ---
My use case is not large floating point math functions.

While intrinsics are nice (the new Tile ports rock! Every spec. instruction as
an intrinsic from day 1, that's how it should be for a VLIW arch, can't say the
same for IA64...), for x86 they are generally not on my radar.
x86 is the primary architecture for my use case today; if things are only
mediocre on PPC/ARM or whatever, I can live with it, but not on x86.
And on x86, intrinsics simply do not work well enough.

Intrinsics have one "deadly" limitation:
You can only use them if -march= is set right.
And x86 has a gazillion ISA extensions, with a new one every 2 years.
This way you cannot write code which dispatches dynamically at runtime.
Users who get their code from some generic build (distro, package from a
website) are the majority on x86. You can expect a PPC, ARM or Alpha user to
set his -march right, since he will be compiling it himself anyway, but not an
x86 user. Also, distros "hate it" when you do not obey the CFLAGS they set
during the build and muck around with these CFLAGS (for good reasons).
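
For illustration, runtime dispatch with inline asm could look roughly like the
sketch below. It is not taken from the larger code body: pdep_generic,
pdep_bmi2, cpu_has_bmi2 and pdep32 are names made up for this example, and it
assumes a GCC-compatible compiler that ships <cpuid.h> plus an assembler that
knows pdep. BMI2 support is reported in CPUID leaf 7, subleaf 0, EBX bit 8.
The file builds without -mbmi2 or a matching -march, which is the whole point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#define PDEP_X86 1
#endif

typedef uint32_t (*pdep_fn)(uint32_t src, uint32_t mask);

/* Portable bit deposit: scatter the low bits of src to the set bits of mask. */
static uint32_t pdep_generic(uint32_t src, uint32_t mask)
{
	uint32_t out = 0;

	for (uint32_t bit = 1; mask != 0; bit <<= 1) {
		uint32_t low = mask & (~mask + 1u);	/* lowest set bit of mask */
		if (src & bit)
			out |= low;
		mask &= mask - 1;			/* clear that bit */
	}
	return out;
}

#ifdef PDEP_X86
/* Same operation with the BMI2 instruction; no -mbmi2 needed to build it. */
static uint32_t pdep_bmi2(uint32_t src, uint32_t mask)
{
	uint32_t out;

	__asm__ ("pdep %2, %1, %0" : "=r" (out) : "r" (src), "rm" (mask));
	return out;
}

/* CPUID.(EAX=7,ECX=0):EBX bit 8 signals BMI2. */
static int cpu_has_bmi2(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (__get_cpuid_max(0, 0) < 7)
		return 0;
	__cpuid_count(7, 0, eax, ebx, ecx, edx);
	return (ebx >> 8) & 1;
}
#endif

/* Pick the implementation once, on first use. */
static uint32_t pdep32(uint32_t src, uint32_t mask)
{
	static pdep_fn impl;

	if (impl == NULL) {
		impl = pdep_generic;
#ifdef PDEP_X86
		if (cpu_has_bmi2())
			impl = pdep_bmi2;
#endif
	}
	assert(impl != NULL);
	return impl(src, mask);
}
```

A caller just uses pdep32(src, mask); on a CPU without BMI2 (or on a non-x86
target) it silently falls back to the C loop.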

BMI2 is an ISA extension for which the CPUs have yet to be released. If there
are intrinsics for it in GCC, they are probably somewhere in CVS. The GCCs in
use out there are generally not bleeding-edge new.

Another problem is what GCC (but also other compilers) does when it is "one or
two" registers short (very common on i386). Stack frame, foo, bar, often not the
perfect spill (to put it in a positive light). This is the point where a human
can either: a) find a better spill (colder) or b) squeeze and push and shove to
make it fit. But you can't flip a switch on a compiler: "please be 20% smarter
now".

So why write it as inline asm instead of a standalone .s? Let the compiler
do the legwork, that's its job. Calling conventions, debug info, sections,
.gnu-note-stack, pic/non-pic, etc. I only need to write the kernel and can do
as much or as little as I like, without wasting any thought on the ASM
boilerplate stuff.

I could write it with intrinsics for this testcase (but then the problem would
probably vanish?); I generally do not for x86, and this is a snippet from a
larger code body. (The Alpha version uses __builtin_zapnot and
__builtin_cmpbge, the NEON version uses the stuff from neon.h, but for other
instruction sets there are not enough intrinsics or they work poorly (e.g. ARM
iWMMXt, MIPS Loongson)...)


[Bug middle-end/47602] Permit inline asm to clobber PIC register

2011-12-24 Thread kaffeemonster at googlemail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47602

Jan Seiffert  changed:

   What|Removed |Added

 CC||kaffeemonster at googlemail
   ||dot com

--- Comment #15 from Jan Seiffert  
2011-12-24 15:54:23 UTC ---
As a heavy inline asm user myself, I can understand the pain of handling PIC
yourself, but there is no way around that.

You can "accidentally" use the PIC register through a memory operand ("m"). Then
the compiler cannot save it for you.
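
A small sketch of what I mean, with a made-up global (counter) and function
(bump); the non-x86 branch only exists so the snippet builds elsewhere. With
-m32 -fpic on GCC versions of this era, the "+m" operand is typically printed
as something like counter@GOTOFF(%ebx), so the asm depends on the PIC register
without ever naming it:

```c
#include <assert.h>

/* Hypothetical global, only here to have something addressed via PIC. */
static int counter;

/* Increment counter through a memory operand and return the new value. */
static int bump(void)
{
#if defined(__i386__) || defined(__x86_64__)
	/* The compiler chooses the addressing mode for %0; under i386 PIC
	 * that address is usually built on top of the PIC register. */
	__asm__ ("incl %0" : "+m" (counter) : : "cc");
#else
	counter++;
#endif
	assert(counter > 0);
	return counter;
}
```

If the same asm also listed the PIC register as clobbered, the compiler would
have no way to form that operand's address, which is the conflict described
above.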

Greetings
Jan