[Bug inline-asm/28686] ebp from clobber list used as operand
--- Comment #3 from michael dot meissner at amd dot com 2007-01-30 20:17 --- Created an attachment (id=12982) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=12982&action=view) Secondary error Note, this is 32-bit only. If you compile epb2.c with -fpic -m32 and no optimization, it generates incorrect code: at -O0 the frame pointer is not omitted, yet the asm's clobber list includes %ebp, so subsequent local variable references still go through the clobbered %ebp. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28686
[Bug target/30685] New: Move ASM_OUTPUT_* macros to gcc_target structure
Move the ASM_OUTPUT_* macros used in varasm.c to the target hooks structure, and eventually eliminate the ASM_OUTPUT_* macros. -- Summary: Move ASM_OUTPUT_* macros to gcc_target structure Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: target AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: michael dot meissner at amd dot com GCC build triplet: x86_64-redhat-linux GCC host triplet: x86_64-redhat-linux GCC target triplet: x86_64-redhat-linux http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30685
[Bug target/29775] redundant movzbl
--- Comment #1 from michael dot meissner at amd dot com 2007-02-03 04:49 --- If you look at the RTL for the if statement, the QI value is loaded into a register and the test is done against the QI value; the first movzbl is how that load is done. The second movzbl zero-extends the value to an SI value that can be used by the __builtin_ctz function. In addition, there is a spurious move at the end to move the value from %edx into %eax for the return. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29775
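A minimal sketch of the code shape being discussed; the function below is illustrative, not the original test case from the report:

```c
#include <limits.h>

/* The QImode load is one movzbl; the argument to __builtin_ctz must then be
   zero-extended to SImode, which is the second (redundant-looking) movzbl. */
unsigned first_set_bit(const unsigned char *p)
{
    unsigned char v = *p;         /* QImode load (first movzbl) */
    if (v != 0)                   /* test performed on the QI value */
        return __builtin_ctz(v);  /* zero extension QI -> SI (second movzbl) */
    return CHAR_BIT * sizeof(unsigned);
}
```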
[Bug driver/30728] New: Building a 32-bit compiler on a 64-bit system should pass --32 flag to the assembler
If you configure to build a 32-bit compiler on a 64-bit Linux system with: CC='gcc -m32' /src/trunk/configure --{target,host,build}=i686-pc-linux-gnu ... the build fails in libgcc, because the compiler defaults to generating 32-bit code while the standard assembler is 64-bit. If you are building in such an environment, the compiler should be modified to pass --32 to the assembler. Note, there is the workaround of putting a 32-bit assembler in the --prefix directory so that the build succeeds, but it would be nice to have it fixed. -- Summary: Building a 32-bit compiler on a 64-bit system should pass --32 flag to the assembler Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: driver AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: michael dot meissner at amd dot com GCC build triplet: i686-pc-linux-gnu GCC host triplet: x86_64-pc-linux-gnu GCC target triplet: i686-pc-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30728
[Bug target/31018] New: TARGET_{K8,K6,GENERIC} referred to in i386.md file
There are several instances of checking for a specific machine such as TARGET_K8 in the i386.md file. These should be changed to use feature macros that test for the appropriate processor bits in the x86_* variables. Assign this to me, as I'm working on a patch. -- Summary: TARGET_{K8,K6,GENERIC} referred to in i386.md file Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: minor Priority: P3 Component: target AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: michael dot meissner at amd dot com GCC build triplet: x86_64-pc-linux-gnu GCC host triplet: x86_64-pc-linux-gnu GCC target triplet: x86_64-pc-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31018
[Bug target/31019] New: Microoptimization of the i386 and x86_64 compilers
There are a lot of feature test macros in the i386/x86_64 compiler of the form: (x86_some_var & (1 << ix86_arch)) These tests could be micro-optimized, either by storing 1 << ix86_arch in a global variable, or by having a global variable that holds the result of the shift and the AND, so that a simple != 0 test can be done. -- Summary: Microoptimization of the i386 and x86_64 compilers Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: target AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: michael dot meissner at amd dot com GCC build triplet: x86_64-pc-gnu-linux GCC host triplet: x86_64-pc-gnu-linux GCC target triplet: x86_64-pc-gnu-linux http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31019
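A minimal sketch of the proposed micro-optimization; the variable names loosely mimic the i386 back end, but x86_some_var, the arch index, and the helpers are all illustrative:

```c
/* One feature bit per processor in the feature word. */
#define ARCH_INDEX 2                       /* illustrative processor index */

unsigned x86_some_var = 0x14;              /* feature word (bits 2 and 4 set) */
unsigned ix86_arch = ARCH_INDEX;
unsigned ix86_arch_mask = 1u << ARCH_INDEX;  /* cached once at option time */

/* current style: AND plus a variable shift on every feature test */
unsigned feature_test_slow(void) { return x86_some_var & (1u << ix86_arch); }

/* proposed style: the shift is precomputed, leaving a simple AND / != 0 */
unsigned feature_test_fast(void) { return x86_some_var & ix86_arch_mask; }
```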
[Bug target/31028] New: Microoptimization of the i386 and x86_64 compilers
There are a lot of feature test macros in the i386/x86_64 compiler of the form: (x86_some_var & (1 << ix86_arch)) These tests could be micro-optimized, either by storing 1 << ix86_arch in a global variable, or by having a global variable that holds the result of the shift and the AND, so that a simple != 0 test can be done. -- Summary: Microoptimization of the i386 and x86_64 compilers Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: target AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: michael dot meissner at amd dot com GCC build triplet: x86_64-pc-gnu-linux GCC host triplet: x86_64-pc-gnu-linux GCC target triplet: x86_64-pc-gnu-linux http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31028
[Bug target/31018] TARGET_{K8,K6,GENERIC} referred to in i386.md file
--- Comment #2 from michael dot meissner at amd dot com 2007-03-14 20:59 --- Patch committed: http://gcc.gnu.org/ml/gcc-patches/2007-03/msg00951.html -- michael dot meissner at amd dot com changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution||FIXED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31018
[Bug c++/31307] New: Interaction between x86_64 builtin function and inline functions causes poor code
If you compile the attached code with optimization on a 4.1.x system it will generate a store into a stack temporary in the middle of the loop that is never used. If you compile the code with -DUSE_MACRO where it uses macros instead of inline functions, it will generate the correct code without the extra store. It is still a bug in the 4.3 mainline with a compiler built on March 30th. -- Summary: Interaction between x86_64 builtin function and inline functions causes poor code Product: gcc Version: 4.1.2 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: michael dot meissner at amd dot com GCC build triplet: x86_64-redhat-linux GCC host triplet: x86_64-redhat-linux GCC target triplet: x86_64-redhat-linux http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31307
[Bug c++/31307] Interaction between x86_64 builtin function and inline functions causes poor code
--- Comment #1 from michael dot meissner at amd dot com 2007-03-22 00:38 --- Created an attachment (id=13248) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13248&action=view) C++ source that shows the bug This is the source that shows the bug. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31307
[Bug c++/31307] Interaction between x86_64 builtin function and inline functions causes poor code
--- Comment #2 from michael dot meissner at amd dot com 2007-03-22 00:39 --- Created an attachment (id=13249) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13249&action=view) This is the assembly language with the extra store in it -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31307
[Bug c++/31307] Interaction between x86_64 builtin function and inline functions causes poor code
--- Comment #3 from michael dot meissner at amd dot com 2007-03-22 00:40 --- Created an attachment (id=13250) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13250&action=view) This is the good source compiled with -DUSE_MACRO -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31307
[Bug middle-end/31307] Interaction between x86_64 builtin function and inline functions causes poor code
--- Comment #13 from michael dot meissner at amd dot com 2007-04-12 20:18 --- How hard would it be to back port the change to 4.1.3 and 4.2? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31307
[Bug target/33524] New: SSE5 vectorized SI->DI conversions broken
If you use -O2 -ftree-vectorize -msse5 (or now -O3 -msse5), the compiler generates an insn not found message, because there is a typo in i386.c. -- Summary: SSE5 vectorized SI->DI conversions broken Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: michael dot meissner at amd dot com GCC build triplet: x86_64-pc-gnu-linux GCC host triplet: x86_64-pc-gnu-linux GCC target triplet: x86_64-pc-gnu-linux http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33524
[Bug target/33524] SSE5 vectorized SI->DI conversions broken
--- Comment #1 from michael dot meissner at amd dot com 2007-09-21 20:50 --- Created an attachment (id=14241) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14241&action=view) Patch to fix problem -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33524
[Bug target/33524] SSE5 vectorized SI->DI conversions broken
--- Comment #2 from michael dot meissner at amd dot com 2007-09-21 20:51 --- Created an attachment (id=14242) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14242&action=view) Test case that replicates the failure -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33524
[Bug middle-end/35004] Adding 4 more tree codes causes a crash in building libstdc++ pre-compiled headers
--- Comment #2 from michael dot meissner at amd dot com 2008-01-29 00:10 --- Created an attachment (id=15041) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15041&action=view) Traceback for 35005 -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35004
[Bug c++/35004] New: Adding 4 more tree codes causes a crash in building libstdc++ pre-compiled headers
If I add 4 more tree codes to tree.def, it causes a segmentation violation in building libstdc++ pre-compiled header files. Here is the patch to add 4 more tree codes:

--- gcc/tree.def.~1~	2007-11-01 11:59:47.0 -0400
+++ gcc/tree.def	2008-01-28 16:01:36.0 -0500
@@ -682,6 +682,13 @@ DEFTREECODE (RSHIFT_EXPR, "rshift_expr",
 DEFTREECODE (LROTATE_EXPR, "lrotate_expr", tcc_binary, 2)
 DEFTREECODE (RROTATE_EXPR, "rrotate_expr", tcc_binary, 2)
 
+/* Vector/vector shifts, where both arguments are vector types.  This is only
+   used during the expansion of shifts and rotates.  */
+DEFTREECODE (VLSHIFT_EXPR, "vlshift_expr", tcc_binary, 2)
+DEFTREECODE (VRSHIFT_EXPR, "vrshift_expr", tcc_binary, 2)
+DEFTREECODE (VLROTATE_EXPR, "vlrotate_expr", tcc_binary, 2)
+DEFTREECODE (VRROTATE_EXPR, "vrrotate_expr", tcc_binary, 2)
+
 /* Bitwise operations.  Operands have same mode as result.  */
 DEFTREECODE (BIT_IOR_EXPR, "bit_ior_expr", tcc_binary, 2)
 DEFTREECODE (BIT_XOR_EXPR, "bit_xor_expr", tcc_binary, 2)

Here is the file that segfaults:

/data/fsf-build/bulldozer-gcc-test/./gcc/xgcc -shared-libgcc -B/data/fsf-build/bulldozer-gcc-test/./gcc -nostdinc++ -L/data/fsf-build/bulldozer-gcc-test/x86_64-unknown-linux-gnu/libstdc++-v3/src -L/data/fsf-build/bulldozer-gcc-test/x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs -B/proj/gcc/fsf-install/bulldozer-gcc-test/x86_64-unknown-linux-gnu/bin/ -B/proj/gcc/fsf-install/bulldozer-gcc-test/x86_64-unknown-linux-gnu/lib/ -isystem /proj/gcc/fsf-install/bulldozer-gcc-test/x86_64-unknown-linux-gnu/include -isystem /proj/gcc/fsf-install/bulldozer-gcc-test/x86_64-unknown-linux-gnu/sys-include -Winvalid-pch -x c++-header -g -O2 -D_GNU_SOURCE -I/data/fsf-build/bulldozer-gcc-test/x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu -I/data/fsf-build/bulldozer-gcc-test/x86_64-unknown-linux-gnu/libstdc++-v3/include -I/proj/gcc/fsf-src/bulldozer-gcc-test/libstdc++-v3/libsupc++ -O0 -g
/proj/gcc/fsf-src/bulldozer-gcc-test/libstdc++-v3/include/precompiled/stdc++.h -o x86_64-unknown-linux-gnu/bits/stdc++.h.gch/O0g.gch In file included from /data/fsf-build/bulldozer-gcc-test/x86_64-unknown-linux-gnu/libstdc++-v3/include/valarray:539, from /proj/gcc/fsf-src/bulldozer-gcc-test/libstdc++-v3/include/precompiled/stdc++.h:96: /data/fsf-build/bulldozer-gcc-test/x86_64-unknown-linux-gnu/libstdc++-v3/include/valarray: In instantiation of std::valarray: /data/fsf-build/bulldozer-gcc-test/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/valarray_after.h:59: instantiated from here /data/fsf-build/bulldozer-gcc-test/x86_64-unknown-linux-gnu/libstdc++-v3/include/valarray:117: internal compiler error: Segmentation fault Please submit a full bug report, with preprocessed source if appropriate. -- Summary: Adding 4 more tree codes causes a crash in building libstdc++ pre-compiled headers Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: michael dot meissner at amd dot com GCC build triplet: x86_64-unknown-linux-gnu GCC host triplet: x86_64-unknown-linux-gnu GCC target triplet: x86_64-unknown-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35004
[Bug c++/35004] Adding 4 more tree codes causes a crash in building libstdc++ pre-compiled headers
--- Comment #1 from michael dot meissner at amd dot com 2008-01-29 00:04 --- Created an attachment (id=15040) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15040&action=view) Preprocessed file from the build of the libstdc++ pre-compiled headers File is bzip2'ed -9. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35004
[Bug c++/35004] Adding 4 more tree codes causes a crash in building libstdc++ pre-compiled headers
--- Comment #4 from michael dot meissner at amd dot com 2008-01-29 00:39 --- Created an attachment (id=15043) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15043&action=view) Proposed patch to fix the problem The problem is cp/cp-tree.h stores the tree_code in 8 bits, but the tree code now overflows. The patch expands the tree code to 16 bits, and removes 8 unused bits to keep the padding the same. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35004
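The mechanics of that overflow can be sketched in a few lines of C; the struct names below are invented, and only the bit-field widths mirror what the patch changes:

```c
/* An 8-bit field wraps silently once tree codes pass 255; widening the
   field to 16 bits (and dropping 8 spare bits to keep the layout) fixes it. */
struct lang_node8  { unsigned code : 8;  unsigned spare : 8; };
struct lang_node16 { unsigned code : 16; };

unsigned store8(unsigned code)
{
    struct lang_node8 n = { 0, 0 };
    n.code = code;      /* truncated modulo 256 */
    return n.code;
}

unsigned store16(unsigned code)
{
    struct lang_node16 n = { 0 };
    n.code = code;      /* preserved up to 65535 */
    return n.code;
}
```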
[Bug c++/35004] Adding 4 more tree codes causes a crash in building libstdc++ pre-compiled headers
--- Comment #6 from michael dot meissner at amd dot com 2008-02-07 17:22 --- Subject: RE: Adding 4 more tree codes causes a crash in building libstdc++ pre-compiled headers The problem is that there are two different vector shifts: vector shift by a scalar amount (each element is shifted by the same amount), and vector shift by a vector (each element is shifted by the corresponding element in the other vector). Right now, when GCC in tree-vect-transform.c looks at the shift optab, if the second operand is in a scalar mode it believes the machine only supports vector shift by scalar, and if the type is a vector mode it assumes the machine supports vector/vector shifts. The SSE2 instruction set extension on the x86 has vector/scalar shift instructions, and the SSE5 instruction set extension adds vector/vector shifts and rotates. I want to be able to add support for a machine that has both types of vector shift, but with the current framework this is impossible. -- Michael Meissner AMD, MS 83-29 90 Central Street Boxborough, MA 01719
> -Original Message-
> From: bonzini at gnu dot org [mailto:[EMAIL PROTECTED]
> Sent: Thursday, February 07, 2008 12:11 PM
> To: Meissner, Michael
> Subject: [Bug c++/35004] Adding 4 more tree codes causes a crash in
> building libstdc++ pre-compiled headers
>
> --- Comment #5 from bonzini at gnu dot org 2008-02-07 17:10 ---
> Unrelated, but why couldn't vector/vector shifts/rotates overload LSHIFT_EXPR
> instead? :-)
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35004
>
> --- You are receiving this mail because: ---
> You reported the bug, or are watching the reporter.
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35004
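The two shift flavors described in the comment can be expressed with GCC's generic vector extension; using that extension here is an assumption for illustration (the report itself is about optab handling, not source syntax):

```c
/* Four unsigned 32-bit lanes in a 16-byte vector. */
typedef unsigned v4su __attribute__((vector_size(16)));

/* vector shift by a scalar: every element shifted by the same count */
v4su shl_scalar(v4su x, int n) { return x << n; }

/* vector shift by a vector: element i shifted by count[i] */
v4su shl_vector(v4su x, v4su count) { return x << count; }

/* helper so individual lanes are easy to inspect */
unsigned lane(v4su v, int i) { return v[i]; }
```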
[Bug target/35189] -mno-sse4.2 turns off SSE4a
--- Comment #2 from michael dot meissner at amd dot com 2008-02-13 23:55 --- Umm, SSE4A is completely different from SSE4/SSE4.1/SSE4.2. SSE4A are the instructions added with AMD's Barcelona machine, while SSE4.1 is the instructions added with the current generation of Intel machines (Penryn if memory serves), and SSE4.2 will be the instructions in the next Intel release. The whole naming scheme is unfortunate, especially SSSE3 and SSE4A. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35189
[Bug target/35189] -mno-sse4.2 turns off SSE4a
--- Comment #4 from michael dot meissner at amd dot com 2008-02-14 00:20 --- In terms of shipping systems, no AMD system supports SSSE3 right now. As I understand it, the SSSE3 instructions came in between SSE3 and SSE4.1 on Intel systems, so -mno-sse3 should turn off SSSE3, but -mno-sse4a should not turn off SSSE3. Current shipping AMD systems do support SSE3. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35189
[Bug middle-end/17886] variable rotate and long long rotate should be better optimized
--- Comment #14 from michael dot meissner at amd dot com 2005-10-04 18:59 --- Created an attachment (id=9876) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=9876&action=view) Patch for x86 double word shifts This patch fixes the bug from the x86 side of things instead of from the machine independent side, by adding direct expanders for the best code (for doing 64 bit rotates in 32-bit mode and 128 bit rotates in 64-bit mode). On a machine with conditional move (all recent processors), the code becomes:

	movl	%edx, %ebx
	shldl	%eax, %edx
	shldl	%ebx, %eax
	movl	%edx, %ebx
	andl	$32, %ecx
	cmovne	%eax, %edx
	cmovne	%ebx, %eax

However, I suspect using MMX or SSE2 instructions will provide even more of a speedup, since there are direct 64-bit shifts, and, or, load/store instructions (but no direct rotate). In the MMX space you have to be careful not to have active floating point going on, and to switch out of MMX mode before doing calls or returns. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886
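The operation the shldl/cmovne sequence implements is an ordinary 64-bit variable rotate; a portable two-shift version is shown below (this is the standard rotate idiom, not code from the patch):

```c
#include <stdint.h>

/* Rotate x left by n bits.  The (-n & 63) form of the right-shift count
   avoids undefined behavior when n is 0 (a plain 64 - n would shift by 64). */
uint64_t rotl64(uint64_t x, unsigned n)
{
    n &= 63;                             /* rotate count is taken mod 64 */
    return (x << n) | (x >> (-n & 63));
}
```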
[Bug middle-end/17886] variable rotate and long long rotate should be better optimized
--- Comment #15 from michael dot meissner at amd dot com 2005-10-04 19:51 --- Note, Mark's patch as applied to the tree has a minor typo in it. The rotrdi3 define_expand uses (rotate:DI ...) instead of (rotatert:DI ...). It doesn't matter in practice, since the generator function is never called, but it is useful to have the right insns listed. -- michael dot meissner at amd dot com changed: What|Removed |Added CC| |michael dot meissner at amd | |dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886
[Bug middle-end/17886] variable rotate and long long rotate should be better optimized
--- Comment #16 from michael dot meissner at amd dot com 2005-10-04 20:06 --- Created an attachment (id=9880) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=9880&action=view) Respin of 17886 patch to match new tree contents This patch is meant to apply on top of Mark's changes, but provides the same code as my previous patch. -- michael dot meissner at amd dot com changed: What|Removed |Added Attachment #9876 is|0 |1 obsolete|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886
[Bug middle-end/17886] variable rotate and long long rotate should be better optimized
--- Comment #18 from michael dot meissner at amd dot com 2005-10-04 20:32 --- Subject: RE: variable rotate and long long rotate should be better optimized Yep, all valid points, so I don't think it should be done by default. But I suspect the original poster's application may be well enough behaved to be able to use it. Certainly if the only reason for using long long is heavy duty bit banging (shift/rotate/and/or/test) with no arithmetic, it would speed things up, since one instruction could be done instead of multiple, and it would lessen the register pressure that long longs put on the x86.
-Original Message-
From: ak at muc dot de [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 04, 2005 4:20 PM
To: Meissner, Michael
Subject: [Bug middle-end/17886] variable rotate and long long rotate should be better optimized
--- Comment #17 from ak at muc dot de 2005-10-04 20:20 --- The code now looks fine to me thanks I would prefer if it didn't generate SSE2/MMX code because that would be a problem for kernels. Also in many x86 implementations moving things between normal integer registers and SIMD registers is quite slow and would likely eat all advantages -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886
[Bug middle-end/17886] variable rotate and long long rotate should be better optimized
--- Comment #19 from michael dot meissner at amd dot com 2005-10-04 20:35 --- Subject: RE: variable rotate and long long rotate should be better optimized I almost forgot, kernels should be using -mno-mmx and -mno-sse as a matter of course (or -msoft-float). I first ran into this problem in 1990 when I was supporting the MIPS platform, and the kernel guys were surprised that the compiler would use the double precision registers to do block copies, since it could double the bandwidth of doing 32-bit moves. -Original Message- From: ak at muc dot de [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 04, 2005 4:20 PM To: Meissner, Michael Subject: [Bug middle-end/17886] variable rotate and long long rotate should be better optimized --- Comment #17 from ak at muc dot de 2005-10-04 20:20 --- The code now looks fine to me thanks I would prefer if it didn't generate SSE2/MMX code because that would be a problem for kernels. Also in many x86 implementations moving things between normal integer registers and SIMD registers is quite slow and would likely eat all advantages -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886
[Bug middle-end/17886] variable rotate and long long rotate should be better optimized
--- Comment #21 from michael dot meissner at amd dot com 2005-10-04 20:46 --- Subject: RE: variable rotate and long long rotate should be better optimized Sorry, I got mixed up as to who the original poster was. SSE2 is harder to use because it deals with 128-bit items instead of 64-bit (unless you are in 64-bit mode and working on TImode values). Ultimately, it is a matter of whether it is important enough for somebody to spend a week or two of work to use the multimedia instructions for this case. I suspect in most cases, it might be better to isolate the code and use #ifdef's and builtin functions/asm's.
-Original Message-
From: ak at muc dot de [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 04, 2005 4:40 PM
To: Meissner, Michael
Subject: [Bug middle-end/17886] variable rotate and long long rotate should be better optimized
--- Comment #20 from ak at muc dot de 2005-10-04 20:39 --- Newer linux does that of course, although not always in older releases. But even in user space it's not a good idea to use SSE2 unless you really need it because it increases the cost of the context switch and costs an exception each time first in a timeslice. P.S.: I was the original poster, but the application wasn't a kernel but I doubt it's a good idea to use SSE2. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886
[Bug rtl-optimization/23812] swapping DImode halves produces poor x86 register allocation
--- Comment #3 from michael dot meissner at amd dot com 2005-10-18 17:44 --- Note, since this is a rotate, the patches I proposed in 17886 will generate much better code for this one case (basically mov/mov/xchgl -- it could be improved by a peephole to do the moves directly instead of xchgl). However, the more general subreg problem needs to be looked at. -- michael dot meissner at amd dot com changed: What|Removed |Added CC||michael dot meissner at amd ||dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23812
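Swapping the two 32-bit halves of a DImode value is exactly a rotate by 32, which is why the rotate expanders from 17886 help this case; a small illustration:

```c
#include <stdint.h>

/* Exchange the high and low 32-bit halves of a 64-bit value.
   (x << 32) | (x >> 32) is a rotate by 32 in either direction. */
uint64_t swap_halves(uint64_t x)
{
    return (x << 32) | (x >> 32);
}
```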
[Bug target/34077] New: GCC -O1 -minline-all-stringops -minline-stringops-dynamically fails for spec 2006 bzip2, gobmk, and h264ref benchmarks
I was building SPEC 2006 with the options -minline-all-stringops -minline-stringops-dynamically in addition to my normal options. If you use both options together, GCC generates the following error:

foo.c: In function spec_random_load:
foo.c:24: internal compiler error: in int_mode_for_mode, at stor-layout.c:258
Please submit a full bug report, with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.

Tracing it down, emit_cmp_and_jump_insns is called to compare and jump with two constant values:

Breakpoint 3, emit_cmp_and_jump_insns (x=0x2dff3c50, y=0x2dff3480, comparison=LTU, size=0x0, mode=SImode, unsignedp=1, label=0x2e134fa0) at /proj/gcc/fsf-src/trunk/gcc/optabs.c:4428
(gdb) print x
$7 = (rtx) 0x2dff3c50
(gdb) pr
(const_int 131072 [0x20000])
(gdb) print y
$8 = (rtx) 0x2dff3480
(gdb) pr
(const_int 8 [0x8])
(gdb) up
#1 0x008adab6 in ix86_expand_movmem (dst=0x2e136a60, src=0x2e136a80, count_exp=0x2dff3c50, align_exp=, expected_align_exp=, expected_size_exp=) at /proj/gcc/fsf-src/trunk/gcc/config/i386/i386.c:15362

The failure comes because integer constants have VOIDmode, rather than an integer mode. Either emit_cmp_and_jump_insns should handle the constant/constant case, or ix86_expand_movmem should not call emit_cmp_and_jump_insns with constant operands. -- Summary: GCC -O1 -minline-all-stringops -minline-stringops-dynamically fails for spec 2006 bzip2, gobmk, and h264ref benchmarks Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: michael dot meissner at amd dot com GCC build triplet: x86_64-unknown-linux-gnu GCC host triplet: x86_64-unknown-linux-gnu GCC target triplet: x86_64-unknown-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34077
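A hypothetical reduction of the trigger (the actual reduced test case is the 401.bzip2 attachment in comment #1): a memcpy whose size is a compile-time constant, so the inlined-stringop expansion ends up comparing two constants, 131072 against the small-copy threshold. All names below are invented:

```c
#include <string.h>

static char dst_buf[131072];
static char src_buf[131072];

/* The constant size flows into the movmem expander as a const_int count_exp. */
void copy_block(void)
{
    memcpy(dst_buf, src_buf, sizeof src_buf);
}

/* Small check that the copy actually happens. */
char probe(void)
{
    src_buf[131071] = 'x';
    copy_block();
    return dst_buf[131071];
}
```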
[Bug target/34077] GCC -O1 -minline-all-stringops -minline-stringops-dynamically fails for spec 2006 bzip2, gobmk, and h264ref benchmarks
--- Comment #1 from michael dot meissner at amd dot com 2007-11-12 20:38 --- Created an attachment (id=14533) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14533&action=view) Reduced testcase for bug 34077 from 401.bzip2 This is the reduced testcase from 401.bzip2. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34077
[Bug target/34077] GCC -O1 -minline-all-stringops -minline-stringops-dynamically fails for spec 2006 bzip2, gobmk, and h264ref benchmarks
--- Comment #3 from michael dot meissner at amd dot com 2007-11-13 20:48 --- Created an attachment (id=14548) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14548&action=view) Patch to fix PR34077 -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34077
[Bug target/25295] unused register saved in function prolog
--- Comment #3 from michael dot meissner at amd dot com 2006-12-04 23:21 --- I've done some analysis on the test case. The current GCC 4.2 and mainline branches no longer generate the initial push of %r8; instead they do a subq $8,%rsp. I believe the compiler you used did the push to allocate 8 bytes of stack instead of the subtract. Note, the epilogue still uses a pop to remove the stack location. The core of the problem is that the compiler is allocating 8 bytes too much stack in this particular case. I think I understand what's going on, but I want to dig a bit more. -- michael dot meissner at amd dot com changed: What|Removed |Added CC||michael dot meissner at amd | |dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25295