[Bug inline-asm/28686] ebp from clobber list used as operand

2007-01-30 Thread michael dot meissner at amd dot com


--- Comment #3 from michael dot meissner at amd dot com  2007-01-30 20:17 
---
Created an attachment (id=12982)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=12982&action=view)
Secondary error

Note, this is 32-bit only.  If you compile epb2.c with -fpic -m32 and no
optimization, it generates incorrect code: at -O0 the frame pointer is not
omitted, yet the asm claims to clobber %ebp, and subsequent local
variables are still addressed through %ebp.
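A minimal sketch of the kind of testcase involved (hypothetical: the attached epb2.c is not reproduced here, and the function names are invented).  It is 32-bit x86 specific, must be compiled with -fpic -m32 -O0, and is not portable; note that later GCC releases reject an %ebp clobber outright when a frame pointer is in use, so this only demonstrates the shape of the problem:

```c
/* Hypothetical reconstruction, not the real attachment.  Compile with
   "gcc -m32 -fpic -O0" on 32-bit x86.  With the asm claiming to clobber
   %ebp, the compiler must not keep addressing locals through %ebp.  */
extern int bar (int);

int
foo (int x)
{
  int local = bar (x);                  /* at -O0 this lives in a stack slot */
  __asm__ volatile ("" : : : "%ebp");   /* claim to clobber the frame pointer */
  return local + 1;                     /* must not be read via a dead %ebp */
}
```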


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28686



[Bug target/30685] New: Move ASM_OUTPUT_* macros to gcc_target structure

2007-02-02 Thread michael dot meissner at amd dot com
Move the ASM_OUTPUT_* macros used in varasm.c to the target hooks structure,
and eventually eliminate the ASM_OUTPUT_* macros.


-- 
   Summary: Move ASM_OUTPUT_* macros to gcc_target structure
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: michael dot meissner at amd dot com
 GCC build triplet: x86_64-redhat-linux
  GCC host triplet: x86_64-redhat-linux
GCC target triplet: x86_64-redhat-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30685



[Bug target/29775] redundant movzbl

2007-02-02 Thread michael dot meissner at amd dot com


--- Comment #1 from michael dot meissner at amd dot com  2007-02-03 04:49 
---
If you look at the RTL, in the if statement the RTL loads the QI value into
a register and does the test against the QI value; the first movzbl is how
that load is done.  The second movzbl zero-extends the value into an SI value
that can be used by the __builtin_ctz function.

In addition, there is a spurious move at the end to move the value from %edx
into %eax for the return.
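The zero extension the comment describes is visible at the C level too.  A small sketch (function name invented) using the GCC builtin: the unsigned char argument is a QImode value, and passing it to __builtin_ctz forces the widening to int (SImode) that shows up as the second movzbl on x86:

```c
#include <assert.h>

/* Sketch of the pattern discussed above: test a QImode (unsigned char)
   value, then hand it to __builtin_ctz, which takes an int and so
   requires a zero extension of the byte.  GCC/Clang builtin.  */
int
first_set_bit (unsigned char v)
{
  if (v == 0)                   /* compare against the loaded byte */
    return -1;
  return __builtin_ctz (v);     /* v is zero-extended to int here */
}
```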


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29775



[Bug driver/30728] New: Building a 32-bit compiler on a 64-bit system should pass --32 flag to the assembler

2007-02-07 Thread michael dot meissner at amd dot com
If you configure to build a 32-bit compiler on a 64-bit Linux system with:
CC='gcc -m32' /src/trunk/configure --{target,host,build}=i686-pc-linux-gnu ...
the build fails because the compiler defaults to 32-bit code but the standard
assembler is 64-bit, so libgcc fails to build.  If you are building in
such an environment, the compiler should be modified to pass --32 to the
assembler.

Note, there is the workaround of putting a 32-bit assembler in the --prefix
directory so that the build completes, but it would be nice to have this fixed.


-- 
   Summary: Building a 32-bit compiler on a 64-bit system should
pass --32 flag to the assembler
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: driver
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: michael dot meissner at amd dot com
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: x86_64-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30728



[Bug target/31018] New: TARGET_{K8,K6,GENERIC} referred to in i386.md file

2007-03-01 Thread michael dot meissner at amd dot com
There are several instances of checking for a specific machine such as
TARGET_K8 in the i386.md file.  These should be changed to use feature macros
that test for the appropriate processor bits in the x86_* variables.

Assign this to me, as I'm working on a patch.


-- 
   Summary: TARGET_{K8,K6,GENERIC} referred to in i386.md file
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: minor
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: michael dot meissner at amd dot com
 GCC build triplet: x86_64-pc-linux-gnu
  GCC host triplet: x86_64-pc-linux-gnu
GCC target triplet: x86_64-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31018



[Bug target/31019] New: Microoptimization of the i386 and x86_64 compilers

2007-03-01 Thread michael dot meissner at amd dot com
There are a lot of feature test macros in the i386/x86_64 compiler of the form:
(x86_some_var & (1 << ix86_arch))

These tests could be micro-optimized, either by storing 1 << ix86_arch in a
global variable, or by keeping a global variable that holds the result of the
shift and the AND, so that a simple != 0 test suffices.
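The first variant can be sketched as follows (a sketch only: x86_some_var, ix86_arch and x86_arch_mask are stand-ins, not the real i386.c variables).  The shifted bit is cached once at option-processing time, so every later feature test is a single AND:

```c
#include <assert.h>

/* Invented stand-ins for the i386.c feature machinery described above. */
unsigned x86_some_var = 0x14;   /* feature bitmask, one bit per arch  */
int ix86_arch = 2;              /* index of the current architecture  */
unsigned x86_arch_mask;         /* cached value of 1 << ix86_arch     */

/* Compute the mask once, e.g. during option processing. */
void
init_arch_mask (void)
{
  x86_arch_mask = 1u << ix86_arch;
}

/* Old form recomputed (feature_var & (1 << ix86_arch)) on every test;
   the cached form is a plain AND against the precomputed mask. */
int
arch_has_feature (unsigned feature_var)
{
  return (feature_var & x86_arch_mask) != 0;
}
```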


-- 
   Summary: Microoptimization of the i386 and x86_64 compilers
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: michael dot meissner at amd dot com
 GCC build triplet: x86_64-pc-gnu-linux
  GCC host triplet: x86_64-pc-gnu-linux
GCC target triplet: x86_64-pc-gnu-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31019



[Bug target/31028] New: Microoptimization of the i386 and x86_64 compilers

2007-03-02 Thread michael dot meissner at amd dot com
There are a lot of feature test macros in the i386/x86_64 compiler of the form:
(x86_some_var & (1 << ix86_arch))

These tests could be micro-optimized, either by storing 1 << ix86_arch in a
global variable, or by keeping a global variable that holds the result of the
shift and the AND, so that a simple != 0 test suffices.


-- 
   Summary: Microoptimization of the i386 and x86_64 compilers
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: michael dot meissner at amd dot com
 GCC build triplet: x86_64-pc-gnu-linux
  GCC host triplet: x86_64-pc-gnu-linux
GCC target triplet: x86_64-pc-gnu-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31028



[Bug target/31018] TARGET_{K8,K6,GENERIC} referred to in i386.md file

2007-03-14 Thread michael dot meissner at amd dot com


--- Comment #2 from michael dot meissner at amd dot com  2007-03-14 20:59 
---
Patch committed:
http://gcc.gnu.org/ml/gcc-patches/2007-03/msg00951.html


-- 

michael dot meissner at amd dot com changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution||FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31018



[Bug c++/31307] New: Interaction between x86_64 builtin function and inline functions causes poor code

2007-03-21 Thread michael dot meissner at amd dot com
If you compile the attached code with optimization on a 4.1.x system, it will
generate a store into a stack temporary in the middle of the loop that is never
used.  If you compile the code with -DUSE_MACRO, so that it uses macros instead
of inline functions, it will generate the correct code without the extra store.
It is still a bug in the 4.3 mainline with a compiler built on March 30th.


-- 
   Summary: Interaction between x86_64 builtin function and inline
functions causes poor code
   Product: gcc
   Version: 4.1.2
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: michael dot meissner at amd dot com
 GCC build triplet: x86_64-redhat-linux
  GCC host triplet: x86_64-redhat-linux
GCC target triplet: x86_64-redhat-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31307



[Bug c++/31307] Interaction between x86_64 builtin function and inline functions causes poor code

2007-03-21 Thread michael dot meissner at amd dot com


--- Comment #1 from michael dot meissner at amd dot com  2007-03-22 00:38 
---
Created an attachment (id=13248)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13248&action=view)
C++ source that shows the bug

This is the source that shows the bug.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31307



[Bug c++/31307] Interaction between x86_64 builtin function and inline functions causes poor code

2007-03-21 Thread michael dot meissner at amd dot com


--- Comment #2 from michael dot meissner at amd dot com  2007-03-22 00:39 
---
Created an attachment (id=13249)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13249&action=view)
This is the assembly language with the extra store in it


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31307



[Bug c++/31307] Interaction between x86_64 builtin function and inline functions causes poor code

2007-03-21 Thread michael dot meissner at amd dot com


--- Comment #3 from michael dot meissner at amd dot com  2007-03-22 00:40 
---
Created an attachment (id=13250)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13250&action=view)
This is the good source compiled with -DUSE_MACRO


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31307



[Bug middle-end/31307] Interaction between x86_64 builtin function and inline functions causes poor code

2007-04-12 Thread michael dot meissner at amd dot com


--- Comment #13 from michael dot meissner at amd dot com  2007-04-12 20:18 
---
How hard would it be to back port the change to 4.1.3 and 4.2?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31307



[Bug target/33524] New: SSE5 vectorized SI->DI conversions broken

2007-09-21 Thread michael dot meissner at amd dot com
If you use -O2 -ftree-vectorize -msse5 (or now -O3 -msse5), the compiler
generates an insn not found message, because there is a typo in i386.c.


-- 
   Summary: SSE5 vectorized SI->DI conversions broken
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: michael dot meissner at amd dot com
 GCC build triplet: x86_64-pc-gnu-linux
  GCC host triplet: x86_64-pc-gnu-linux
GCC target triplet: x86_64-pc-gnu-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33524



[Bug target/33524] SSE5 vectorized SI->DI conversions broken

2007-09-21 Thread michael dot meissner at amd dot com


--- Comment #1 from michael dot meissner at amd dot com  2007-09-21 20:50 
---
Created an attachment (id=14241)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14241&action=view)
Patch to fix problem


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33524



[Bug target/33524] SSE5 vectorized SI->DI conversions broken

2007-09-21 Thread michael dot meissner at amd dot com


--- Comment #2 from michael dot meissner at amd dot com  2007-09-21 20:51 
---
Created an attachment (id=14242)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14242&action=view)
Test case that replicates the failure


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33524



[Bug middle-end/35004] Adding 4 more tree codes causes a crash in building libstdc++ pre-compiled headers

2008-01-28 Thread michael dot meissner at amd dot com


--- Comment #2 from michael dot meissner at amd dot com  2008-01-29 00:10 
---
Created an attachment (id=15041)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15041&action=view)
Traceback for 35005


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35004



[Bug c++/35004] New: Adding 4 more tree codes causes a crash in building libstdc++ pre-compiled headers

2008-01-28 Thread michael dot meissner at amd dot com
If I add 4 more tree codes to tree.def, it causes a segmentation violation in
building libstdc++ pre-compiled header files.

Here is the patch to add 4 more tree codes:
--- gcc/tree.def.~1~	2007-11-01 11:59:47.000000000 -0400
+++ gcc/tree.def	2008-01-28 16:01:36.000000000 -0500
@@ -682,6 +682,13 @@ DEFTREECODE (RSHIFT_EXPR, "rshift_expr",
 DEFTREECODE (LROTATE_EXPR, "lrotate_expr", tcc_binary, 2)
 DEFTREECODE (RROTATE_EXPR, "rrotate_expr", tcc_binary, 2)

+/* Vector/vector shifts, where both arguments are vector types.  This is only
+   used during the expansion of shifts and rotates.  */
+DEFTREECODE (VLSHIFT_EXPR, "vlshift_expr", tcc_binary, 2)
+DEFTREECODE (VRSHIFT_EXPR, "vrshift_expr", tcc_binary, 2)
+DEFTREECODE (VLROTATE_EXPR, "vlrotate_expr", tcc_binary, 2)
+DEFTREECODE (VRROTATE_EXPR, "vrrotate_expr", tcc_binary, 2)
+
 /* Bitwise operations.  Operands have same mode as result.  */
 DEFTREECODE (BIT_IOR_EXPR, "bit_ior_expr", tcc_binary, 2)
 DEFTREECODE (BIT_XOR_EXPR, "bit_xor_expr", tcc_binary, 2)

Here is the file that segfaults:
/data/fsf-build/bulldozer-gcc-test/./gcc/xgcc -shared-libgcc
-B/data/fsf-build/bulldozer-gcc-test/./gcc -nostdinc++
-L/data/fsf-build/bulldozer-gcc-test/x86_64-unknown-linux-gnu/libstdc++-v3/src
-L/data/fsf-build/bulldozer-gcc-test/x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs
-B/proj/gcc/fsf-install/bulldozer-gcc-test/x86_64-unknown-linux-gnu/bin/
-B/proj/gcc/fsf-install/bulldozer-gcc-test/x86_64-unknown-linux-gnu/lib/
-isystem
/proj/gcc/fsf-install/bulldozer-gcc-test/x86_64-unknown-linux-gnu/include
-isystem
/proj/gcc/fsf-install/bulldozer-gcc-test/x86_64-unknown-linux-gnu/sys-include
-Winvalid-pch -x c++-header -g -O2   -D_GNU_SOURCE
-I/data/fsf-build/bulldozer-gcc-test/x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu
-I/data/fsf-build/bulldozer-gcc-test/x86_64-unknown-linux-gnu/libstdc++-v3/include
-I/proj/gcc/fsf-src/bulldozer-gcc-test/libstdc++-v3/libsupc++ -O0 -g
/proj/gcc/fsf-src/bulldozer-gcc-test/libstdc++-v3/include/precompiled/stdc++.h
-o x86_64-unknown-linux-gnu/bits/stdc++.h.gch/O0g.gch
In file included from
/data/fsf-build/bulldozer-gcc-test/x86_64-unknown-linux-gnu/libstdc++-v3/include/valarray:539,
 from
/proj/gcc/fsf-src/bulldozer-gcc-test/libstdc++-v3/include/precompiled/stdc++.h:96:
/data/fsf-build/bulldozer-gcc-test/x86_64-unknown-linux-gnu/libstdc++-v3/include/valarray:
In instantiation of ‘std::valarray’:
/data/fsf-build/bulldozer-gcc-test/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/valarray_after.h:59:
  instantiated from here
/data/fsf-build/bulldozer-gcc-test/x86_64-unknown-linux-gnu/libstdc++-v3/include/valarray:117:
internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.


-- 
   Summary: Adding 4 more tree codes causes a crash in building
libstdc++ pre-compiled headers
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
    AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: michael dot meissner at amd dot com
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35004



[Bug c++/35004] Adding 4 more tree codes causes a crash in building libstdc++ pre-compiled headers

2008-01-28 Thread michael dot meissner at amd dot com


--- Comment #1 from michael dot meissner at amd dot com  2008-01-29 00:04 
---
Created an attachment (id=15040)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15040&action=view)
Preprocessed file from the build of the libstdc++ pre-compiled headers

File is bzip2'ed -9.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35004



[Bug c++/35004] Adding 4 more tree codes causes a crash in building libstdc++ pre-compiled headers

2008-01-28 Thread michael dot meissner at amd dot com


--- Comment #4 from michael dot meissner at amd dot com  2008-01-29 00:39 
---
Created an attachment (id=15043)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15043&action=view)
Proposed patch to fix the problem

The problem is cp/cp-tree.h stores the tree_code in 8 bits, but the tree code
now overflows.  The patch expands the tree code to 16 bits, and removes 8
unused bits to keep the padding the same.
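The failure mode is easy to demonstrate in miniature (a sketch, not the real cp-tree.h layout): an unsigned 8-bit bitfield silently reduces stored values modulo 256, so once the enumeration passes 255, distinct tree codes collide:

```c
#include <assert.h>

/* Illustrative structs, not GCC's real definitions: `narrow` mimics the
   old 8-bit code field in cp-tree.h, `widened` the 16-bit fix.  */
struct narrow  { unsigned code : 8;  unsigned spare : 8; };
struct widened { unsigned code : 16; };

unsigned
store_narrow (unsigned c)
{
  struct narrow n;
  n.code = c;                   /* unsigned bitfield: reduced mod 256 */
  return n.code;
}

unsigned
store_widened (unsigned c)
{
  struct widened w;
  w.code = c;                   /* room for up to 65535 codes */
  return w.code;
}
```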


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35004



[Bug c++/35004] Adding 4 more tree codes causes a crash in building libstdc++ pre-compiled headers

2008-02-07 Thread michael dot meissner at amd dot com


--- Comment #6 from michael dot meissner at amd dot com  2008-02-07 17:22 
---
Subject: RE:  Adding 4 more tree codes causes a crash in
 building libstdc++ pre-compiled headers

The problem is there are two different vector shifts.  There is vector
shift by a scalar amount (each element gets shifted the same amount),
and vector shift by a vector (each element gets shifted by the
corresponding element in the vector).

Right now, GCC in tree-vect-transform.c looks at the shift optab: if the
second operand has a scalar mode, it believes the machine only supports
vector shift by scalar, and if the type is a vector mode, it assumes the
machine supports vector/vector shifts.

The SSE2 instruction set extension on the x86 has vector/scalar shift
instructions, and the SSE5 instruction set extension adds vector/vector
shifts and rotates.  I want to be able to add support for a machine that
has both types of vector shift, but with the current framework this is
impossible.
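The two flavors described above can be spelled out with plain loops (illustrative only; GCC models these internally, with SSE2 providing the first form and SSE5 adding the second):

```c
#include <assert.h>

#define N 4

/* Vector shifted by one scalar amount: every lane moves the same. */
void
shl_vec_scalar (unsigned out[N], const unsigned in[N], unsigned amount)
{
  for (int i = 0; i < N; i++)
    out[i] = in[i] << amount;
}

/* Vector shifted by a vector: lane i moves by amounts[i]. */
void
shl_vec_vec (unsigned out[N], const unsigned in[N], const unsigned amounts[N])
{
  for (int i = 0; i < N; i++)
    out[i] = in[i] << amounts[i];
}
```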

--
Michael Meissner
AMD, MS 83-29
90 Central Street
Boxborough, MA 01719

> -Original Message-
> From: bonzini at gnu dot org [mailto:[EMAIL PROTECTED]
> Sent: Thursday, February 07, 2008 12:11 PM
> To: Meissner, Michael
> Subject: [Bug c++/35004] Adding 4 more tree codes causes a crash in
> building libstdc++ pre-compiled headers
> 
> 
> 
> --- Comment #5 from bonzini at gnu dot org  2008-02-07 17:10
---
> Unrelated, but why couldn't vector/vector shifts/rotates overload
> LSHIFT_EXPR
> instead? :-)
> 
> 
> --
> 
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35004
> 
> --- You are receiving this mail because: ---
> You reported the bug, or are watching the reporter.
> 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35004



[Bug target/35189] -mno-sse4.2 turns off SSE4a

2008-02-13 Thread michael dot meissner at amd dot com


--- Comment #2 from michael dot meissner at amd dot com  2008-02-13 23:55 
---
Umm, SSE4A is completely different from SSE4/SSE4.1/SSE4.2.  SSE4A is the set
of instructions added with AMD's Barcelona machine, while SSE4.1 is the set
added with the current generation of Intel machines (Penryn, if memory
serves), and SSE4.2 will be the instructions in the next Intel release.
The whole naming scheme is unfortunate, especially SSSE3 and SSE4A.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35189



[Bug target/35189] -mno-sse4.2 turns off SSE4a

2008-02-13 Thread michael dot meissner at amd dot com


--- Comment #4 from michael dot meissner at amd dot com  2008-02-14 00:20 
---
In terms of shipping systems, no AMD system supports SSSE3 right now.  As I
understand it, the SSSE3 instructions came between SSE3 and SSE4.1 on Intel
systems, so -mno-sse3 should turn off SSSE3, but -mno-sse4a should not turn off
SSSE3.  Current shipping AMD systems do support SSE3.
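The dependency chains being argued here can be modeled as a bitmask sketch (flag names invented, not GCC's real option machinery): disabling SSE3 drops everything layered on it, including SSE4A, while disabling SSE4.2 leaves SSE4A alone because SSE4A hangs off SSE3, not off the Intel SSE4.x line:

```c
#include <assert.h>

/* Hypothetical ISA bits; GCC's real option handling differs. */
enum {
  ISA_SSE3   = 1 << 0,
  ISA_SSSE3  = 1 << 1,
  ISA_SSE4_1 = 1 << 2,
  ISA_SSE4_2 = 1 << 3,
  ISA_SSE4A  = 1 << 4,
};

/* -mno-sse3: everything that implies SSE3 must go, including SSE4A. */
unsigned
disable_sse3 (unsigned isa)
{
  return isa & ~(ISA_SSE3 | ISA_SSSE3 | ISA_SSE4_1 | ISA_SSE4_2 | ISA_SSE4A);
}

/* -mno-sse4.2: only the Intel SSE4.2 layer goes; SSE4A is unrelated. */
unsigned
disable_sse4_2 (unsigned isa)
{
  return isa & ~ISA_SSE4_2;
}
```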


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35189



[Bug middle-end/17886] variable rotate and long long rotate should be better optimized

2005-10-04 Thread michael dot meissner at amd dot com


--- Comment #14 from michael dot meissner at amd dot com  2005-10-04 18:59 
---
Created an attachment (id=9876)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=9876&action=view)
Patch for x86 double word shifts

This patch fixes the bug from the x86 side of things instead of from the
machine-independent side, by adding direct expanders for the best code (for
doing 64-bit rotates in 32-bit mode and 128-bit rotates in 64-bit mode).  On a
machine with conditional move (all recent processors), the code becomes:

        movl    %edx, %ebx
        shldl   %eax, %edx
        shldl   %ebx, %eax
        movl    %edx, %ebx
        andl    $32, %ecx
        cmovne  %eax, %edx
        cmovne  %ebx, %eax

However, I suspect using MMX or SSE2 instructions would provide even more of a
speedup, since they have direct 64-bit shift, and, or, and load/store
instructions (but no direct rotate).  In the MMX space you have to be careful
that no floating point is active, and to switch out of MMX mode before doing
calls or returns.
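For reference, the operation the shld/cmov sequence above implements is the plain C rotate idiom below (a portable sketch; good compilers recognize this pattern and emit a rotate instruction where one exists).  Masking the count keeps the shift amounts in range even for n == 0 or n == 64, avoiding undefined behavior:

```c
#include <assert.h>
#include <stdint.h>

/* Portable 64-bit left rotate by a variable amount. */
uint64_t
rotl64 (uint64_t x, unsigned n)
{
  n &= 63;                              /* keep the count in 0..63 */
  return (x << n) | (x >> ((64 - n) & 63));
}
```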


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886



[Bug middle-end/17886] variable rotate and long long rotate should be better optimized

2005-10-04 Thread michael dot meissner at amd dot com


--- Comment #15 from michael dot meissner at amd dot com  2005-10-04 19:51 
---
Note, Mark's patch as applied to the tree has a minor typo in it.  The rotrdi3
define_expand uses (rotate:DI ...) instead of (rotatert:DI ...).  It doesn't
matter in practice, since the generator function is never called, but it is
useful to have the right insns listed.


-- 

michael dot meissner at amd dot com changed:

   What|Removed |Added

 CC|        |michael dot meissner at amd
   |            |dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886



[Bug middle-end/17886] variable rotate and long long rotate should be better optimized

2005-10-04 Thread michael dot meissner at amd dot com


--- Comment #16 from michael dot meissner at amd dot com  2005-10-04 20:06 
---
Created an attachment (id=9880)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=9880&action=view)
Respin of 17886 patch to match new tree contents

This patch is meant to apply on top of Mark's changes, but provides the same
code as my previous patch.


-- 

michael dot meissner at amd dot com changed:

   What|Removed |Added

Attachment #9876 is|0   |1
   obsolete||


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886



[Bug middle-end/17886] variable rotate and long long rotate should be better optimized

2005-10-04 Thread michael dot meissner at amd dot com


--- Comment #18 from michael dot meissner at amd dot com  2005-10-04 20:32 
---
Subject: RE:  variable rotate and long long rotate
 should be better optimized

Yep, all valid points.  So I don't think it should be done by default.
But I suspect the original poster's application may be well behaved enough
to be able to use it.  Certainly if the only reason for using long long is
heavy-duty bit banging (shift/rotate/and/or/test) with no arithmetic, it
would be a speedup, since one instruction could replace several, and it
would lessen the register pressure that long longs put on the x86.

-Original Message-
From: ak at muc dot de [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 04, 2005 4:20 PM
To: Meissner, Michael
Subject: [Bug middle-end/17886] variable rotate and long long rotate
should be better optimized



--- Comment #17 from ak at muc dot de  2005-10-04 20:20 ---
The code now looks fine to me, thanks.

I would prefer if it didn't generate SSE2/MMX code because that would be a
problem for kernels.  Also in many x86 implementations moving things between
normal integer registers and SIMD registers is quite slow and would likely
eat all advantages.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886



[Bug middle-end/17886] variable rotate and long long rotate should be better optimized

2005-10-04 Thread michael dot meissner at amd dot com


--- Comment #19 from michael dot meissner at amd dot com  2005-10-04 20:35 
---
Subject: RE:  variable rotate and long long rotate
 should be better optimized

I almost forgot, kernels should be using -mno-mmx and -mno-sse as a
matter of course (or -msoft-float).  I first ran into this problem in
1990 when I was supporting the MIPS platform, and the kernel guys were
surprised that the compiler would use the double precision registers to
do block copies, since it could double the bandwidth of doing 32-bit
moves.

-Original Message-
From: ak at muc dot de [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 04, 2005 4:20 PM
To: Meissner, Michael
Subject: [Bug middle-end/17886] variable rotate and long long rotate
should be better optimized



--- Comment #17 from ak at muc dot de  2005-10-04 20:20 ---
The code now looks fine to me, thanks.

I would prefer if it didn't generate SSE2/MMX code because that would be a
problem for kernels.  Also in many x86 implementations moving things between
normal integer registers and SIMD registers is quite slow and would likely
eat all advantages.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886



[Bug middle-end/17886] variable rotate and long long rotate should be better optimized

2005-10-04 Thread michael dot meissner at amd dot com


--- Comment #21 from michael dot meissner at amd dot com  2005-10-04 20:46 
---
Subject: RE:  variable rotate and long long rotate
 should be better optimized

Sorry, I got mixed up as to who the original poster was.

SSE2 is harder to use because it deals with 128-bit items instead of 64-bit
(unless you are in 64-bit mode working on TImode values).
Ultimately, it is a matter of whether it is important enough for somebody
to spend a week or two of work to use the multimedia instructions for
this case.  I suspect in most cases it might be better to isolate the
code and use #ifdef's and builtin functions/asm's.

-Original Message-
From: ak at muc dot de [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 04, 2005 4:40 PM
To: Meissner, Michael
Subject: [Bug middle-end/17886] variable rotate and long long rotate
should be better optimized



--- Comment #20 from ak at muc dot de  2005-10-04 20:39 ---
Newer linux does that of course, although not always in older releases.

But even in user space it's not a good idea to use SSE2 unless you really
need it, because it increases the cost of the context switch and costs an
exception the first time in each timeslice.

P.S.: I was the original poster; the application wasn't a kernel, but I
doubt it's a good idea to use SSE2.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886



[Bug rtl-optimization/23812] swapping DImode halves produces poor x86 register allocation

2005-10-18 Thread michael dot meissner at amd dot com


--- Comment #3 from michael dot meissner at amd dot com  2005-10-18 17:44 
---
Note, since this is a rotate, the patches I proposed in 17886 will generate
much better code for this one case (basically mov/mov/xchgl -- it could be
improved by a peephole to do the moves directly instead of xchgl).  However,
the more general subreg problem needs to be looked at.


-- 

michael dot meissner at amd dot com changed:

   What|Removed |Added

 CC||michael dot meissner at amd
   ||dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23812



[Bug target/34077] New: GCC -O1 -minline-all-stringops -minline-stringops-dynamically fails for spec 2006 bzip2, gobmk, and h264ref benchmarks

2007-11-12 Thread michael dot meissner at amd dot com
I was building SPEC 2006 with the options: -minline-all-stringops
-minline-stringops-dynamically in addition to my normal options.  If you use
both options together, GCC generates the following error:
foo.c: In function ‘spec_random_load’:
foo.c:24: internal compiler error: in int_mode_for_mode, at stor-layout.c:258
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.

Tracing it down, emit_cmp_and_jump_insns is called to compare and jump with two
constant values:
Breakpoint 3, emit_cmp_and_jump_insns (x=0x2dff3c50, y=0x2dff3480,
comparison=LTU, size=0x0, mode=SImode, unsignedp=1, label=0x2e134fa0)
at /proj/gcc/fsf-src/trunk/gcc/optabs.c:4428
(gdb) print x
$7 = (rtx) 0x2dff3c50
(gdb) pr
(const_int 131072 [0x20000])
(gdb) print y
$8 = (rtx) 0x2dff3480
(gdb) pr
(const_int 8 [0x8])
(gdb) up
#1  0x008adab6 in ix86_expand_movmem (dst=0x2e136a60,
src=0x2e136a80, count_exp=0x2dff3c50, align_exp=, 
expected_align_exp=, expected_size_exp=) at /proj/gcc/fsf-src/trunk/gcc/config/i386/i386.c:15362

The failure comes because integer constants have VOIDmode type, rather than an
integer type.

Either emit_cmp_and_jump_insns should handle the constant/constant case, or
ix86_expand_movmem should not call emit_cmp_and_jump_insns with constant
tests.
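The first suggested fix amounts to folding the compare before any RTL is emitted.  A sketch (all names invented; the real change would live around emit_cmp_and_jump_insns in optabs.c): when both operands are known constants, evaluate the LTU comparison at expand time and skip the jump entirely, which also sidesteps the VOIDmode const_int problem:

```c
#include <assert.h>

/* Sketch of constant folding for an unsigned-less-than (LTU) compare.
   Returns 1 and sets *result when both operands are constants, so the
   caller can skip emitting a compare-and-jump; returns 0 otherwise.  */
int
try_fold_cmp_ltu (int x_is_const, unsigned long x,
                  int y_is_const, unsigned long y, int *result)
{
  if (x_is_const && y_is_const)
    {
      *result = x < y;          /* e.g. 131072 LTU 8 folds to false */
      return 1;
    }
  return 0;                     /* fall back to emitting the compare */
}
```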


-- 
   Summary: GCC -O1 -minline-all-stringops -minline-stringops-
dynamically fails for spec 2006 bzip2, gobmk, and
h264ref benchmarks
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: michael dot meissner at amd dot com
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34077



[Bug target/34077] GCC -O1 -minline-all-stringops -minline-stringops-dynamically fails for spec 2006 bzip2, gobmk, and h264ref benchmarks

2007-11-12 Thread michael dot meissner at amd dot com


--- Comment #1 from michael dot meissner at amd dot com  2007-11-12 20:38 
---
Created an attachment (id=14533)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14533&action=view)
Reduced testcase for bug 34077 from 401.bzip2

This is the reduced testcase from 401.bzip2.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34077



[Bug target/34077] GCC -O1 -minline-all-stringops -minline-stringops-dynamically fails for spec 2006 bzip2, gobmk, and h264ref benchmarks

2007-11-13 Thread michael dot meissner at amd dot com


--- Comment #3 from michael dot meissner at amd dot com  2007-11-13 20:48 
---
Created an attachment (id=14548)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14548&action=view)
Patch to fix PR34077


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34077



[Bug target/25295] unused register saved in function prolog

2006-12-04 Thread michael dot meissner at amd dot com


--- Comment #3 from michael dot meissner at amd dot com  2006-12-04 23:21 
---
I've done some analysis on the test case.  The current GCC 4.2 and mainline
branches no longer generate the initial push of %r8, but instead do a subq
$8,%rsp.  I believe the compiler you used did the push to allocate 8 bytes
of stack instead of the subtract.  Note, the epilogue still uses a pop to
remove the stack location.  The core of the problem is that the compiler is
allocating 8 bytes too much stack in this particular case.  I think I
understand what's going on, but I want to dig a bit more.


-- 

michael dot meissner at amd dot com changed:

   What|Removed |Added

 CC||michael dot meissner at amd
   |    |dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25295