[Bug rtl-optimization/28019] New: ICE: x86 scheduler upsets local reg alloc

2006-06-13 Thread stuart at apple dot com
Scheduler hoists an x86 IMUL (clobbers %eax) ahead of the first reference to a
parameter passed in %eax.

% /Volumes/sandbox/stuart/gcc.fsf.pure.debug.obj/gcc/cc1 reduce.c  -quiet
-mtune=generic -O2 -fschedule-insns
reduce.c: In function '_perfInitPerfTable':
reduce.c:43: error: unable to find a register to spill in class 'AREG'
reduce.c:43: error: this is the insn:
(insn:HI 21 83 9 2 (parallel [
(set (reg/v:SI 5 di [orig:66 maxNvClk ] [66])
(truncate:SI (lshiftrt:DI (mult:DI (zero_extend:DI (mem/s:SI
(plus:SI (reg/v/f:SI 1 dx [orig:71 thisPerf ] [71])
(const_int 80 [0x50])) [7
.MaxNvclkAllowed+0 S4 A32]))
(zero_extend:DI (reg:SI 3 bx [78])))
(const_int 32 [0x20]
(clobber (scratch:SI))
(clobber (reg:CC 17 flags))
]) 189 {*umulsi3_highpart_insn} (nil)
(expr_list:REG_DEAD (reg/v/f:SI 1 dx [orig:71 thisPerf ] [71])
(expr_list:REG_DEAD (reg:SI 3 bx [78])
(expr_list:REG_UNUSED (reg:CC 17 flags)
(expr_list:REG_UNUSED (scratch:SI)
(nil))
reduce.c:43: internal compiler error: in spill_failure, at reload1.c:1911
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.

Here is the testcase:

typedef unsigned long U32;
typedef U32 U032;
typedef struct RMTIMEOUT {
}
NODE, *PNODE;
typedef struct OBJP OBJP, *POBJP;
typedef struct OBJPERF *POBJPERF;
typedef struct _def_ext_y_gen_params {
  U032 G;
}
YFREQ, *PYFREQ;
typedef void PerfInitPerfTables (POBJP, POBJPERF);
typedef struct _perf_level {
  YFREQ Y;
}
PERF_LEVEL, *PPERF_LEVEL;
typedef struct _perf_table {
  PERF_LEVEL Levels[10];
}
PERF_TABLE, *PPERF_TABLE;
struct OBJPERF {
  PERF_TABLE lowPower;
  PERF_TABLE fullPower;
  U032 MaxyAllowed;
  PerfInitPerfTables *perfInitPerfTables;
};
static PerfInitPerfTables perfInitPerfTables;
void constructObjPerf(POBJPERF thisPerf, U032 thisPublicHalID) {
  thisPerf->perfInitPerfTables = perfInitPerfTables;
}
static U032 _perfInitPerfTable(POBJP pP, POBJPERF thisPerf, PPERF_TABLE pPerfTable,
                               U032 startEntry, U032 flagMask, U032 flagVal) {
  U032 i, matchingLevels = 0;
  U032 maxY, maxMY;
  maxY = thisPerf->MaxyAllowed / (10 * 1000);
  for (i = 0; i < 10; i++) {
    if (DevinitGetPerfLevelEntry(pP, startEntry + i, &pPerfTable->Levels[i]) !=
        0x) break;
    if (maxY) {
      if (pPerfTable->Levels[i].Y.G > maxY)
        pPerfTable->Levels[i].Y.G = maxY;
    }
  }
}
static void perfInitPerfTables(POBJP pP, POBJPERF thisPerf) {
  U032 i, data, levelsFound;
  levelsFound = _perfInitPerfTable(pP, thisPerf, &thisPerf->lowPower, 0,
(1<<(0)), 0);
  levelsFound = _perfInitPerfTable(pP, thisPerf, &thisPerf->fullPower,
levelsFound, (1<<(0)), 1);
}


-- 
   Summary: ICE: x86 scheduler upsets local reg alloc
   Product: gcc
   Version: 4.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
    AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: stuart at apple dot com
 GCC build triplet: i386-apple-darwin8.7.2
  GCC host triplet: i386-apple-darwin8.7.2
GCC target triplet: i386-apple-darwin8.7.2


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28019



[Bug rtl-optimization/28019] ICE: x86 scheduler upsets local reg alloc

2006-06-13 Thread stuart at apple dot com


--- Comment #1 from stuart at apple dot com  2006-06-13 19:44 ---
Created an attachment (id=11663)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=11663&action=view)
Testcase

Attaching (same) testcase.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28019



[Bug target/28825] New: return (vector float) { a, a, b, b } generates unwanted MMX insns

2006-08-23 Thread stuart at apple dot com

+++ This bug was initially created as a clone of Bug #24073 +++

Take the following example:
#define vector __attribute__((vector_size(16)))

float a; float b;
vector float f(void) { return (vector float){ a, b, 0.0, 0.0}; }
---
Currently we get:
subl    $12, %esp
movss   _b, %xmm0
movss   _a, %xmm1
unpcklps %xmm0, %xmm1
movaps  %xmm1, %xmm0
xorl    %eax, %eax
xorl    %edx, %edx
movl    %eax, (%esp)
movl    %edx, 4(%esp)
xorps   %xmm1, %xmm1
movlhps %xmm1, %xmm0
addl    $12, %esp

--
We should be able to produce:
movss _b, %xmm0
movss _a, %xmm1
shufps 60, /*[0, 3, 3, 0]*/, %xmm1, %xmm0 // _a, 0, 0, _b
shufps 201, /*[3, 0, 2, 1]*/, %xmm0, %xmm0 // _a, _b, 0, 0

This is from Nathan Begeman.

 --- Comment #4 From Uros Bizjak  2005-09-27 11:41   ---

I think that following example wins the contest:

vector float f(void) { return (vector float){ a, a, b, b}; }

gcc -O2 -msse -fomit-frame-pointer

subl    $28, %esp
movss   a, %xmm0
movss   %xmm0, 4(%esp)
movss   b, %xmm0
movd    4(%esp), %mm0
punpckldq   %mm0, %mm0
movss   %xmm0, 4(%esp)
movq    %mm0, 16(%esp)
movd    4(%esp), %mm0
punpckldq   %mm0, %mm0
movq    %mm0, 8(%esp)
movlps  16(%esp), %xmm1
movhps  8(%esp), %xmm1
addl    $28, %esp
movaps  %xmm1, %xmm0
ret

Note the usage of MMX registers.


--- Comment #5 From Andrew Pinski 2005-09-27 14:33 ---

(In reply to comment #4)
> I think that following example wins the contest:
> 
> vector float f(void) { return (vector float){ a, a, b, b}; }

For this, it is a different bug.  The issue with the above is that the
ix86_expand_vector_init_duplicate check for mmx_ok is bad.
Currently, we have
  if (!mmx_ok && !TARGET_SSE)
but if I change it to:
  if (!mmx_ok)
we get:
movss   _a, %xmm0
movss   _b, %xmm1
unpcklps %xmm0, %xmm0
unpcklps %xmm1, %xmm1
movlhps %xmm1, %xmm0
Which looks ok to me.  That testcase should be opened into another bug as it is
obviously wrong.
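
For reference, here is a small sketch of what the source is asking for, using the same vector_size(16) attribute as the testcase.  The names dup_pairs and lanes_ok are illustrative only and do not come from the PR.  The code pins down the lane values and nothing more; which instructions the backend picks (unpcklps/movlhps, or a detour through MMX registers) is not visible at this level.

```c
#include <assert.h>

/* Same vector type the testcase builds with its `vector` macro. */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Build { a, a, b, b }.  The lowering is entirely up to the compiler. */
v4sf
dup_pairs (float a, float b)
{
  v4sf v = { a, a, b, b };
  return v;
}

/* Hypothetical checker: returns 1 when every lane holds the expected value.
   A union is used to read the lanes so that older compilers without vector
   subscripting can build this too. */
int
lanes_ok (v4sf v, float a, float b)
{
  union { v4sf v; float f[4]; } u;

  u.v = v;
  return u.f[0] == a && u.f[1] == a && u.f[2] == b && u.f[3] == b;
}

/* Self-check that a test driver can call. */
void
check_dup_pairs (void)
{
  assert (lanes_ok (dup_pairs (1.0f, 2.0f), 1.0f, 2.0f));
}
```

Compiling this with -O2 -msse -S and grepping the output for %mm registers is one way to see whether the MMX detour is still being generated.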

=
Cloned from 24073 to track the MMX insn issue; the original 24073 problem is a
performance issue.


-- 
   Summary: return (vector float) { a, a, b, b } generates unwanted
MMX insns
   Product: gcc
   Version: 4.2.0
Status: UNCONFIRMED
  Severity: minor
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
    ReportedBy: stuart at apple dot com
GCC target triplet: i786-*-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28825



[Bug target/24073] (vector float){a, b, 0, 0} code gen is not good

2006-08-23 Thread stuart at apple dot com


--- Comment #6 from stuart at apple dot com  2006-08-23 21:24 ---
Cloned 28825 from this bug to track the MMX instruction issue.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24073



[Bug target/28826] New: return (vector float) { a, a, b, b } generates unwanted MMX insns

2006-08-23 Thread stuart at apple dot com

+++ This bug was initially created as a clone of Bug #24073 +++

Take the following example:
#define vector __attribute__((vector_size(16)))

float a; float b;
vector float f(void) { return (vector float){ a, b, 0.0, 0.0}; }
---
Currently we get:
subl    $12, %esp
movss   _b, %xmm0
movss   _a, %xmm1
unpcklps %xmm0, %xmm1
movaps  %xmm1, %xmm0
xorl    %eax, %eax
xorl    %edx, %edx
movl    %eax, (%esp)
movl    %edx, 4(%esp)
xorps   %xmm1, %xmm1
movlhps %xmm1, %xmm0
addl    $12, %esp

--
We should be able to produce:
movss _b, %xmm0
movss _a, %xmm1
shufps 60, /*[0, 3, 3, 0]*/, %xmm1, %xmm0 // _a, 0, 0, _b
shufps 201, /*[3, 0, 2, 1]*/, %xmm0, %xmm0 // _a, _b, 0, 0

This is from Nathan Begeman.

 --- Comment #4 From Uros Bizjak  2005-09-27 11:41   ---

I think that following example wins the contest:

vector float f(void) { return (vector float){ a, a, b, b}; }

gcc -O2 -msse -fomit-frame-pointer

subl    $28, %esp
movss   a, %xmm0
movss   %xmm0, 4(%esp)
movss   b, %xmm0
movd    4(%esp), %mm0
punpckldq   %mm0, %mm0
movss   %xmm0, 4(%esp)
movq    %mm0, 16(%esp)
movd    4(%esp), %mm0
punpckldq   %mm0, %mm0
movq    %mm0, 8(%esp)
movlps  16(%esp), %xmm1
movhps  8(%esp), %xmm1
addl    $28, %esp
movaps  %xmm1, %xmm0
ret

Note the usage of MMX registers.


--- Comment #5 From Andrew Pinski 2005-09-27 14:33 ---

(In reply to comment #4)
> I think that following example wins the contest:
> 
> vector float f(void) { return (vector float){ a, a, b, b}; }

For this, it is a different bug.  The issue with the above is that the
ix86_expand_vector_init_duplicate check for mmx_ok is bad.
Currently, we have
  if (!mmx_ok && !TARGET_SSE)
but if I change it to:
  if (!mmx_ok)
we get:
movss   _a, %xmm0
movss   _b, %xmm1
unpcklps %xmm0, %xmm0
unpcklps %xmm1, %xmm1
movlhps %xmm1, %xmm0
Which looks ok to me.  That testcase should be opened into another bug as it is
obviously wrong.

=
Cloned from 24073 to track the MMX insn issue; the original 24073 problem is a
performance issue.


-- 
   Summary: return (vector float) { a, a, b, b } generates unwanted
MMX insns
   Product: gcc
   Version: 4.2.0
Status: UNCONFIRMED
  Severity: minor
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
    ReportedBy: stuart at apple dot com
GCC target triplet: i786-*-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28826



[Bug target/28825] return (vector float) { a, a, b, b } generates unwanted MMX insns

2006-08-23 Thread stuart at apple dot com


--- Comment #4 from stuart at apple dot com  2006-08-23 21:44 ---
Per Ian's email of 18aug2006, I've committed Andrew's fix as SVN revision
116356.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28825



[Bug target/24073] (vector float){a, b, 0, 0} code gen is not good

2006-08-23 Thread stuart at apple dot com


--- Comment #7 from stuart at apple dot com  2006-08-23 21:54 ---
Time has passed, and GCC has improved on this testcase.  Here is what we
generate today (trunk, 23aug2006) for the original testcase:

movss   b(%rip), %xmm0
movss   a(%rip), %xmm1
unpcklps %xmm0, %xmm1
movaps  %xmm1, %xmm0
xorps   %xmm1, %xmm1
movlhps %xmm1, %xmm0
ret

This isn't perfect, but it's much better than before.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24073



[Bug target/30757] [4.3 Regression] ICE with -march=athlon-xp -mfpmath=sse

2007-02-12 Thread stuart at apple dot com


--- Comment #2 from stuart at apple dot com  2007-02-12 17:11 ---
Almost certainly my fault; I'll look into this.  Suggested workaround: choose a
different target cpu; 'pentium4' works.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30757



[Bug target/30757] [4.3 Regression] ICE with -march=athlon-xp -mfpmath=sse

2007-02-12 Thread stuart at apple dot com


--- Comment #3 from stuart at apple dot com  2007-02-12 18:32 ---
O.K., the breakage here is that athlon-xp is an SSE1 machine, and most of the
conversions in the patch require SSE2.  It looks like SSE1 will support a few
of the new conversions (e.g. unsigned int32 <=> float); I'll see about enabling
these for SSE1 targets, and arranging for the SSE2-only conversions to fall
back to the x87.
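
For illustration only: when the hardware offers just a signed int32-to-float conversion (which is what SSE1's cvtsi2ss provides), one general way to get an unsigned int32-to-float conversion is to split the value into 16-bit halves and convert each half as a signed quantity.  This is a sketch of that technique, not necessarily what the patch does, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Convert an unsigned 32-bit value to float using only conversions from
   non-negative signed 32-bit values. */
float
u32_to_float_via_signed (uint32_t u)
{
  int32_t hi = (int32_t) (u >> 16);      /* 0 .. 65535, never negative */
  int32_t lo = (int32_t) (u & 0xFFFFu);  /* 0 .. 65535 */

  assert (hi >= 0 && lo >= 0);
  /* hi * 65536 and lo are both exactly representable in float, so the
     only rounding step is the final addition. */
  return (float) hi * 65536.0f + (float) lo;
}
```

Because both halves fit in 16 bits, the signed conversions never see a value with the top bit set, which is the case a plain signed conversion would get wrong.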


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30757



[Bug target/30757] [4.3 Regression] ICE with -march=athlon-xp -mfpmath=sse

2007-02-13 Thread stuart at apple dot com


--- Comment #4 from stuart at apple dot com  2007-02-13 21:00 ---
Committed a fix:

http://gcc.gnu.org/ml/gcc-patches/2007-02/msg01171.html


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30757



[Bug objc/31281] New: ICE on ObjC try-catch blocks

2007-03-20 Thread stuart at apple dot com
The 4.2 ObjC compiler ICEs on this (nonsensically reduced) testcase.  Compile
with -O2:

int f(unsigned int i)
{
  @try { } @catch(id) { }
  for (;;)
    for (;;)
      @try {
        if (i)
          break;
      } @catch(id) { }
}

The 4.0 compiler does not ICE with this testcase.


-- 
   Summary: ICE on ObjC try-catch blocks
   Product: gcc
   Version: 4.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: objc
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: stuart at apple dot com
GCC target triplet: powerpc-apple-darwin


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31281



[Bug objc/31283] New: ICE on ObjC try-catch blocks

2007-03-20 Thread stuart at apple dot com
The 4.2 ObjC compiler ICEs on this (nonsensically reduced) testcase.  Compile
with -O2:

int f(unsigned int i)
{
  @try { } @catch(id) { }
  for (;;)
    for (;;)
      @try {
        if (i)
          break;
      } @catch(id) { }
}

The 4.0 compiler does not ICE with this testcase.


-- 
   Summary: ICE on ObjC try-catch blocks
   Product: gcc
   Version: 4.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: objc
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: stuart at apple dot com
GCC target triplet: powerpc-apple-darwin


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31283



[Bug objc/31281] ICE on ObjC try-catch blocks with next runtime

2007-03-27 Thread stuart at apple dot com


--- Comment #3 from stuart at apple dot com  2007-03-27 18:18 ---
Patch offered here:

  http://gcc.gnu.org/ml/gcc-patches/2007-03/msg01328.html


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31281



[Bug c++/32466] New: illegal loop store motion of bitfield

2007-06-22 Thread stuart at apple dot com
#include <stdio.h>
void __attribute__ ((__noinline__))
add14(unsigned short int nextID)
{
  struct {
    unsigned short int id: 14;
  } hdr;
  hdr.id = nextID;
  do {
    hdr.id++;
    if (printf ("should print 0x: 0x%04X\n", (unsigned int)hdr.id))
      break;
  } while (1);
}
main()
{
  add14 (0x3FFF);
  return 0;
}

G++ miscompiles with optimization.  GCC compiles correctly.  Regression from
3.3.
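
To spell out the expected semantics, here is a minimal sketch of the same increment without the loop.  The helper name increment_id14 is mine, not from the testcase.  Storing the incremented value back into a 14-bit unsigned bit-field must reduce it modulo 2^14, so starting from 0x3FFF the result has to be 0.

```c
#include <assert.h>

/* Increment a value held in a 14-bit unsigned bit-field, as add14() does. */
unsigned int
increment_id14 (unsigned short next_id)
{
  struct { unsigned short id : 14; } hdr;

  assert (next_id <= 0x3FFF);   /* the testcase only passes 14-bit values */
  hdr.id = next_id;
  hdr.id++;                     /* the store truncates to 14 bits */
  return (unsigned int) hdr.id;
}
```

Any transformation that keeps hdr.id in a full-width register across the loop has to reapply that truncation before the value is read.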


-- 
   Summary: illegal loop store motion of bitfield
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: stuart at apple dot com
 GCC build triplet: i386-apple-darwin9
  GCC host triplet: i386-apple-darwin9
GCC target triplet: i386-apple-darwin9


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32466



[Bug target/21195] SSE intrinsics not inlined, sometimes.

2005-06-29 Thread stuart at apple dot com

--- Additional Comments From stuart at apple dot com  2005-06-29 16:49 
---
I marked all the x86 vector intrinsics with always_inline, and this seems to 
fix both the testcases here.

http://gcc.gnu.org/ml/gcc-cvs/2005-06/msg01059.html

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21195


[Bug c/19039] New: another SSE optimization ICE

2004-12-16 Thread stuart at apple dot com
Attached testcase ICEs GCC:

[opel:/Volumes/sandbox/stuart] hasting2% gcc.fsf.debug.obj/gcc/xgcc -B gcc.fsf.debug.obj/gcc -O1 -msse2 -S m4.i
m4.i: In function 'rrr':
m4.i:36: error: unrecognizable insn:
(insn 283 210 211 10 (set (reg:V16QI 22 xmm1)
(const_int 1 [0x1])) -1 (nil)
(nil))
m4.i:36: internal compiler error: in extract_insn, at recog.c:2020
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
---
Here is the testcase:
---
typedef unsigned long ulong;
typedef long long __v2di __attribute__ ((vector_size (16)));
typedef char __v16qi __attribute__ ((vector_size (16)));
typedef struct
{
void *v1;
ulong kk;
ulong ss;
}bbbccc;
long rrr(const bbbccc *od, ulong lc, ulong rc, ulong br, ulong kw)
{
long i, j, x, y, kx = kw, ky, p1 = (kx)/2, r1 = (ky)/2, kk = od->kk,
  q1 = lc - p1, sts = p1, lml = od->ss - rc,  g1, *pg1, i1, s1, j1;
char *out = (char*) od->v1;
__v2di h1;
for( i = 0; i < kk; i++ )
{
j1 = i + r1;
if( i >= kk - (long) br )
j1 = ( kk-1) + r1 - (long) br;
s1 = j1 - i1;
for(; y < s1; y++ )
{
for( x = q1; x < sts; x++ )
;
}
out[j] = g1;
for( ; j <= lml - 16; j += 16 )
{
h1 = (__v2di)__builtin_ia32_pcmpeqb128 ((__v16qi)h1, (__v16qi)h1);
for( y = 0; y < s1; y++ )
for( x = 0; x < (long) kw; x++ )
  ;
__builtin_ia32_storedqu ((char *)pg1, (__v16qi)h1);
 }
}
}

-- 
   Summary: another SSE optimization ICE
   Product: gcc
   Version: 4.0.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P2
 Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: stuart at apple dot com
CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: i686-apple-darwin
  GCC host triplet: i686-apple-darwin
GCC target triplet: i686-apple-darwin


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19039


[Bug debug/19521] New: omitted stab for gcov initialization function

2005-01-18 Thread stuart at apple dot com
gcov support entails an initialization function named "__GLOBAL__I_0_noop".
GCC omits function-begin stab for this function.

Here is the commandline:


[morris:/Volumes/sandbox/stuart] hasting2% \
/Volumes/sandbox/stuart/gcc.fsf.obj/gcc/xgcc -B \
/Volumes/sandbox/stuart/gcc.fsf.obj/gcc -g gcov.c -fprofile-arcs \
-ftest-coverage  -S

Given the .s file from the above, here is a check of the output:

[morris:/Volumes/sandbox/stuart] hasting2% egrep 'noop|main' gcov.s
.globl _noop
_noop:
.stabs  "noop:F(0,1)=(0,1)",36,0,7,_noop
.stabs  "",36,0,0,Lscope0-_noop
.globl _main
_main:
bl _noop
.stabs  "main:F(0,2)=r(0,2);-2147483648;2147483647;",36,0,11,_main
.stabn  192,0,0,_main
.stabs  "",36,0,0,Lscope1-_main
__GLOBAL__I_0_noop:
.stabs  "",36,0,0,Lscope2-__GLOBAL__I_0_noop
.long   __GLOBAL__I_0_noop
.long   __GLOBAL__I_0_noop
[morris:/Volumes/sandbox/stuart] hasting2%

The 'stabs "",36' record seems to signify the end-of-the __GLOBAL__I_0_noop
function.  The matching start-function record is missing; compare with the noop
and main functions.
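
The pairing rule can be checked mechanically.  The following is a rough sketch, assuming the stabs layout shown in the egrep output above (a named N_FUN stab, type 36, opens a function and an empty-named one closes it); the function name count_unmatched_end_stabs is hypothetical and the parser is deliberately naive.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count end-of-function stabs (empty name, type 36 = N_FUN) that are not
   preceded by a begin-function stab.  `lines` holds `n` assembly lines. */
int
count_unmatched_end_stabs (const char *const *lines, int n)
{
  int open = 0, unmatched = 0;

  assert (lines != NULL || n == 0);
  for (int i = 0; i < n; i++)
    {
      const char *p = strstr (lines[i], ".stabs");
      if (p == NULL)
        continue;
      p = strchr (p, '"');              /* opening quote of the name */
      if (p == NULL)
        continue;
      const char *q = strchr (p + 1, '"');   /* closing quote */
      if (q == NULL || strncmp (q + 1, ",36,", 4) != 0)
        continue;                       /* not an N_FUN stab */
      if (q == p + 1)                   /* empty name: end of function */
        {
          if (!open)
            unmatched++;
          open = 0;
        }
      else                              /* named: beginning of function */
        open = 1;
    }
  return unmatched;
}
```

Fed the lines for noop and main it should report nothing; adding the __GLOBAL__I_0_noop lines should make it report one unmatched end stab.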

The testcase is from the GCC testsuite: gcc/testsuite/gcc.misc-tests/gcov-1.c 
gcov.c

Since it's short, here is the testcase:

/* Test Gcov basics.  */

/* { dg-options "-fprofile-arcs -ftest-coverage" } */
/* { dg-do run { target native } } */

void noop ()
{
}

int main ()
{
  int i;

  for (i = 0; i < 10; i++)      /* count(11) */
    noop ();                    /* count(10) */

  return 0;                     /* count(1) */
}

/* { dg-final { run-gcov gcov-1.c } } */

-- 
   Summary: omitted stab for gcov initialization function
   Product: gcc
   Version: 4.0.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P2
     Component: debug
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: stuart at apple dot com
CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: powerpc-apple-darwin
  GCC host triplet: powerpc-apple-darwin
GCC target triplet: powerpc-apple-darwin


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19521


[Bug debug/19521] omitted stab for gcov initialization function

2005-01-18 Thread stuart at apple dot com

--- Additional Comments From stuart at apple dot com  2005-01-19 00:40 
---
Created an attachment (id=7986)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=7986&action=view)
gcov-1.c testcase

Attaching the testcase for convenience.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19521


[Bug debug/19521] omitted stab for gcov initialization function

2005-01-18 Thread stuart at apple dot com

--- Additional Comments From stuart at apple dot com  2005-01-19 00:49 
---
This is a regression from 3.3; I think the cause is this line in cgraphunit.c
(cgraph_build_static_cdtor): (approximately line 1847)

  DECL_IGNORED_P (decl) = 1;

Deleting this line "fixes" the symptom, but I believe the right fix lies in
dbxout.c.

The start-function stab comes from dbxout.c(dbxout_symbol), near line 2346:

  /* Ignore nameless syms, but don't ignore type tags.  */

  if ((DECL_NAME (decl) == 0 && TREE_CODE (decl) != TYPE_DECL)
      || DECL_IGNORED_P (decl))
    DBXOUT_DECR_NESTING_AND_RETURN (0);

This check causes the omission of the start-function stab.  The corresponding
end-function stab comes from dbxout.c (dbxout_function_end), near line 465:

#ifdef DBX_OUTPUT_NFUN
  DBX_OUTPUT_NFUN (asm_out_file, lscope_label_name, current_function_decl);
#else
=>  fprintf (asm_out_file, "%s\"\",%d,0,0,", ASM_STABS_OP, N_FUN);
  assemble_name (asm_out_file, lscope_label_name);
  putc ('-', asm_out_file);
  assemble_name (asm_out_file, XSTR (XEXP (DECL_RTL (current_function_decl), 0), 0));
  fprintf (asm_out_file, "\n");
#endif



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19521


[Bug debug/19521] [4.0 Regression] omitted stab for gcov initialization function

2005-01-19 Thread stuart at apple dot com

--- Additional Comments From stuart at apple dot com  2005-01-19 17:08 
---
> So the bug is the end stab without the start stab?

Yes.

> Or do you think that this
> bit of code that corresponds not at all to any user code should have full 
> stabs?

My personal preference is a mild "yes."  But I can foresee that others will
disagree, and I recognize the validity of that position.

> If the latter, why?

When I'm grubbing through a broken binary, it's helpful when the debugger tells
me that this function body didn't come from the user's sourcecode.  In general,
"more information is better."

I suppose the counterargument would be that most users don't look at the
assembly code, don't want to know about these functions, and would prefer
smaller debug information for faster linking and development.

I assume that most GCC users are unlike me; thus I infer their argument wins.  I
can live with that; this is not a big deal either way.

If the debugger already knows the name of this function, and the stabs are not
adding any useful information, then I agree they're a waste and should be
omitted.  The big deal is that the begin/end stabs should match, both emitted or
both omitted.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19521


[Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch

2004-10-15 Thread stuart at apple dot com
When compiling the testcase (strcpy()) with -Os -march=pentium4, GCC generates a
word-fetch instead of a byte-fetch.  This can provoke memory faults (e.g. at the
end of a page).

Omitting "-march=pentium4" generates correct code.

Here is the testcase:

char *
mystrcpy(char * __restrict to, const char * __restrict from)
{
  char *save = to;
  
  for (; (*to = *from); ++from, ++to);
  return(save);
}

And the invocation:  % gcc.fsf.pure.obj/gcc/xgcc -B gcc.fsf.pure.obj/gcc -Os -S mystrcpy.c -march=pentium4

And the result (error is marked):

        .file   "mystrcpy.c"
        .text
.globl mystrcpy
        .type   mystrcpy, @function
mystrcpy:
        pushl   %ebp
        movl    %esp, %ebp
        movl    12(%ebp), %ecx
        movl    8(%ebp), %edx
        jmp     .L2
.L3:
        incl    %ecx
        incl    %edx
.L2:
        movl    (%ecx), %eax    <== should be 'movb'
        movb    %al, (%edx)
        testb   %al, %al
        jne     .L3
        movl    8(%ebp), %eax
        popl    %ebp
        ret
        .size   mystrcpy, .-mystrcpy
        .ident  "GCC: (GNU) 4.0.0 20041015 (experimental)"
        .section        .note.GNU-stack,"",@progbits
--
Also reproducible on i686-apple-darwin.

-- 
   Summary: -march=pentium4 generates word fetch instead of byte
fetch
   Product: gcc
   Version: 4.0.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P2
 Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: stuart at apple dot com
CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: i686-pc-linux
  GCC host triplet: i686-pc-linux
GCC target triplet: i686-pc-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


[Bug c/18019] -march=pentium4 generates word fetch instead of byte fetch

2004-10-15 Thread stuart at apple dot com

--- Additional Comments From stuart at apple dot com  2004-10-15 18:05 ---
Created an attachment (id=7359)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=7359&action=view)
testcase

Attaching the testcase for convenience.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


[Bug target/18019] -march=pentium4 generates word fetch instead of byte fetch

2004-10-15 Thread stuart at apple dot com

--- Additional Comments From stuart at apple dot com  2004-10-15 18:27 ---
The bug was discovered when it walked off the end of a VM page and faulted.  Are
you certain this is "expected behavior?"
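
To make the failure mode concrete, here is a sketch of a harness that puts the source string flush against an inaccessible page.  It assumes a POSIX system with mmap/mprotect; string_at_page_end is a hypothetical helper, and mystrcpy is the loop from the original report.  A byte-wise copy never touches the guard page, whereas a 4-byte load of the final bytes would read past the end of the mapping.

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* The strcpy() loop from the report. */
char *
mystrcpy (char * __restrict to, const char * __restrict from)
{
  char *save = to;

  for (; (*to = *from); ++from, ++to)
    ;
  return save;
}

/* Map two pages, make the second one inaccessible, and return a copy of S
   whose terminating NUL is the last readable byte.  Returns NULL on
   failure.  The mapping is intentionally never released. */
const char *
string_at_page_end (const char *s)
{
  long page = sysconf (_SC_PAGESIZE);
  size_t len = strlen (s) + 1;

  assert (page > 0 && len <= (size_t) page);
  char *base = mmap (NULL, 2 * (size_t) page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (base == MAP_FAILED)
    return NULL;
  char *dst = base + page - len;
  memcpy (dst, s, len);
  if (mprotect (base + page, (size_t) page, PROT_NONE) != 0)
    return NULL;
  return dst;
}
```

With a correctly compiled mystrcpy, copying from the returned pointer succeeds; with the movl shown in the report, the last iteration would be expected to fault.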

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


[Bug c/17853] New: -O2 ICE for MMX testcase

2004-10-05 Thread stuart at apple dot com
gcc -O2 -mmmx i386-mmx-5.c
GCC fails thus:
-
i386-mmx-5.c: In function 'main':
i386-mmx-5.c:15: internal compiler error: in simplify_binary_operation, at simplify-rtx.c:2151
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
-
/* { dg-do run { target i?86-*-* x86_64-*-* } } */
/* { dg-options "-O2 -mmmx" } */
#include <mmintrin.h>
#include <stdlib.h>

__m64 global_mask;

main()
{
__m64 zero = _mm_setzero_si64();
__m64 mask = _mm_cmpeq_pi8( zero, zero );
mask = _mm_unpacklo_pi8( mask, zero );
global_mask = mask;
exit(0);
}
-

-- 
   Summary: -O2 ICE for MMX testcase
   Product: gcc
   Version: 4.0.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P2
 Component: c
AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: stuart at apple dot com
CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: i686-linux-gnu
  GCC host triplet: i686-linux-gnu
GCC target triplet: i686-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17853


[Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch

2004-11-09 Thread stuart at apple dot com

--- Additional Comments From stuart at apple dot com  2004-11-09 19:34 
---
I agree with Roger.

I'm suspicious of the "0,1,2 ... TARGET_PARTIAL_xx" clauses of the "*movqi_1"
pattern.  (Also the analogous parts of "*movhi_1".)

I've tried reverting Roger's patch, and excising the TARGET_PARTIAL_xx clauses;
either change appears to fix the problem.  I've also successfully
regression-tested the excision.

I will invite Jan Hubicka, author of the TARGET_PARTIAL_xx clauses (i386.md,
v1.503), to look at this.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


[Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch

2004-11-16 Thread stuart at apple dot com

--- Additional Comments From stuart at apple dot com  2004-11-16 19:39 
---
Here is the body of an email I sent to Jan Hubicka concerning this bug.  In the
body of the message, 'you' refers to Jan.
--
For discussion, here is the pattern in question as it exists on the FSF
mainline today:

  1503  ;; Situation is quite tricky about when to choose full sized (SImode) move
  1504  ;; over QImode moves.  For Q_REG -> Q_REG move we use full size only for
  1505  ;; partial register dependency machines (such as AMD Athlon), where QImode
  1506  ;; moves issue extra dependency and for partial register stalls machines
  1507  ;; that don't use QImode patterns (and QImode move cause stall on the next
  1508  ;; instruction).
  1509  ;;
  1510  ;; For loads of Q_REG to NONQ_REG we use full sized moves except for partial
  1511  ;; register stall machines with, where we use QImode instructions, since
  1512  ;; partial register stall can be caused there.  Then we use movzx.
  1513  (define_insn "*movqi_1"
  1514    [(set (match_operand:QI 0 "nonimmediate_operand" "=q,q ,q ,r,r ,?r,m")
  1515          (match_operand:QI 1 "general_operand"      " q,qn,qm,q,rn,qm,qn"))]
  1516    "GET_CODE (operands[0]) != MEM || GET_CODE (operands[1]) != MEM"
  1517  {
  1518switch (get_attr_type (insn))
  1519  {
  1520  case TYPE_IMOVX:
  1521if (!ANY_QI_REG_P (operands[1]) && GET_CODE (operands[1]) != MEM)
  1522  abort ();
  1523return "movz{bl|x}\t{%1, %k0|%k0, %1}";
  1524  default:
  1525if (get_attr_mode (insn) == MODE_SI)
  1526  return "mov{l}\t{%k1, %k0|%k0, %k1}";
  1527else
  1528  return "mov{b}\t{%1, %0|%0, %1}";
  1529  }
  1530  }
  1531[(set (attr "type")
  1532   (cond [(ne (symbol_ref "optimize_size") (const_int 0))
  1533(const_string "imov")
  1534  (and (eq_attr "alternative" "3")
  1535   (ior (eq (symbol_ref "TARGET_PARTIAL_REG_STALL")
  1536(const_int 0))
  1537(eq (symbol_ref "TARGET_QIMODE_MATH")
  1538(const_int 0
  1539(const_string "imov")
  1540  (eq_attr "alternative" "3,5")
  1541(const_string "imovx")
  1542  (and (ne (symbol_ref "TARGET_MOVX")
  1543   (const_int 0))
  1544   (eq_attr "alternative" "2"))
  1545(const_string "imovx")
  1546 ]
  1547 (const_string "imov")))
  1548 (set (attr "mode")
  1549(cond [(eq_attr "alternative" "3,4,5")
  1550 (const_string "SI")
  1551   (eq_attr "alternative" "6")
  1552 (const_string "QI")
  1553   (eq_attr "type" "imovx")
  1554 (const_string "SI")
  1555   (and (eq_attr "type" "imov")
  1556(and (eq_attr "alternative" "0,1,2")
  1557 (ne (symbol_ref "TARGET_PARTIAL_REG_DEPENDENCY")
  1558 (const_int 0
  1559 (const_string "SI")
  1560   ;; Avoid partial register stalls when not using QImode arithmetic
  1561   (and (eq_attr "type" "imov")
  1562(and (eq_attr "alternative" "0,1,2")
  1563 (and (ne (symbol_ref "TARGET_PARTIAL_REG_STALL")
  1564  (const_int 0))
  1565  (eq (symbol_ref "TARGET_QIMODE_MATH")
  1566  (const_int 0)
  1567 (const_string "SI")
  1568 ]
  1569 (const_string "QI")))])

Roger added lines 1532-1533 in January of this year.  It looks like you added
lines 1555-1567 in 2000.

The combination of lines 1532-1533 (use "imov" if -Os) and lines 1555-1559 (use
SImode if "imov" and byte-load and K8/P4/Nocona) means we generate a "movl" that
should be a "movb".  (The testcase is strcpy(); see the Bugzilla.)  For the
following discussion, note that GCC currently matches "movqi_1" alternative #2
("q" and "qm" in the attribute list) on the critical byte-fetch-from-memory in
the strcpy() testcase.
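
To make the interaction concrete, the two attribute conds can be modelled in plain C.  This is a sketch that mirrors the listing above by hand; the struct and function names are made up for illustration, and the flag values mentioned afterwards are assumptions about a Pentium 4 tuning rather than values read out of i386.c.

```c
#include <assert.h>

enum insn_type { TYPE_IMOV, TYPE_IMOVX };
enum insn_mode { MODE_QI, MODE_SI };

/* Tuning flags consulted by the pattern; the caller supplies the values. */
struct tuning
{
  int optimize_size;            /* -Os */
  int partial_reg_dependency;   /* TARGET_PARTIAL_REG_DEPENDENCY */
  int partial_reg_stall;        /* TARGET_PARTIAL_REG_STALL */
  int qimode_math;              /* TARGET_QIMODE_MATH */
  int movx;                     /* TARGET_MOVX */
};

/* Mirrors the "type" cond, lines 1532-1547. */
enum insn_type
movqi_type (int alt, const struct tuning *t)
{
  assert (alt >= 0 && alt <= 6);
  if (t->optimize_size)
    return TYPE_IMOV;
  if (alt == 3 && (!t->partial_reg_stall || !t->qimode_math))
    return TYPE_IMOV;
  if (alt == 3 || alt == 5)
    return TYPE_IMOVX;
  if (t->movx && alt == 2)
    return TYPE_IMOVX;
  return TYPE_IMOV;
}

/* Mirrors the "mode" cond, lines 1549-1569. */
enum insn_mode
movqi_mode (int alt, const struct tuning *t)
{
  enum insn_type type = movqi_type (alt, t);
  int alt012 = alt >= 0 && alt <= 2;

  if (alt >= 3 && alt <= 5)
    return MODE_SI;
  if (alt == 6)
    return MODE_QI;
  if (type == TYPE_IMOVX)
    return MODE_SI;
  if (type == TYPE_IMOV && alt012 && t->partial_reg_dependency)
    return MODE_SI;
  if (type == TYPE_IMOV && alt012 && t->partial_reg_stall && !t->qimode_math)
    return MODE_SI;
  return MODE_QI;
}
```

With optimize_size and partial_reg_dependency both set, alternative 2 comes out as type "imov" with mode SI, which the output template prints as movl; that is the four-byte load seen in the strcpy() testcase.  Without optimize_size, and assuming TARGET_MOVX is set, the same alternative becomes "imovx" (movzbl), which reads only one byte.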

It appears to me that the 1555-1559 clause depends upon any CPU with 

[Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch

2004-12-01 Thread stuart at apple dot com

--- Additional Comments From stuart at apple dot com  2004-12-02 01:07 
---
Jan emailed this to me privately.  Appended here for completeness.  - stuart

Just to clarify things a bit.  TARGET_MOVX and
TARGET_PARTIAL_REG_DEPENDENCY are not about supporting some feature but
about the way the CPU deals with dependencies on partial registers.  Some
CPUs (Athlon+, P4+) treat a partial register write as a read-modify
operation on the whole register (TARGET_PARTIAL_REG_DEPENDENCY), and for
some of these it is profitable to do a dummy zero extend (TARGET_MOVX)
instead of a load to avoid the dependency, while others (K6, P3) give the
partial register another internal name and don't see the false dependency
(TARGET_PARTIAL_REG_STALL).  On the other hand, they pay a penalty if the
result is used as a whole register.  There is unlikely ever to be a CPU
that suffers in both directions.

However, Roger's patch, as I understand it, is about avoiding movx because
it encodes longer at -Os.  It seems to me that for targets not defining
TARGET_PARTIAL_REG_STALL/TARGET_PARTIAL_REG_DEPENDENCY we should always
produce the straightforward movq as expected, while for
TARGET_PARTIAL_REG_STALL/TARGET_PARTIAL_REG_DEPENDENCY we can still use
the full moves as long as they don't encode longer.  I can't check right
now, but I believe it is only the movl imm, register that comes out
longer and that is the alternative 2.

We can also probably kill the TARGET_QIMODE_MATH as it is no longer
used.

The type and mode attributes are there not only to choose the particular
instruction but also to drive scheduling (K6 for instance has a limited
supply of units that do 8bit operations).  imovx is a 32bit operation so
it needs to get SImode.  I would simply break out the alternative 2 from
both conditionals and would additionally check optimize_size to be
nonzero.

I can prepare patch later next week unless someone beats me :)

Honza

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


[Bug tree-optimization/24659] Conversions are not vectorized

2007-01-05 Thread stuart at apple dot com


--- Comment #3 from stuart at apple dot com  2007-01-05 18:27 ---
Created an attachment (id=12862)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=12862&action=view)
vectorized assembly output from ICC v9.1

Generated from the "indefinite loop" variant of the testcase on OS X 10.4.7,
using ICC v9.1:
% icc -O2 -S 24659.c


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24659



[Bug tree-optimization/24659] Conversions are not vectorized

2007-01-05 Thread stuart at apple dot com


--- Comment #4 from stuart at apple dot com  2007-01-05 18:30 ---
I ran the testcase through ICC, and it unrolled the loops without vectorizing
them.  However, making the loops indefinite gets us the desired, vectorized
result.  Here is the modified, indefinite loop version of the testcase:

void test_fp (float *a, double *b, int count)
{
  int i;

  for (i = 0; i < count; i++)
b[i] = (double) a[i];
}

void test_int (int *a, double *b, int count)
{
  int i;

  for (i = 0; i < count; i++)
b[i] = (double) a[i];
}

(Note to Apple: this is Radar 4079267)


-- 

stuart at apple dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |stuart at apple dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24659