[Bug rtl-optimization/28019] New: ICE: x86 scheduler upsets local reg alloc
Scheduler hoists an x86 IMUL (clobbers %eax) ahead of the first reference to a parameter passed in %eax.

% /Volumes/sandbox/stuart/gcc.fsf.pure.debug.obj/gcc/cc1 reduce.c -quiet -mtune=generic -O2 -fschedule-insns
reduce.c: In function '_perfInitPerfTable':
reduce.c:43: error: unable to find a register to spill in class 'AREG'
reduce.c:43: error: this is the insn:
(insn:HI 21 83 9 2 (parallel [
  (set (reg/v:SI 5 di [orig:66 maxNvClk ] [66])
    (truncate:SI (lshiftrt:DI (mult:DI (zero_extend:DI (mem/s:SI (plus:SI (reg/v/f:SI 1 dx [orig:71 thisPerf ] [71])
      (const_int 80 [0x50])) [7 .MaxNvclkAllowed+0 S4 A32]))
      (zero_extend:DI (reg:SI 3 bx [78])))
      (const_int 32 [0x20]
  (clobber (scratch:SI))
  (clobber (reg:CC 17 flags))
  ]) 189 {*umulsi3_highpart_insn} (nil)
  (expr_list:REG_DEAD (reg/v/f:SI 1 dx [orig:71 thisPerf ] [71])
    (expr_list:REG_DEAD (reg:SI 3 bx [78])
      (expr_list:REG_UNUSED (reg:CC 17 flags)
        (expr_list:REG_UNUSED (scratch:SI)
          (nil))
reduce.c:43: internal compiler error: in spill_failure, at reload1.c:1911
Please submit a full bug report, with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.

Here is the testcase:

typedef unsigned long U32;
typedef U32 U032;
typedef struct RMTIMEOUT { } NODE, *PNODE;
typedef struct OBJP OBJP, *POBJP;
typedef struct OBJPERF *POBJPERF;
typedef struct _def_ext_y_gen_params { U032 G; } YFREQ, *PYFREQ;
typedef void PerfInitPerfTables (POBJP, POBJPERF);
typedef struct _perf_level { YFREQ Y; } PERF_LEVEL, *PPERF_LEVEL;
typedef struct _perf_table { PERF_LEVEL Levels[10]; } PERF_TABLE, *PPERF_TABLE;
struct OBJPERF {
  PERF_TABLE lowPower;
  PERF_TABLE fullPower;
  U032 MaxyAllowed;
  PerfInitPerfTables *perfInitPerfTables;
};
static PerfInitPerfTables perfInitPerfTables;
void constructObjPerf(POBJPERF thisPerf, U032 thisPublicHalID)
{
  thisPerf->perfInitPerfTables = perfInitPerfTables;
}
static U032 _perfInitPerfTable(POBJP pP, POBJPERF thisPerf, PPERF_TABLE pPerfTable, U032 startEntry, U032 flagMask, U032 flagVal)
{
  U032 i, matchingLevels = 0;
  U032 maxY, maxMY;
  maxY = thisPerf->MaxyAllowed / (10 * 1000);
  for (i = 0; i < 10; i++)
    {
      if (DevinitGetPerfLevelEntry(pP, startEntry + i, &pPerfTable->Levels[i]) != 0x)
        break;
      if (maxY)
        {
          if (pPerfTable->Levels[i].Y.G > maxY)
            pPerfTable->Levels[i].Y.G = maxY;
        }
    }
}
static void perfInitPerfTables(POBJP pP, POBJPERF thisPerf)
{
  U032 i, data, levelsFound;
  levelsFound = _perfInitPerfTable(pP, thisPerf, &thisPerf->lowPower, 0, (1<<(0)), 0);
  levelsFound = _perfInitPerfTable(pP, thisPerf, &thisPerf->fullPower, levelsFound, (1<<(0)), 1);
}

--
           Summary: ICE: x86 scheduler upsets local reg alloc
           Product: gcc
           Version: 4.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: stuart at apple dot com
 GCC build triplet: i386-apple-darwin8.7.2
  GCC host triplet: i386-apple-darwin8.7.2
GCC target triplet: i386-apple-darwin8.7.2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28019
[Bug rtl-optimization/28019] ICE: x86 scheduler upsets local reg alloc
--- Comment #1 from stuart at apple dot com 2006-06-13 19:44 --- Created an attachment (id=11663) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=11663&action=view) Testcase Attaching (same) testcase. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28019
[Bug target/28825] New: return (vector float) { a, a, b, b } generates unwanted MMX insns
+++ This bug was initially created as a clone of Bug #24073 +++

Take the following example:

#define vector __attribute__((vector_size(16)))
float a; float b;
vector float f(void) { return (vector float){ a, b, 0.0, 0.0}; }
---
Currently we get:

  subl     $12, %esp
  movss    _b, %xmm0
  movss    _a, %xmm1
  unpcklps %xmm0, %xmm1
  movaps   %xmm1, %xmm0
  xorl     %eax, %eax
  xorl     %edx, %edx
  movl     %eax, (%esp)
  movl     %edx, 4(%esp)
  xorps    %xmm1, %xmm1
  movlhps  %xmm1, %xmm0
  addl     $12, %esp
--
We should be able to produce:

  movss    _b, %xmm0
  movss    _a, %xmm1
  shufps   60, /*[0, 3, 3, 0]*/, %xmm1, %xmm0 // _a, 0, 0, _b
  shufps   201, /*[3, 0, 2, 1]*/, %xmm0, %xmm0 // _a, _b, 0, 0

This is from Nathan Begeman.

--- Comment #4 From Uros Bizjak 2005-09-27 11:41 ---
I think that following example wins the contest:

vector float f(void) { return (vector float){ a, a, b, b}; }

gcc -O2 -msse -fomit-frame-pointer

  subl      $28, %esp
  movss     a, %xmm0
  movss     %xmm0, 4(%esp)
  movss     b, %xmm0
  movd      4(%esp), %mm0
  punpckldq %mm0, %mm0
  movss     %xmm0, 4(%esp)
  movq      %mm0, 16(%esp)
  movd      4(%esp), %mm0
  punpckldq %mm0, %mm0
  movq      %mm0, 8(%esp)
  movlps    16(%esp), %xmm1
  movhps    8(%esp), %xmm1
  addl      $28, %esp
  movaps    %xmm1, %xmm0
  ret

Note the usage of MMX registers.

--- Comment #5 From Andrew Pinski 2005-09-27 14:33 ---
(In reply to comment #4)
> I think that following example wins the contest:
>
> vector float f(void) { return (vector float){ a, a, b, b}; }

For this, it is a different bug. The issue with the above is that the ix86_expand_vector_init_duplicate check for mmx_okay is bad. Currently, we have

  if (!mmx_ok && !TARGET_SSE)

but if I change it to:

  if (!mmx_ok)

we get:

  movss    _a, %xmm0
  movss    _b, %xmm1
  unpcklps %xmm0, %xmm0
  unpcklps %xmm1, %xmm1
  movlhps  %xmm1, %xmm0

Which looks ok to me. That testcase should be opened into another bug as it is obviously wrong.

Cloned from 24073 to track the MMX insn issue; the original 24073 problem is a performance issue.
-- Summary: return (vector float) { a, a, b, b } generates unwanted MMX insns Product: gcc Version: 4.2.0 Status: UNCONFIRMED Severity: minor Priority: P3 Component: target AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: stuart at apple dot com GCC target triplet: i786-*-* http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28825
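For anyone who wants to inspect the code generated for the { a, a, b, b } initializer locally, here is a minimal sketch. It assumes the GCC (or Clang) generic vector extension; the names v4sf, make_aabb, and lane are illustrative only and do not appear in the report. Compiling it with -O2 -msse -S for a 32-bit x86 target and searching the output for %mm registers should show whether MMX instructions are still being emitted.

```c
#include <string.h>

/* 16-byte vector of four floats (GCC/Clang vector extension). */
typedef float v4sf __attribute__((vector_size(16)));

/* Build { a, a, b, b }, the initializer shape discussed in the report.
   make_aabb is a hypothetical name used only for this sketch. */
v4sf make_aabb(float a, float b)
{
    return (v4sf){ a, a, b, b };
}

/* Read one lane back through memcpy, so no vector subscripting is needed. */
float lane(v4sf v, int i)
{
    float out[4];
    memcpy(out, &v, sizeof out);
    return out[i];
}
```

Taking a and b as parameters rather than globals keeps the sketch self-contained; the report's testcase uses globals, which may change the loads but not the shape of the vector construction.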
[Bug target/24073] (vector float){a, b, 0, 0} code gen is not good
--- Comment #6 from stuart at apple dot com 2006-08-23 21:24 --- Cloned 28825 from this bug to track the MMX instruction issue. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24073
[Bug target/28826] New: return (vector float) { a, a, b, b } generates unwanted MMX insns
+++ This bug was initially created as a clone of Bug #24073 +++

Take the following example:

#define vector __attribute__((vector_size(16)))
float a; float b;
vector float f(void) { return (vector float){ a, b, 0.0, 0.0}; }
---
Currently we get:

  subl     $12, %esp
  movss    _b, %xmm0
  movss    _a, %xmm1
  unpcklps %xmm0, %xmm1
  movaps   %xmm1, %xmm0
  xorl     %eax, %eax
  xorl     %edx, %edx
  movl     %eax, (%esp)
  movl     %edx, 4(%esp)
  xorps    %xmm1, %xmm1
  movlhps  %xmm1, %xmm0
  addl     $12, %esp
--
We should be able to produce:

  movss    _b, %xmm0
  movss    _a, %xmm1
  shufps   60, /*[0, 3, 3, 0]*/, %xmm1, %xmm0 // _a, 0, 0, _b
  shufps   201, /*[3, 0, 2, 1]*/, %xmm0, %xmm0 // _a, _b, 0, 0

This is from Nathan Begeman.

--- Comment #4 From Uros Bizjak 2005-09-27 11:41 ---
I think that following example wins the contest:

vector float f(void) { return (vector float){ a, a, b, b}; }

gcc -O2 -msse -fomit-frame-pointer

  subl      $28, %esp
  movss     a, %xmm0
  movss     %xmm0, 4(%esp)
  movss     b, %xmm0
  movd      4(%esp), %mm0
  punpckldq %mm0, %mm0
  movss     %xmm0, 4(%esp)
  movq      %mm0, 16(%esp)
  movd      4(%esp), %mm0
  punpckldq %mm0, %mm0
  movq      %mm0, 8(%esp)
  movlps    16(%esp), %xmm1
  movhps    8(%esp), %xmm1
  addl      $28, %esp
  movaps    %xmm1, %xmm0
  ret

Note the usage of MMX registers.

--- Comment #5 From Andrew Pinski 2005-09-27 14:33 ---
(In reply to comment #4)
> I think that following example wins the contest:
>
> vector float f(void) { return (vector float){ a, a, b, b}; }

For this, it is a different bug. The issue with the above is that the ix86_expand_vector_init_duplicate check for mmx_okay is bad. Currently, we have

  if (!mmx_ok && !TARGET_SSE)

but if I change it to:

  if (!mmx_ok)

we get:

  movss    _a, %xmm0
  movss    _b, %xmm1
  unpcklps %xmm0, %xmm0
  unpcklps %xmm1, %xmm1
  movlhps  %xmm1, %xmm0

Which looks ok to me. That testcase should be opened into another bug as it is obviously wrong.

Cloned from 24073 to track the MMX insn issue; the original 24073 problem is a performance issue.
-- Summary: return (vector float) { a, a, b, b } generates unwanted MMX insns Product: gcc Version: 4.2.0 Status: UNCONFIRMED Severity: minor Priority: P3 Component: target AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: stuart at apple dot com GCC target triplet: i786-*-* http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28826
[Bug target/28825] return (vector float) { a, a, b, b } generates unwanted MMX insns
--- Comment #4 from stuart at apple dot com 2006-08-23 21:44 --- Per Ian's email of 18aug2006, I've committed Andrew's fix as SVN revision 116356. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28825
[Bug target/24073] (vector float){a, b, 0, 0} code gen is not good
--- Comment #7 from stuart at apple dot com 2006-08-23 21:54 ---
Time has passed, and GCC has improved on this testcase. Here is what we generate today (trunk, 23aug2006) for the original testcase:

  movss    b(%rip), %xmm0
  movss    a(%rip), %xmm1
  unpcklps %xmm0, %xmm1
  movaps   %xmm1, %xmm0
  xorps    %xmm1, %xmm1
  movlhps  %xmm1, %xmm0
  ret

This isn't perfect, but it's much better than before.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24073
[Bug target/30757] [4.3 Regression] ICE with -march=athlon-xp -mfpmath=sse
--- Comment #2 from stuart at apple dot com 2007-02-12 17:11 --- Almost certainly my fault; I'll look into this. Suggested workaround: choose a different target cpu; 'pentium4' works. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30757
[Bug target/30757] [4.3 Regression] ICE with -march=athlon-xp -mfpmath=sse
--- Comment #3 from stuart at apple dot com 2007-02-12 18:32 --- O.K., the breakage here is that athlon-xp is an SSE1 machine, and most of the conversions in the patch require SSE2. It looks like SSE1 will support a few of the new conversions (e.g. unsigned int32 <=> float); I'll see about enabling these for SSE1 targets, and arranging for the SSE2-only conversions to fall back to the x87. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30757
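As background to the SSE1 remark above: SSE1 offers a scalar signed int32 to float conversion (cvtsi2ss) but no unsigned one, so an unsigned conversion has to be composed from signed pieces. The following is only a sketch of one well-known way to do that in C; the comment does not say that the committed patch works this way, and u32_to_float is a hypothetical name.

```c
#include <stdint.h>

/* Convert an unsigned 32-bit value to float using only signed
   int -> float conversions (the kind SSE1's cvtsi2ss provides).
   Illustrative only; not taken from the GCC patch. */
float u32_to_float(uint32_t u)
{
    int32_t hi = (int32_t)(u >> 16);      /* upper 16 bits, 0..65535 */
    int32_t lo = (int32_t)(u & 0xFFFFu);  /* lower 16 bits, 0..65535 */

    /* hi * 65536 is exactly representable in float, and so is lo,
       so the only rounding happens in the final addition. */
    return (float)hi * 65536.0f + (float)lo;
}
```

Splitting into 16-bit halves keeps both intermediate values non-negative and small enough to convert exactly, which is why the result matches a direct unsigned conversion.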
[Bug target/30757] [4.3 Regression] ICE with -march=athlon-xp -mfpmath=sse
--- Comment #4 from stuart at apple dot com 2007-02-13 21:00 --- Committed a fix: http://gcc.gnu.org/ml/gcc-patches/2007-02/msg01171.html -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30757
[Bug objc/31281] New: ICE on ObjC try-catch blocks
The 4.2 ObjC compiler ICEs on this (nonsensically reduced) testcase. Compile with -O2:

int f(unsigned int i)
{
  @try { } @catch(id) { }
  for (;;)
    for (;;)
      @try { if (i) break; } @catch(id) { }
}

The 4.0 compiler does not ICE with this testcase.

--
           Summary: ICE on ObjC try-catch blocks
           Product: gcc
           Version: 4.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: objc
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: stuart at apple dot com
GCC target triplet: powerpc-apple-darwin

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31281
[Bug objc/31283] New: ICE on ObjC try-catch blocks
The 4.2 ObjC compiler ICEs on this (nonsensically reduced) testcase. Compile with -O2:

int f(unsigned int i)
{
  @try { } @catch(id) { }
  for (;;)
    for (;;)
      @try { if (i) break; } @catch(id) { }
}

The 4.0 compiler does not ICE with this testcase.

--
           Summary: ICE on ObjC try-catch blocks
           Product: gcc
           Version: 4.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: objc
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: stuart at apple dot com
GCC target triplet: powerpc-apple-darwin

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31283
[Bug objc/31281] ICE on ObjC try-catch blocks with next runtime
--- Comment #3 from stuart at apple dot com 2007-03-27 18:18 --- Patch offered here: http://gcc.gnu.org/ml/gcc-patches/2007-03/msg01328.html -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31281
[Bug c++/32466] New: illegal loop store motion of bitfield
#include <stdio.h>

void __attribute__ ((__noinline__))
add14(unsigned short int nextID)
{
  struct { unsigned short int id: 14; } hdr;
  hdr.id = nextID;
  do {
    hdr.id++;
    if (printf ("should print 0x: 0x%04X\n", (unsigned int)hdr.id))
      break;
  } while (1);
}

main()
{
  add14 (0x3FFF);
  return 0;
}

G++ miscompiles this with optimization. GCC compiles it correctly. Regression from 3.3.

--
           Summary: illegal loop store motion of bitfield
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: stuart at apple dot com
 GCC build triplet: i386-apple-darwin9
  GCC host triplet: i386-apple-darwin9
GCC target triplet: i386-apple-darwin9

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32466
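The behavior the testcase depends on can be isolated in a few lines. This is a minimal sketch, not part of the report: incrementing a 14-bit unsigned field that holds 0x3FFF must wrap to 0, because the result is reduced modulo 2^14 when it is stored back. The names hdr14 and bump14 are hypothetical, and the field is declared unsigned int rather than unsigned short to stay within the bit-field types every C compiler must accept.

```c
/* 14-bit unsigned field, as in the report (declared unsigned int here
   for portability; the report uses unsigned short int). */
struct hdr14 { unsigned int id : 14; };

/* Store `start` in the field, increment once, and return what is read back. */
unsigned int bump14(unsigned int start)
{
    struct hdr14 hdr;
    hdr.id = start;   /* value is stored modulo 2^14 */
    hdr.id++;         /* result is again truncated to 14 bits on store */
    return hdr.id;
}
```

A compiler that keeps the field in a wider register across the loop and skips the truncation would print 0x4000 instead of wrapping, which is presumably the kind of failure the summary's "loop store motion" refers to.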
[Bug target/21195] SSE intrinsics not inlined, sometimes.
--- Additional Comments From stuart at apple dot com 2005-06-29 16:49 --- I marked all the x86 vector intrinsics with always_inline, and this seems to fix both the testcases here. http://gcc.gnu.org/ml/gcc-cvs/2005-06/msg01059.html -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21195
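For readers unfamiliar with the attribute, here is a small sketch of what marking a wrapper always_inline looks like. It is not the actual change to the intrinsic headers; add_one and use_add_one are hypothetical names, and the GCC attribute syntax is assumed.

```c
/* GCC attribute: inline this function even when the optimizer's
   heuristics would otherwise decline to. */
static inline int add_one(int x) __attribute__((always_inline));

static inline int add_one(int x)
{
    return x + 1;
}

/* Caller; with always_inline, no out-of-line call to add_one should remain. */
int use_add_one(int x)
{
    return add_one(x);
}
```

Compiling with -S and checking that use_add_one contains no call instruction is a quick way to confirm the attribute took effect.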
[Bug c/19039] New: another SSE optimization ICE
Attached testcase ICEs GCC:

[opel:/Volumes/sandbox/stuart] hasting2% gcc.fsf.debug.obj/gcc/xgcc -B gcc.fsf.debug.obj/gcc -O1 -msse2 -S m4.i
m4.i: In function 'rrr':
m4.i:36: error: unrecognizable insn:
(insn 283 210 211 10 (set (reg:V16QI 22 xmm1)
    (const_int 1 [0x1])) -1 (nil)
  (nil))
m4.i:36: internal compiler error: in extract_insn, at recog.c:2020
Please submit a full bug report, with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.

--- Here is the testcase: ---

typedef unsigned long ulong;
typedef long long __v2di __attribute__ ((vector_size (16)));
typedef char __v16qi __attribute__ ((vector_size (16)));
typedef struct { void *v1; ulong kk; ulong ss; }bbbccc;
long rrr(const bbbccc *od, ulong lc, ulong rc, ulong br, ulong kw)
{
  long i, j, x, y, kx = kw, ky, p1 = (kx)/2, r1 = (ky)/2, kk = od->kk, q1 = lc - p1, sts = p1, lml = od->ss - rc, g1, *pg1, i1, s1, j1;
  char *out = (char*) od->v1;
  __v2di h1;
  for( i = 0; i < kk; i++ )
    {
      j1 = i + r1;
      if( i >= kk - (long) br )
        j1 = ( kk-1) + r1 - (long) br;
      s1 = j1 - i1;
      for(; y < s1; y++ )
        {
          for( x = q1; x < sts; x++ )
            ;
        }
      out[j] = g1;
      for( ; j <= lml - 16; j += 16 )
        {
          h1 = (__v2di)__builtin_ia32_pcmpeqb128 ((__v16qi)h1, (__v16qi)h1);
          for( y = 0; y < s1; y++ )
            for( x = 0; x < (long) kw; x++ )
              ;
          __builtin_ia32_storedqu ((char *)pg1, (__v16qi)h1);
        }
    }
}

--
           Summary: another SSE optimization ICE
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: stuart at apple dot com
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: i686-apple-darwin
  GCC host triplet: i686-apple-darwin
GCC target triplet: i686-apple-darwin

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19039
[Bug debug/19521] New: omitted stab for gcov initialization function
gcov support entails an initialization function named "__GLOBAL__I_0_noop". GCC omits the function-begin stab for this function.

Here is the commandline:

[morris:/Volumes/sandbox/stuart] hasting2% \
/Volumes/sandbox/stuart/gcc.fsf.obj/gcc/xgcc -B \
/Volumes/sandbox/stuart/gcc.fsf.obj/gcc -g gcov.c -fprofile-arcs \
-ftest-coverage -S

Given the .s file from the above, here is a check of the output:

[morris:/Volumes/sandbox/stuart] hasting2% egrep 'noop|main' gcov.s
.globl _noop
_noop:
.stabs "noop:F(0,1)=(0,1)",36,0,7,_noop
.stabs "",36,0,0,Lscope0-_noop
.globl _main
_main:
bl _noop
.stabs "main:F(0,2)=r(0,2);-2147483648;2147483647;",36,0,11,_main
.stabn 192,0,0,_main
.stabs "",36,0,0,Lscope1-_main
__GLOBAL__I_0_noop:
.stabs "",36,0,0,Lscope2-__GLOBAL__I_0_noop
.long __GLOBAL__I_0_noop
.long __GLOBAL__I_0_noop
[morris:/Volumes/sandbox/stuart] hasting2%

The 'stabs "",36' record seems to signify the end of the __GLOBAL__I_0_noop function. The matching start-function record is missing; compare with the noop and main functions.

The testcase is from the GCC testsuite:

gcc/testsuite/gcc.misc-tests/gcov-1.c gcov.c

Since it's short, here is the testcase:

/* Test Gcov basics. */
/* { dg-options "-fprofile-arcs -ftest-coverage" } */
/* { dg-do run { target native } } */

void noop () { }

int main ()
{
  int i;
  for (i = 0; i < 10; i++) /* count(11) */
    noop (); /* count(10) */
  return 0; /* count(1) */
}

/* { dg-final { run-gcov gcov-1.c } } */

--
           Summary: omitted stab for gcov initialization function
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: debug
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: stuart at apple dot com
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: powerpc-apple-darwin
  GCC host triplet: powerpc-apple-darwin
GCC target triplet: powerpc-apple-darwin

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19521
[Bug debug/19521] omitted stab for gcov initialization function
--- Additional Comments From stuart at apple dot com 2005-01-19 00:40 --- Created an attachment (id=7986) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=7986&action=view) gcov-1.c testcase Attaching the testcase for convenience. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19521
[Bug debug/19521] omitted stab for gcov initialization function
--- Additional Comments From stuart at apple dot com 2005-01-19 00:49 ---
This is a regression from 3.3; I think the cause is this line in cgraphunit.c (cgraph_build_static_cdtor): (approximately line 1847)

  DECL_IGNORED_P (decl) = 1;

Deleting this line "fixes" the symptom, but I believe the right fix lies in dbxout.c.

The start-function stab comes from dbxout.c(dbxout_symbol), near line 2346:

  /* Ignore nameless syms, but don't ignore type tags. */
  if ((DECL_NAME (decl) == 0 && TREE_CODE (decl) != TYPE_DECL)
      || DECL_IGNORED_P (decl))
    DBXOUT_DECR_NESTING_AND_RETURN (0);

This check causes the omission of the start-function stab.

The corresponding end-function stab comes from dbxout.c (dbxout_function_end), near line 465:

  #ifdef DBX_OUTPUT_NFUN
    DBX_OUTPUT_NFUN (asm_out_file, lscope_label_name, current_function_decl);
  #else
  =>fprintf (asm_out_file, "%s\"\",%d,0,0,", ASM_STABS_OP, N_FUN);
    assemble_name (asm_out_file, lscope_label_name);
    putc ('-', asm_out_file);
    assemble_name (asm_out_file, XSTR (XEXP (DECL_RTL (current_function_decl), 0), 0));
    fprintf (asm_out_file, "\n");
  #endif

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19521
[Bug debug/19521] [4.0 Regression] omitted stab for gcov initialization function
--- Additional Comments From stuart at apple dot com 2005-01-19 17:08 ---
> So the bug is the end stab without the start stab?

Yes.

> Or do you think that this
> bit of code that corresponds not at all to any user code should have full
> stabs?

My personal preference is a mild "yes." But I can foresee that others will disagree, and I recognize the validity of that position.

> If the later, why?

When I'm grubbing through a broken binary, it's helpful when the debugger tells me that this function body didn't come from the user's sourcecode. In general, "more information is better."

I suppose the counterargument would be that most users don't look at the assembly code, don't want to know about these functions, and would prefer smaller debug information for faster linking and development. I assume that most GCC users are unlike me; thus I infer their argument wins. I can live with that; this is not a big deal either way.

If the debugger already knows the name of this function, and the stabs are not adding any useful information, then I agree they're a waste and should be omitted.

The big deal is that the begin/end stabs should match: both emitted or both omitted.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19521
[Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch
When compiling the testcase (strcpy()) with -Os -march=pentium4, GCC generates a word-fetch instead of a byte-fetch. This can provoke memory faults (e.g. at the end of a page). Omitting "-march=pentium4" generates correct code.

Here is the testcase:

char *
mystrcpy(char * __restrict to, const char * __restrict from)
{
  char *save = to;
  for (; (*to = *from); ++from, ++to);
  return(save);
}

And the invocation:

% gcc.fsf.pure.obj/gcc/xgcc -B gcc.fsf.pure.obj/gcc -Os -S mystrcpy.c -march=pentium4

And the result (error is marked):

        .file   "mystrcpy.c"
        .text
.globl mystrcpy
        .type   mystrcpy, @function
mystrcpy:
        pushl   %ebp
        movl    %esp, %ebp
        movl    12(%ebp), %ecx
        movl    8(%ebp), %edx
        jmp     .L2
.L3:
        incl    %ecx
        incl    %edx
.L2:
        movl    (%ecx), %eax            <-- should be 'movb'
        movb    %al, (%edx)
        testb   %al, %al
        jne     .L3
        movl    8(%ebp), %eax
        popl    %ebp
        ret
        .size   mystrcpy, .-mystrcpy
        .ident  "GCC: (GNU) 4.0.0 20041015 (experimental)"
        .section        .note.GNU-stack,"",@progbits
--
Also reproducible on i686-apple-darwin.

--
           Summary: -march=pentium4 generates word fetch instead of byte fetch
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: stuart at apple dot com
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: i686-pc-linux
  GCC host triplet: i686-pc-linux
GCC target triplet: i686-pc-linux

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019
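To make the page-boundary hazard concrete, here is a small arithmetic sketch. It is not from the report: bytes_past_page is a hypothetical helper, and the 4096-byte page size used in the usage note is an assumption. It computes how many bytes a load of a given width reads beyond the end of the page that contains its first byte.

```c
#include <stddef.h>

/* Number of bytes that a load of `width` bytes starting at page offset
   `offset` reads beyond the end of a page of `page_size` bytes.
   Returns 0 when the load stays inside the page. */
size_t bytes_past_page(size_t offset, size_t width, size_t page_size)
{
    size_t end = offset + width;
    return end > page_size ? end - page_size : 0;
}
```

With a 4096-byte page, a string terminator sitting in the last byte (offset 4095) is read safely by a 1-byte load, while a 4-byte load at the same address touches 3 bytes of the following page, which faults if that page is unmapped.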
[Bug c/18019] -march=pentium4 generates word fetch instead of byte fetch
--- Additional Comments From stuart at apple dot com 2004-10-15 18:05 --- Created an attachment (id=7359) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=7359&action=view) testcase Attaching the testcase for convenience. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019
[Bug target/18019] -march=pentium4 generates word fetch instead of byte fetch
--- Additional Comments From stuart at apple dot com 2004-10-15 18:27 --- The bug was discovered when it walked off the end of a VM page and faulted. Are you certain this is "expected behavior?" -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019
[Bug c/17853] New: -O2 ICE for MMX testcase
gcc -O2 -mmmx i386-mmx-5.c

GCC fails thus:
-
i386-mmx-5.c: In function 'main':
i386-mmx-5.c:15: internal compiler error: in simplify_binary_operation, at simplify-rtx.c:2151
Please submit a full bug report, with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
-
/* { dg-do run { target i?86-*-* x86_64-*-* } } */
/* { dg-options "-O2 -mmmx" } */
#include <mmintrin.h>
#include <stdlib.h>

__m64 global_mask;

main()
{
  __m64 zero = _mm_setzero_si64();
  __m64 mask = _mm_cmpeq_pi8( zero, zero );
  mask = _mm_unpacklo_pi8( mask, zero );
  global_mask = mask;
  exit(0);
}
-

--
           Summary: -O2 ICE for MMX testcase
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: stuart at apple dot com
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: i686-linux-gnu
  GCC host triplet: i686-linux-gnu
GCC target triplet: i686-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17853
[Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
--- Additional Comments From stuart at apple dot com 2004-11-09 19:34 --- I agree with Roger. I'm suspicious of the "0,1,2 ... TARGET_PARTIAL_xx" clauses of the "*movqi_1" pattern. (Also the analogous parts of "*movhi_1".) I've tried reverting Roger's patch, and excising the TARGET_PARTIAL_xx clauses; either change appears to fix the problem. I've also successfully regression-tested the excision. I will invite Jan Hubicka, author of the TARGET_PARTIAL_xx clauses (i386.md, v1.503), to look at this. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019
[Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
--- Additional Comments From stuart at apple dot com 2004-11-16 19:39 ---
Here is the body of an email I sent to Jan Hubicka concerning this bug. In the body of the message, 'you' refers to Jan.
--
For discussion, here is the pattern in question as it exists on the FSF mainline today:

1503 ;; Situation is quite tricky about when to choose full sized (SImode) move
1504 ;; over QImode moves. For Q_REG -> Q_REG move we use full size only for
1505 ;; partial register dependency machines (such as AMD Athlon), where QImode
1506 ;; moves issue extra dependency and for partial register stalls machines
1507 ;; that don't use QImode patterns (and QImode move cause stall on the next
1508 ;; instruction).
1509 ;;
1510 ;; For loads of Q_REG to NONQ_REG we use full sized moves except for partial
1511 ;; register stall machines with, where we use QImode instructions, since
1512 ;; partial register stall can be caused there. Then we use movzx.
1513 (define_insn "*movqi_1"
1514 [(set (match_operand:QI 0 "nonimmediate_operand" "=q,q ,q ,r,r ,?r,m")
1515 (match_operand:QI 1 "general_operand" " q,qn,qm,q,rn,qm,qn"))]
1516 "GET_CODE (operands[0]) != MEM || GET_CODE (operands[1]) != MEM"
1517 {
1518 switch (get_attr_type (insn))
1519 {
1520 case TYPE_IMOVX:
1521 if (!ANY_QI_REG_P (operands[1]) && GET_CODE (operands[1]) != MEM)
1522 abort ();
1523 return "movz{bl|x}\t{%1, %k0|%k0, %1}";
1524 default:
1525 if (get_attr_mode (insn) == MODE_SI)
1526 return "mov{l}\t{%k1, %k0|%k0, %k1}";
1527 else
1528 return "mov{b}\t{%1, %0|%0, %1}";
1529 }
1530 }
1531 [(set (attr "type")
1532 (cond [(ne (symbol_ref "optimize_size") (const_int 0))
1533 (const_string "imov")
1534 (and (eq_attr "alternative" "3")
1535 (ior (eq (symbol_ref "TARGET_PARTIAL_REG_STALL")
1536 (const_int 0))
1537 (eq (symbol_ref "TARGET_QIMODE_MATH")
1538 (const_int 0
1539 (const_string "imov")
1540 (eq_attr "alternative" "3,5")
1541 (const_string "imovx")
1542 (and (ne (symbol_ref "TARGET_MOVX")
1543 (const_int 0))
1544 (eq_attr "alternative" "2"))
1545 (const_string "imovx")
1546 ]
1547 (const_string "imov")))
1548 (set (attr "mode")
1549 (cond [(eq_attr "alternative" "3,4,5")
1550 (const_string "SI")
1551 (eq_attr "alternative" "6")
1552 (const_string "QI")
1553 (eq_attr "type" "imovx")
1554 (const_string "SI")
1555 (and (eq_attr "type" "imov")
1556 (and (eq_attr "alternative" "0,1,2")
1557 (ne (symbol_ref "TARGET_PARTIAL_REG_DEPENDENCY")
1558 (const_int 0
1559 (const_string "SI")
1560 ;; Avoid partial register stalls when not using QImode arithmetic
1561 (and (eq_attr "type" "imov")
1562 (and (eq_attr "alternative" "0,1,2")
1563 (and (ne (symbol_ref "TARGET_PARTIAL_REG_STALL")
1564 (const_int 0))
1565 (eq (symbol_ref "TARGET_QIMODE_MATH")
1566 (const_int 0)
1567 (const_string "SI")
1568 ]
1569 (const_string "QI")))])

Roger added lines 1532-1533 in January of this year. It looks like you added lines 1555-1567 in 2000.

The combination of lines 1532-1533 (use "imov" if -Os) and lines 1555-1559 (use SImode if "imov" and byte-load and K8/P4/Nocona) means we generate a "movl" that should be a "movb". (The testcase is strcpy(); see the Bugzilla.)

For the following discussion, note that GCC currently matches "movqi_1" alternative #2 ("q" and "qm" in the attribute list) on the critical byte-fetch-from-memory in the strcpy() testcase.

It appears to me that the 1555-1559 clause depends upon any CPU with
[Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
--- Additional Comments From stuart at apple dot com 2004-12-02 01:07 ---
Jan emailed this to me privately. Appended here for completeness. - stuart

Just to clarify things a bit. TARGET_MOVX and TARGET_PARTIAL_REG_DEPENDENCY is not about supporting some feature but about a way the CPU deals with dependencies on partial registers. Some CPUs (Athlon+,P4+) deal with partial register writes as read-modify operation of the whole thing (TARGET_PARTIAL_REG_DEPENDENCY) and for some of these it is profitable to do dummy zero extend (TARGET_MOVX) instead of loads to avoid the dependency, while others (K6, P3) give it another internal name and don't see the false dependency (TARGET_PARTIAL_REG_STALL). On the other hand they get penalty if the result is used as a whole register. There is unlikely to be ever CPU spoiled up in both directions.

However the Roger's patch, as I understand it, is about avoiding movx as it encodes longer on -Os. It seems to me that for targets not defining TARGET_PARTIAL_REG_STALL/TARGET_PARTIAL_REG_DEPENDENCY we should always produce the straightforward movq as expected, while for TARGET_PARTIAL_REG_STALL/TARGET_PARTIAL_REG_DEPENDENCY we can still use the full moves as long as they don't encode longer. I can't check right now, but I believe it is only the movl imm, register that comes out longer and that is the alternative 2. We can also probably kill the TARGET_QIMODE_MATH as it is no longer used.

There is the type and mode argument not only to choose the particular instruction but also to drive scheduling (K6 for instance has limited supply of units that do 8bit operations). imovx is 32bit operation so it needs to get SImode.

I would simply break out the alternative 2 from both conditionals and would additionally check optimize_size to be nonzero.

I can prepare patch later next week unless someone beats me :)

Honza

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019
[Bug tree-optimization/24659] Conversions are not vectorized
--- Comment #3 from stuart at apple dot com 2007-01-05 18:27 --- Created an attachment (id=12862) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=12862&action=view) vectorized assembly output from ICC v9.1 Generated from the "indefinite loop" variant of the testcase on OS X 10.4.7, using ICC v9.1: % icc -O2 -S 24659.c -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24659
[Bug tree-optimization/24659] Conversions are not vectorized
--- Comment #4 from stuart at apple dot com 2007-01-05 18:30 ---
I ran the testcase through ICC, and it unrolled the loops without vectorizing them. However, making the loops indefinite gets us the desired, vectorized result. Here is the modified, indefinite loop version of the testcase:

void test_fp (float *a, double *b, int count)
{
  int i;
  for (i = 0; i < count; i++)
    b[i] = (double) a[i];
}

void test_int (int *a, double *b, int count)
{
  int i;
  for (i = 0; i < count; i++)
    b[i] = (double) a[i];
}

(Note to Apple: this is Radar 4079267)

--
stuart at apple dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |stuart at apple dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24659