Re: GCC optimizes integer overflow: bug or feature?
> int j;
> for (j = 1; 0 < j; j *= 2)
>   if (! bigtime_test (j))
>     return 1;
>
> Here it is obvious to a programmer that the comparison is intended to do overflow checking, even though the test controls the loop.

Well, it's not to me. :-)

> Another question for the GCC experts: would it fix the bug if we replaced "j *= 2" with "j <<= 1" in this sample code?

Yes, it will actually compile the code as this:

  int i, j;
  for (i = 0, j = 1; i < 31; i++)
    j <<= 1;

Or you can do, since elsewhere in the code you compute time_t_max:

  for (j = 1; j <= time_t_max / 2 + 1; j *= 2)

Which is IMO more intention revealing.

Paolo
Re: GCC optimizes integer overflow: bug or feature?
Paul Eggert wrote:
> Roberto Bagnara <[EMAIL PROTECTED]> writes:
>
> > > (The platform I'm thinking of is Tandem NSK/OSS.)
> >
> > Is this correct? Doesn't C99's 6.2.5#6 mandate that...
>
> This is straying from the subject of GCC and into the problems of writing portable C code, but since you asked
>
> The Tandem NSK/OSS environment does not claim full conformance to C99. The NSK/OSS community is conservative (fault-tolerance does that to you :-) and has introduced only some C99 features, more as time progresses. The NSK/OSS developers did not introduce 64-bit unsigned int until last year. I'm no expert in the area, but I'd guess that most NSK/OSS production shops are still running older releases, which have 64-bit signed int but only 32-bit unsigned int.

I now understand that Tandem NSK/OSS is not conformant, thanks. But the reason I asked is that I interpreted what you wrote, i.e.,

> Also, such an approach assumes that unsigned long long int
> has at least as many bits as long long int. But this is an
> unportable assumption; C99 does not require this.

as "C99 does not require that unsigned long long int has at least as many bits as long long int." My reading, instead, is that C99 requires unsigned long long int to have exactly the same number of bits as long long int.

All the best,
Roberto

--
Prof. Roberto Bagnara
Computer Science Group
Department of Mathematics, University of Parma, Italy
http://www.cs.unipr.it/~bagnara/
mailto:[EMAIL PROTECTED]
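The reading of 6.2.5#6 discussed above can be checked mechanically on a given implementation. The following is a minimal sketch, assuming a compiler that provides `long long` and the C99 `<limits.h>` macros; the typedef name and the helper function are illustrative and do not come from the thread. C99 6.2.5#6 speaks of the unsigned type using "the same amount of storage" as its signed counterpart, which is what the compile-time check tests; because padding bits are permitted, storage size and the number of value bits are not strictly the same thing, so the range is checked separately.

```c
#include <limits.h>

/* Compile-time check (illustrative name): the array size becomes -1,
   and compilation fails, if the two types differ in storage size. */
typedef char ull_same_storage_as_ll
    [sizeof(unsigned long long) == sizeof(long long) ? 1 : -1];

/* Returns 1 if every non-negative long long value is representable
   as an unsigned long long, 0 otherwise. */
int ull_covers_ll(void)
{
    return ULLONG_MAX >= (unsigned long long)LLONG_MAX;
}
```

On an implementation like the older NSK/OSS releases described above (64-bit signed, 32-bit unsigned), the typedef would presumably fail to compile, which is the kind of early diagnosis a portable package might want.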
Re: GCC optimizes integer overflow: bug or feature?
> Or you can do, since elsewhere in the code you compute time_t_max:
>
>   for (j = 1; j <= time_t_max / 2 + 1; j *= 2)

No, this does not work. It would work to have:

  for (j = 1;;)
    {
      if (j > time_t_max / 2)
        break;
      j *= 2;
    }

Oops.

Paolo
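To see why the first form fails: when j reaches time_t_max / 2 + 1 (the largest representable power of two), the condition j <= time_t_max / 2 + 1 still holds, so the loop executes j *= 2 once more and overflows. The corrected loop tests before doubling. A minimal sketch of that pattern, assuming `long` as a stand-in for time_t and a hypothetical helper name; it is not the code from the package under discussion:

```c
#include <limits.h>

/* Hypothetical helper: returns the largest power of two that is
   <= max, for max >= 1.  The doubling is guarded by a division,
   so no signed overflow can occur. */
long largest_power_of_two(long max)
{
    long j = 1;

    for (;;) {
        if (j > max / 2)   /* doubling j would exceed max: stop */
            break;
        j *= 2;
    }
    return j;
}
```

Calling `largest_power_of_two(LONG_MAX)` stops at `LONG_MAX / 2 + 1` without ever computing a value outside the range of `long`, so the result does not depend on how the compiler treats signed overflow.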
needed headerfiles in .c
Dear GCC Developers / Users,

I am trying to port a GCC backend from GCC 2.7.2.3 to GCC 4.1.1. After having had a look at some already existing backends like the PDP11, I found out that a lot of new header files have been added to ".c" as includes. My question is whether there is some kind of standard set of header files which is needed by every backend. Can somebody give me a list or something like that? I already had a look at the Internals Manual but did not find anything about it.

Thanks in advance,
Markus Franke
Fixme in driver-i386.c
Hello!

There is a fixme in config/i386/driver-i386.c:

--cut here--
  if (arch)
    {
      /* FIXME: i386 is wrong for 64bit compiler. How can we tell if
         we are generating 64bit or 32bit code? */
      cpu = "i386";
    }
  else
--cut here--

Couldn't simple "sizeof(long)" do the trick here, i.e.:

--cut here--
#include <stdio.h>
#include <stdlib.h>

int main()
{
  int i = sizeof (long);

  switch (i)
    {
    default:
      abort();
    case 4:
    case 8:
      printf ("%i\n", i);
    }
  return 0;
}
--cut here--

gcc -m32
./a.out
4

gcc -m64
./a.out
8

Uros.
Re Fixme in driver-i386.c
Hello Uros,

No, sizeof(long) is not always different. E.g. for the future 64-bit mingw target, the long type remains 4 bytes in size. But maybe we can use the pointer size? On an i386 32-bit system sizeof(void *) == 4, and on an x86_64 64-bit system sizeof(void *) == 8.

Regards,
i.A. Kai Tietz

Uros Bizjak <[EMAIL PROTECTED]> wrote on 22.12.2006 12:24 (To: GCC, Subject: Fixme in driver-i386.c):

> Hello!
>
> There is a fixme in config/i386/driver-i386.c:
>
> --cut here--
>   if (arch)
>     {
>       /* FIXME: i386 is wrong for 64bit compiler. How can we tell if
>          we are generating 64bit or 32bit code? */
>       cpu = "i386";
>     }
>   else
> --cut here--
>
> Couldn't simple "sizeof(long)" do the trick here, i.e.:
>
> --cut here--
> int main() { int i = sizeof (long); switch (i) { default: abort(); case 4: case 8: printf ("%i\n", i); } return 0; }
> --cut here--
>
> gcc -m32
> ./a.out
> 4
>
> gcc -m64
> ./a.out
> 8
>
> Uros.
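The pointer-size idea from the reply above can be sketched as follows. This is only an illustration, assuming a hosted C compiler; the function names are hypothetical and nothing here is taken from driver-i386.c. Note that the result describes how this translation unit itself was compiled, which is not necessarily the same as the code a driver will go on to generate.

```c
#include <limits.h>

/* Hypothetical helper: width of a data pointer in bits for the
   compilation that produced this object file. */
int pointer_bits(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}

/* Hypothetical helper: 1 when pointers are 8 bytes wide, 0 otherwise.
   Unlike sizeof(long), this also distinguishes targets where long
   stays 4 bytes in 64-bit mode. */
int is_64bit_build(void)
{
    return sizeof(void *) == 8;
}
```

Built with gcc -m32 on x86, `is_64bit_build()` would be expected to return 0, and with -m64 to return 1, matching the sizes quoted in the message.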
Saving the Tree declaration node in GCC 4.1.1.
Hi all,

I am working with GCC 4.1.1. I need some information on the following.

Before emitting a call instruction, I need to check for function attributes. Based on that I need to emit the corresponding call instruction. For that, before emitting the call instruction, I check the attributes of the called function through the declaration node:

  tree fn_id, fn_decl;
  fn_id = get_identifier(name);
  fn_decl = lookup_name(fn_id);

This code works fine if none of the optimizations are enabled. But it fails for all levels of optimization, as I am not able to get the function decl tree (lookup_name returns NULL). When compiling the code with the -fno-unit-at-a-time switch, I am able to get the function declaration node.

The ASTs are converted into SSA and eventually into the RTL representation after parsing each function (O0, no optimization) or when the whole file is parsed (with optimization). When the whole file is parsed, are the function declaration nodes removed intentionally by GCC? In this context (with -funit-at-a-time), how will I be able to access the function declaration node of the called function just before emitting the assembly? Or should I save the function declaration nodes for further use? Can anyone suggest a workaround?

Thanks in advance,
Regards,
Rohit
Re: SSA_NAMES: should there be an unused, un-free limbo?
Hello,

> On Thu, 2006-12-21 at 20:18 +0100, Zdenek Dvorak wrote:
> > I think this might be a good idea. I think that this requires
> > a lot of changes (basically going through all uses of bsi_remove
> > and remove_phi_node and checking them), but it would be cleaner
> > than the current situation.
> Agreed. Tedious work, but it shouldn't be terribly difficult (famous
> last words).

I will give it a try (i.e., if it can be done in one afternoon, I will send a patch tomorrow :-).

Zdenek
Re: SSA_NAMES: should there be an unused, un-free limbo?
Jeffrey Law wrote on 12/22/06 01:09:

> On Thu, 2006-12-21 at 14:05 -0500, Diego Novillo wrote:
> > In any case, that is not important. I agree that every SSA name in the SSA table needs to have a DEF_STMT that is either (a) an empty statement, or, (b) a valid statement still present in the IL.
>
> Just to be 100% clear. This is not true at the current time; see the discussion about the sharing of a single field for TREE_CHAIN and SSA_NAME_DEF_STMT. If you want to make that statement true, then you need to fix both the orphan problem and the sharing of a field for SSA_NAME_DEF_STMT and TREE_CHAIN.

I think we are agreeing violently.
RE: Char shifts promoted to int. Why?
On 21 December 2006 21:54, Ayal Zaks wrote:

>> Something along these lines may be useful to do in the vectorizer when we
>> get code like this:
>>   ((char)x) = ((char)( ((int)((char)x)) << ((int)c) ) )
>> and don't feel like doing all the unpacking of chars to ints and then
>> packing the ints to chars after the shift. An alternative could be to
>> transform the above pattern to:
>>   char_x1 = 0
>>   char_x2 = char_x << c
>>   char_x = ((int)c < size_of_char) ? char_x2 : char_x1
>> and vectorize that (since we already know how to vectorize selects).
>
> Alternatively, do
>   char_c2 = (c < size_of_char ? c : 0)
>   char_x2 = char_x << char_c2
> which is like saturating the shift amount.

You don't really mean zero as the third operand of that ternary operator, you want size_of_char.

cheers,
DaveK
--
Can't think of a witty .sigline today
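The difference between clamping the shift amount to 0 and clamping it to size_of_char can be shown in scalar C. This is a minimal sketch, assuming unsigned char data and an unsigned shift count so that no signed-shift issues arise; the function names are made up for illustration and this is not vectorizer code. It also relies on C's promotion to int, so a shift by exactly CHAR_BIT is well defined here; whether a narrow vector shift by the full element width is defined depends on the target.

```c
#include <limits.h>

/* Reference semantics of the select-based pattern: the shifted value
   when c is smaller than the char width, otherwise 0. */
unsigned char shl_reference(unsigned char x, unsigned c)
{
    return c < CHAR_BIT ? (unsigned char)((unsigned)x << c) : 0;
}

/* Shift amount clamped to 0, as first posted: an out-of-range count
   leaves x unchanged instead of producing 0. */
unsigned char shl_clamp_zero(unsigned char x, unsigned c)
{
    unsigned c2 = c < CHAR_BIT ? c : 0;
    return (unsigned char)((unsigned)x << c2);
}

/* Shift amount saturated to the char width, as in the correction:
   every bit is shifted out of the low byte, so the result is 0. */
unsigned char shl_clamp_width(unsigned char x, unsigned c)
{
    unsigned c2 = c < CHAR_BIT ? c : CHAR_BIT;
    return (unsigned char)((unsigned)x << c2);
}
```

For an in-range count all three agree. For a count of 9 with x = 0xFF, the reference and the width-saturating version give 0 while the zero-clamped version gives 0xFF, which is the discrepancy the reply points out.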
Re: Reload Pass
Rajkishore Barik wrote:

> Hi, Thanks very much. I still have doubts on your suggestion: AFAIK, the back-end pass consists of (in order): reg move -> insn sched -> reg class -> local alloc -> global alloc -> reload -> post-reload. There are edges from reg move to reg class and reload back to global alloc. In case I want to implement a linear scan which may split live ranges (pseudos) into live intervals (smaller pseudos) and allocate different registers for each of them. This process would break the whole loop. So, what did you mean by --- "run this pass in between the register allocator and reload, that would probably be doable."? Sorry, you were not specific.

When I read your first message, I also thought that you were going to rewrite the reload pass, in other words to use another algorithm to produce strict RTL, where there are only hard registers and all insn constraints are satisfied. As I now understand it, you are going to implement a linear scan register allocator which also does what reload does in gcc.

First of all, you did not state what your project goals are. Is it to better understand how the gcc register allocator and reload work (that is a good project then), or do you want to write a better register allocator which will be used in gcc? The second goal is hard to achieve because a linear scan register allocator generates worse code than Chaitin-Briggs and the current gcc register allocator (Chow's priority-based coloring).

Another problem is that linear scan register allocation is patented. Do you have permission to use it in gcc? Currently we have permission to use Chaitin's (IBM) and Briggs's (Rice University) patents. We cannot ignore patents the way LLVM does; LLVM uses the patented linear scan and Callahan-Koblenz register allocators.

IMHO, Ian wrote correctly that you can write a decent register allocator which also does the reload work for one target (although even in this case the number of details to keep in mind will be overwhelming), but it is hard to write decent code which works for all gcc targets. So much effort has gone into improving reload for all gcc processors (even weird ones like SH, with many registers but very small displacements, or mcore, with few registers and small displacements); you should do the same to be successful. Reload does a lot of things not mentioned in Zack's document, like register rematerialization, virtual register elimination, address inheritance (which is important for processors with small address displacements) and a few others.

There is an opinion that doing some of the reload work before register allocation will permit generating better code, because register allocation will not have to worry about the constraints and reload will be more predictable (more accurately, it will have to act in fewer cases). Such a project already exists (see the svn branches).

In any case, your work on register allocation and reload will be appreciated by the community; only please be prepared that it will not be an easy path.

Vlad
RE: GCC optimizes integer overflow: bug or feature?
On 22 December 2006 00:59, Denis Vlasenko wrote:

> Or this, absolutely typical C code. i386 arch can compare
> 16 bits at a time here (luckily, no alignment worries on this arch):

Whaddaya mean, no alignment worries? Misaligned accesses *kill* your performance!

I know this doesn't affect correctness, but the coder might well have known that the pointer is unaligned and written two separate byte-sized accesses on purpose; volatile isn't the answer because it's too extreme, there's nothing wrong with caching these values in registers and they don't spontaneously change on us.

cheers,
DaveK
--
Can't think of a witty .sigline today
RE: GCC optimizes integer overflow: bug or feature?
On Fri, 2006-12-22 at 17:08 +, Dave Korn wrote:
> Misaligned accesses *kill* your performance!

Maybe on x86, but on PPC, at least for the (current) Cell's PPU, misaligned accesses are optimal in most cases.

Thanks,
Andrew Pinski
Re: GCC optimizes integer overflow: bug or feature?
Dave Korn wrote:

> On 22 December 2006 00:59, Denis Vlasenko wrote:
> > Or this, absolutely typical C code. i386 arch can compare
> > 16 bits at a time here (luckily, no alignment worries on this arch):
>
> Whaddaya mean, no alignment worries? Misaligned accesses *kill* your performance!

Is it really worse to do one unaligned 16-bit read than two separate 8-bit reads? I am surprised ... and of course you have the gain from shorter code, reducing i-cache pressure.

> I know this doesn't affect correctness, but the coder might well have known that the pointer is unaligned and written two separate byte-sized accesses on purpose; volatile isn't the answer because it's too extreme, there's nothing wrong with caching these values in registers and they don't spontaneously change on us.
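The two access patterns being compared (two byte-sized reads versus one possibly unaligned 16-bit read) can be written out in portable C for experimentation. This is a minimal sketch with illustrative function names; it uses memcpy for the 16-bit read so that the code is valid for any alignment, and it builds the expected value from the byte pattern so that it does not depend on endianness. Compilers typically lower a fixed two-byte memcpy to a single load where the target permits, but that is an assumption to verify on the target in question.

```c
#include <stdint.h>
#include <string.h>

/* Byte-wise form: p[1] is read only when p[0] == 1. */
int f_bytes(const char *p)
{
    return p[0] == 1 && p[1] == 2;
}

/* Word form: always reads both bytes, in one 16-bit access if the
   compiler chooses to emit one. */
int f_word(const char *p)
{
    static const char pattern[2] = { 1, 2 };
    uint16_t got, want;

    memcpy(&got, p, sizeof got);         /* alignment-safe 16-bit read */
    memcpy(&want, pattern, sizeof want); /* same byte order as 'got' */
    return got == want;
}
```

The two functions return the same value for any readable two-byte buffer, but they differ in how many accesses they make and how wide those accesses are, which is the property the alignment and performance argument turns on.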
Re: GCC optimizes integer overflow: bug or feature?
Andrew Pinski wrote:

> On Fri, 2006-12-22 at 17:08 +, Dave Korn wrote:
> > Misaligned accesses *kill* your performance!
>
> Maybe on x86, but on PPC, at least for the (current) Cell's PPU, misaligned accesses are optimal in most cases.

Is that true across cache boundaries?

> Thanks,
> Andrew Pinski
Re: GCC optimizes integer overflow: bug or feature?
On Fri, 2006-12-22 at 12:30 -0500, Robert Dewar wrote:
> > Maybe on x86, but on PPC, at least for the (current) Cell's PPU
> > misaligned accesses for most cases unaligned are optimal.
>
> is that true across cache boundaries?

For Cell, crossing a 32-byte boundary causes the access to be handled in microcode. But the question is how often that happens compared to non-crossing accesses. I am willing to bet hardly at all; yes, I need to test this, and I am going to have to anyway for my job :).

-- Pinski
[mem-ssa] Updated documentation
I've updated the document describing Memory SSA. The section on mixing static and dynamic partitioning is still being implemented, so it's a bit sparse on details and things will probably shift somewhat before I'm done. http://gcc.gnu.org/wiki/mem-ssa Feedback welcome. Thanks.
gcc-4.1-20061222 is now available
Snapshot gcc-4.1-20061222 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.1-20061222/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.1 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_1-branch revision 120156

You'll find:

gcc-4.1-20061222.tar.bz2            Complete GCC (includes all of below)
gcc-core-4.1-20061222.tar.bz2       C front end and core compiler
gcc-ada-4.1-20061222.tar.bz2        Ada front end and runtime
gcc-fortran-4.1-20061222.tar.bz2    Fortran front end and runtime
gcc-g++-4.1-20061222.tar.bz2        C++ front end and runtime
gcc-java-4.1-20061222.tar.bz2       Java front end and runtime
gcc-objc-4.1-20061222.tar.bz2       Objective-C front end and runtime
gcc-testsuite-4.1-20061222.tar.bz2  The GCC testsuite

Diffs from 4.1-20061215 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.1 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
Re: GCC optimizes integer overflow: bug or feature?
On Friday 22 December 2006 03:03, Paul Brook wrote:
> On Friday 22 December 2006 00:58, Denis Vlasenko wrote:
> > On Tuesday 19 December 2006 23:39, Denis Vlasenko wrote:
> > > There are a lot of 100.00% safe optimizations which gcc
> > > can do. Value range propagation for bitwise operations, for one
> >
> > Or this, absolutely typical C code. i386 arch can compare
> > 16 bits at a time here (luckily, no alignment worries on this arch):
> >
> > int f(char *p)
> > {
> >     if (p[0] == 1 && p[1] == 2) return 1;
> >     return 0;
> > }
>
> Definitely not 100% safe. p may point to memory that is sensitive to the
> access width and/or number of accesses. (ie. memory mapped IO).

Take a look at what Linux does when you need to touch MMIO or PIO areas. In short: it wraps the access in macros/inlines which do all the required magic (which may be rather different on different architectures; for i386, they amount to *(volatile char*)p). "Simple" access to such areas with *p will never work safely across the whole spectrum of hardware.

Ok, next example of real-world code I recently saw. The i >= N comparisons are completely superfluous; the programmer probably overlooked that fact. But gcc didn't notice that either and generated 16 bytes extra for the first function:

# cat t3.c
int i64c(int i) {
        if (i <= 0) return '.';
        if (i == 1) return '/';
        if (i >= 2 && i < 12) return ('0' - 2 + i);
        if (i >= 12 && i < 38) return ('A' - 12 + i);
        if (i >= 38 && i < 63) return ('a' - 38 + i);
        return 'z';
}
int i64c_2(int i) {
        if (i <= 0) return '.';
        if (i == 1) return '/';
        if (i < 12) return ('0' - 2 + i);
        if (i < 38) return ('A' - 12 + i);
        if (i < 63) return ('a' - 38 + i);
        return 'z';
}
# gcc -O2 -c -fomit-frame-pointer t3.c
# nm --size-sort t3.o
0038 T i64c_2
0048 T i64c
# gcc -O2 -S -fomit-frame-pointer t3.c
# cat t3.s
        .file   "t3.c"
        .text
        .p2align 2,,3
.globl i64c
        .type   i64c, @function
i64c:
        movl    4(%esp), %edx
        testl   %edx, %edx
        jle     .L15
        cmpl    $1, %edx
        je      .L16
        leal    -2(%edx), %eax
        cmpl    $9, %eax
        jbe     .L17
        leal    -12(%edx), %eax
        cmpl    $25, %eax
        jbe     .L18
        leal    -38(%edx), %eax
        cmpl    $24, %eax
        ja      .L19
        leal    59(%edx), %eax
        ret
        .p2align 2,,3
.L17:
        leal    46(%edx), %eax
        ret
        .p2align 2,,3
.L16:
        movl    $47, %eax
        ret
        .p2align 2,,3
.L19:
        movl    $122, %eax
        ret
.L18:
        leal    53(%edx), %eax
        ret
.L15:
        movl    $46, %eax
        ret
        .size   i64c, .-i64c
        .p2align 2,,3
.globl i64c_2
        .type   i64c_2, @function
i64c_2:
        movl    4(%esp), %eax
        testl   %eax, %eax
        jle     .L33
        cmpl    $1, %eax
        je      .L34
        cmpl    $11, %eax
        jle     .L35
        cmpl    $37, %eax
        jle     .L36
        cmpl    $62, %eax
        jg      .L37
        addl    $59, %eax
        ret
        .p2align 2,,3
.L35:
        addl    $46, %eax
        ret
        .p2align 2,,3
.L34:
        movb    $47, %al
        ret
        .p2align 2,,3
.L37:
        movl    $122, %eax
        ret
.L36:
        addl    $53, %eax
        ret
.L33:
        movl    $46, %eax
        ret
        .size   i64c_2, .-i64c_2
        .ident  "GCC: (GNU) 4.2.0 20061128 (prerelease)"
        .section        .note.GNU-stack,"",@progbits
--
vda
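The claim that the i >= N comparisons are superfluous can be checked by brute force: each lower bound is already implied by the earlier tests having failed, so the two functions should agree on every input. A minimal sketch of such a check follows. The two functions are copied from the message; the helper `same_on_range` is a hypothetical addition, and its loop counter is deliberately wider than int so that the check itself cannot overflow when hi is INT_MAX.

```c
/* The two functions as posted in t3.c. */
int i64c(int i)
{
    if (i <= 0) return '.';
    if (i == 1) return '/';
    if (i >= 2 && i < 12) return ('0' - 2 + i);
    if (i >= 12 && i < 38) return ('A' - 12 + i);
    if (i >= 38 && i < 63) return ('a' - 38 + i);
    return 'z';
}

int i64c_2(int i)
{
    if (i <= 0) return '.';
    if (i == 1) return '/';
    if (i < 12) return ('0' - 2 + i);
    if (i < 38) return ('A' - 12 + i);
    if (i < 63) return ('a' - 38 + i);
    return 'z';
}

/* Hypothetical helper: returns 1 if both functions agree for every
   i in [lo, hi], 0 at the first disagreement. */
int same_on_range(int lo, int hi)
{
    long long i;

    for (i = lo; i <= hi; i++)
        if (i64c((int)i) != i64c_2((int)i))
            return 0;
    return 1;
}
```

A call such as `same_on_range(-1000, 1000)` covers every branch boundary (0, 1, 2, 12, 38, 63); a full sweep from INT_MIN to INT_MAX is also feasible if a few seconds of run time are acceptable.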
Re: Saving the Tree declaration node in GCC 4.1.1.
"Rohit Arul Raj" <[EMAIL PROTECTED]> writes:

> Before emitting a call instruction, i need to check for function
> attributes. Based on that i need to emit the corresponding call
> instruction. For that, before emitting the call instruction, i check
> for the attributes of the called function through the declaration
> node.
>
> tree fn_id, fn_decl;
> fn_id = get_identifier(name);
> fn_decl = lookup_name(fn_id);

I don't understand where you are trying to emit the call instruction. Why do you only have the name? What language are you compiling?

Certainly calling lookup_name seems wrong. Given a CALL_EXPR, you can use get_callee_fndecl to find the callee. At the RTL level, you can usually use TARGET_ENCODE_SECTION_INFO and SYMBOL_REF_FLAG to good effect. Or look at how ARM handles ENCODED_LONG_CALL_ATTR_P via TARGET_STRIP_NAME_ENCODING.

Ian