Re: GCC 4.3.0 Status Report (2007-08-09)
On 8/10/07 9:49 AM, Diego Novillo wrote: > Zadeck has the parloop branch patches [ ... ] Sorry, I meant Zdenek.
Re: [RFC] Migrate pointers to members to the middle end
Ollie Wild wrote: > Offhand, I don't remember what happened with the various other cases, > but my testing at the time wasn't particularly thorough. The feedback > I've gotten so far seems overwhelmingly negative, so I think the next > step is to revisit the lowering approach, exercise the hell out of it, > and see what, if any, limitations pop up. Yes, I agree. Again, thank you for being patient with the process. Let me know when you're at the point where you'd like me to review the front-end lowering patch again; send me a URL, and I'll be happy to do so. Thanks, -- Mark Mitchell CodeSourcery [EMAIL PROTECTED] (650) 331-3385 x713
Re: mips gcc -O1: Address exception error on store doubleword
Alex Gonzalez writes: > Hi, trying to come up with a testcase we figured out what the problem could > be. > > When the optimizer is on and memcpy sees that it is copying a > struct with double words in it, it will assume that the struct > starts on an 8 byte boundary and use double word loads and > stores. This is a safe assumption, as gcc will always ensure that > structs containing doubles start on an 8 byte boundary when the > memory is mallocced. > > However we managed to trick gcc by mallocing a large chunk of > memory and then assigning a pointer to a user data (unsigned int > user[0]) without first ensuring that the user data was 8 byte > aligned. Since the structure does contain a double, this resulted > in a crash in memcopy. > > The fix for this was to inform the compiler that this "void" > pointer should be 8 byte aligned by changing the "unsigned int > user[0]" to a "unsigned long long user[0]". This will cause gcc to > pad this entry out to ensure that it starts on an 8 byte boundary. > > Does this make sense? Yes. In general, if you lie to the compiler you lose. :-) It's a very good idea to read what the language standards actually say about this. In particular, casting pointers between types doesn't work except in some well-defined cases. You should read the standard to find out what works and what doesn't. Andrew.
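As a rough illustration of the point above (not code from the thread), here is a minimal C sketch of two ways to stay honest with the compiler when carving objects out of a malloc'ed pool. The names pvar_t, align_up, pvar_at and pvar_copy_bytes are hypothetical; pvar_t only stands in for the PVAR structure, whose real definition was never posted. The sketch assumes a C11 compiler for _Alignof.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>   /* malloc, free (used by callers of pvar_at) */
#include <string.h>

/* Hypothetical stand-in for PVAR: any struct with a long long member. */
typedef struct {
    unsigned int tag;
    long long    value;
} pvar_t;

/* Round offset up to the next multiple of align (a power of two). */
static size_t align_up(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/* True if p is a multiple of align. */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p % align) == 0;
}

/* Return the first suitably aligned pvar_t slot at or after pool + offset.
   Assumes pool came from malloc, so pool itself is aligned for any type. */
static pvar_t *pvar_at(unsigned char *pool, size_t offset)
{
    void *p = pool + align_up(offset, _Alignof(pvar_t));
    assert(is_aligned(p, _Alignof(pvar_t)));
    return (pvar_t *)p;
}

/* Copy when alignment cannot be guaranteed.  The void * parameters make no
   alignment promise, so the compiler cannot assume doubleword alignment. */
static void pvar_copy_bytes(void *dst, const void *src)
{
    memcpy(dst, src, sizeof(pvar_t));
}
```

The second helper mirrors the observation elsewhere in this thread that declaring varcopy with void * parameters avoided the crash: once the pointer type no longer claims 8-byte alignment, the compiler has to emit code that tolerates unaligned addresses.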
Re: Very Fast: Directly Coded Lexical Analyzer
Ronny Peine wrote:

> Hi,
>
> my question is, why not use the element construction algorithm? The
> Thompson algorithm creates an epsilon-NFA, which needs quite a lot of
> memory. The element construction creates an NFA directly and therefore
> has fewer states. Well, this is only interesting for scanner creation,
> which is not as important as the scanner itself, but it can reduce the
> memory footprint of the generator. It's a pity I can't find a URL for the
> algorithm description; maybe I even have the wrong name for it. I have
> only read about it in the Compiler Construction lecture notes at the
> university.

To me, very fast (millions of lines a second) lexical analyzers are trivial to write by hand, and I really don't see the point of tools, and certainly not the utility of any theory in writing such code. If anything the formalism of a finite state machine just gets in the way, since it is more efficient to encode the state in the code location than in data.
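For readers who have not seen the style being described, here is a minimal sketch of a hand-written scanner in C where the "state" is simply the position in the code rather than a table entry. It is an illustration only, not anyone's production lexer; the token kinds and the names token and next_token are made up for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef enum { TOK_EOF, TOK_IDENT, TOK_NUMBER, TOK_PUNCT } tok_kind;

typedef struct {
    tok_kind    kind;
    const char *start;   /* first character of the token */
    size_t      len;     /* number of characters in the token */
} token;

/* Scan one token starting at *pp and advance *pp past it. */
static token next_token(const char **pp)
{
    const char *p;
    token t;

    assert(pp != NULL && *pp != NULL);
    p = *pp;

    /* Skip blanks; there is no "whitespace state", just this loop. */
    while (*p == ' ' || *p == '\t' || *p == '\n')
        p++;

    t.start = p;
    if (*p == '\0') {
        t.kind = TOK_EOF;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        /* Being "inside an identifier" means being inside this loop. */
        do
            p++;
        while (isalnum((unsigned char)*p) || *p == '_');
        t.kind = TOK_IDENT;
    } else if (isdigit((unsigned char)*p)) {
        do
            p++;
        while (isdigit((unsigned char)*p));
        t.kind = TOK_NUMBER;
    } else {
        p++;                 /* any other single character */
        t.kind = TOK_PUNCT;
    }

    t.len = (size_t)(p - t.start);
    *pp = p;
    return t;
}
```

A table-driven scanner would keep a state number in a variable and index a transition table on every character; here each branch and loop plays the role of one state, so no state variable is loaded or stored.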
GCC "make" errors
Hi,

I wanted to update my GCC compiler to 4.2.1 in order to install an updated version of the C libraries (glibc), and the build is giving me errors. I type ./configure, which works fine, but when I type "make" it runs fine until it starts to give the following errors:

/tmp/ccacyMlE.s: Assembler messages:
/tmp/ccacyMlE.s:72: Error: no such 386 instruction: `stmxcsr'
/tmp/ccacyMlE.s:90: Error: no such 386 instruction: `ldmxcsr'
/tmp/ccacyMlE.s:119: Error: no such 386 instruction: `fxsave'
make[3]: *** [crtfastmath.o] Error 1
make[3]: Leaving directory `/usr/src/gcc-4.2.1/host-i686-pc-linux-gnu/gcc'
make[2]: *** [all-stage1-gcc] Error 2
make[2]: Leaving directory `/usr/src/gcc-4.2.1'
make[1]: *** [stage1-bubble] Error 2
make[1]: Leaving directory `/usr/src/gcc-4.2.1'
make: *** [all] Error 2

I have the latest versions of make (3.81), binutils, coreutils and texinfo installed. I am running Linux JDS 2003, which I have been told is SUSE Linux, on an Athlon 1.6GHz. It seems Linux users on Linux forums have limited knowledge of this, as I have not received any assistance from them, so your help would be really appreciated.

Thanks
Mandeep
Re: GCC "make" errors
[EMAIL PROTECTED] wrote:

> Hi,
>
> I wanted to update my GCC compiler to 4.2.1 in order to install an updated
> version of the C libraries (glibc), and the build is giving me errors. I
> type ./configure, which works fine, but when I type "make" it runs fine
> until it starts to give the following errors:
>
> /tmp/ccacyMlE.s: Assembler messages:
> /tmp/ccacyMlE.s:72: Error: no such 386 instruction: `stmxcsr'
> /tmp/ccacyMlE.s:90: Error: no such 386 instruction: `ldmxcsr'
> /tmp/ccacyMlE.s:119: Error: no such 386 instruction: `fxsave'
> make[3]: *** [crtfastmath.o] Error 1
> make[3]: Leaving directory `/usr/src/gcc-4.2.1/host-i686-pc-linux-gnu/gcc'
> make[2]: *** [all-stage1-gcc] Error 2
> make[2]: Leaving directory `/usr/src/gcc-4.2.1'
> make[1]: *** [stage1-bubble] Error 2
> make[1]: Leaving directory `/usr/src/gcc-4.2.1'
> make: *** [all] Error 2
>
> I have the latest versions of make (3.81), binutils, coreutils and texinfo
> installed. I am running Linux JDS 2003, which I have been told is SUSE
> Linux, on an Athlon 1.6GHz. It seems Linux users on Linux forums have
> limited knowledge of this, as I have not received any assistance from
> them, so your help would be really appreciated.

Either you don't have a binutils from the last 8 years, or you have somehow crossed up your march= options, which you didn't divulge.
Re: [RFC] Migrate pointers to members to the middle end
> "Dan" == Daniel Berlin <[EMAIL PROTECTED]> writes: Dan> Just to be clear, we *already* have the class hierarchies in the Dan> middle end. Dan> They have been there for a few years now :) Good point, thanks. I don't think that is enough though, because I don't think the BINFO slots mean the same thing in g++ and gcj. Anyway, I don't want to derail this conversation. If we really want to strength reduce interface dispatch to virtual dispatch in LTO then we'll need to find some relatively language neutral way to express that. Tom
Re: GCC 4.3.0 Status Report (2007-08-09)
On 8/9/07 6:19 PM, Mark Mitchell wrote:

> Are there any folks out there who have projects for Stage 1 or Stage 2
> that they are having trouble getting reviewed? Any comments
> re. timing for Stage 3?

Zadeck has the parloop branch patches, which I've been reviewing. I am not sure how many other patches are left, but at least a couple.

Zdenek, are the remaining patches submitted already? I have one in my review list, but I don't know if there are others. I could go over them next week.
Re: [RFC] Migrate pointers to members to the middle end
Hi,

On Thu, 9 Aug 2007, Tom Tromey wrote:

> Michael> Yes, devirtualization. But I wonder if you really need class
> Michael> hierarchies for this (actually I'm fairly sure you don't).
>
> However, I'm not sure I agree with the above assertion. Specifically, for
> Java I think it is sometimes possible to strength reduce interface calls
> to virtual calls, but I don't see how this could be done without class
> hierarchy information.

Okay, I suppose there are transformations that could make use of class hierarchies. Luckily we do have that via the BINFO machinery already.

Ciao,
Michael.
RE: Very Fast: Directly Coded Lexical Analyzer
On 10 August 2007 12:49, Robert Dewar wrote: On 01 June 2007 11:27, Ronny Peine wrote: >> Hi, >> >> my questions is, why not use the element construction algorithm? > To me, very fast (millions of lines a second) lexical analyzers are > trivial to write by hand, I think you need one to lex the dates in the old back-dated emails in your mailbox for you! :-) cheers, DaveK -- Can't think of a witty .sigline today
reload question
I'm looking into a few cases where we're still getting the base/index operand ordering wrong on PowerPC for an indexed load/store instruction, even after the PTR_PLUS merge and fix for PR28690. One of the cases I observed was caused by reload picking r0 to use for the base reg opnd as a result of spilling. Since r0 is not a valid register for the base reg position, we end up switching the order of the operands before emitting the instruction which then causes the performance hit on Power6. r0 is not a valid BASE_REG_CLASS register, only INDEX_REG_CLASS, but the following section of code from reload.c:find_reloads_address_1() dealing with PLUS(REG REG) may try assigning the base reg opnd to the INDEX_REG class in a couple situations. This then allows r0 to be picked for the base reg opnd. Is this being done on purpose (going on assumption that operands are commutative), such as to allow more opportunities for a successful allocation with reduced spill? If it's not wise for me to modify this code, possibly due to effect on other architectures, what are some other options (maybe introduce a new HONOR_BASE_INDEX_ORDER target macro)? 

      else if (code0 == REG && code1 == REG)
        {
          if (REGNO_OK_FOR_INDEX_P (REGNO (op0))
              && regno_ok_for_base_p (REGNO (op1), mode, PLUS, REG))
            return 0;
          else if (REGNO_OK_FOR_INDEX_P (REGNO (op1))
                   && regno_ok_for_base_p (REGNO (op0), mode, PLUS, REG))
            return 0;
          else if (regno_ok_for_base_p (REGNO (op1), mode, PLUS, REG))
            find_reloads_address_1 (mode, orig_op0, 1, PLUS, SCRATCH,
                                    &XEXP (x, 0), opnum, type, ind_levels,
                                    insn);
          else if (regno_ok_for_base_p (REGNO (op0), mode, PLUS, REG))
            find_reloads_address_1 (mode, orig_op1, 1, PLUS, SCRATCH,
                                    &XEXP (x, 1), opnum, type, ind_levels,
                                    insn);
          else if (REGNO_OK_FOR_INDEX_P (REGNO (op1)))
            find_reloads_address_1 (mode, orig_op0, 0, PLUS, REG,
                                    &XEXP (x, 0), opnum, type, ind_levels,
                                    insn);
          else if (REGNO_OK_FOR_INDEX_P (REGNO (op0)))
            find_reloads_address_1 (mode, orig_op1, 0, PLUS, REG,
                                    &XEXP (x, 1), opnum, type, ind_levels,
                                    insn);
          else
            {
              find_reloads_address_1 (mode, orig_op0, 1, PLUS, SCRATCH,
                                      &XEXP (x, 0), opnum, type, ind_levels,
                                      insn);
              find_reloads_address_1 (mode, orig_op1, 0, PLUS, REG,
                                      &XEXP (x, 1), opnum, type, ind_levels,
                                      insn);
            }
        }

I've also seen the same situation come up during register renaming (regrename.c), but not too surprising since the code there says it's based off find_reloads_address_1() and is coded similarly.

-Pat
Re: mips gcc -O1: Address exception error on store doubleword
Hi, trying to come up with a testcase we figured out what the problem could be.

When the optimizer is on and memcpy sees that it is copying a struct with double words in it, it will assume that the struct starts on an 8 byte boundary and use double word loads and stores. This is a safe assumption, as gcc will always ensure that structs containing doubles start on an 8 byte boundary when the memory is mallocced.

However we managed to trick gcc by mallocing a large chunk of memory and then assigning a pointer to a user data (unsigned int user[0]) without first ensuring that the user data was 8 byte aligned. Since the structure does contain a double, this resulted in a crash in memcopy.

The fix for this was to inform the compiler that this "void" pointer should be 8 byte aligned by changing the "unsigned int user[0]" to a "unsigned long long user[0]". This will cause gcc to pad this entry out to ensure that it starts on an 8 byte boundary.

Does this make sense?

Alex

On 8/9/07, Alex Gonzalez <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I'll try to come up with a short test.
>
> I have narrowed it a bit more. The PVAR structure contains a long long
> variable ( with a sizeof 8 and an alignof 8 for my architecture). If I
> take out the long long variable, the compiler uses sdl instructions
> instead of sd and the exception doesn't happen.
>
> Also, if I do
>
> static void varcopy(void *pvar1, void *pvar2)
>
> the compiler uses sdl and avoids the crash.
>
> I am compiling for n32 ABI, so the register size is 64bits.
>
> Any ideas?
>
> On 8/9/07, David Daney <[EMAIL PROTECTED]> wrote:
> > Alex Gonzalez wrote:
> > > Hi,
> > >
> > > I am seeing an address error exception caused by the gcc optimizer -O1.
> > >
> > > I have narrowed it down to the following function:
> > >
> > > static void varcopy(PVAR *pvar1, PVAR *pvar2) {
> > >     memcpy(pvar1,pvar2,sizeof(PVAR));
> > > }
> > >
> > > Being the sizeof(PVAR) 160 bytes.
> > >
> > > The exception is caused on an sd instruction when the input is not
> > > aligned on a doubleword boundary.
> > >
> > > I was under the assumption that the compiler made sure that it doesn't
> > > store a doubleword that is not aligned on a doubleword boundary. Is
> > > this a bug in the optimizer?
> > >
> > > I am using a gcc mips64 cross-compiler,
> > >
> > > mips64-linux-gnu-gcc (GCC) 3.3-mips64linux-031001
> > >
> > > Has anyone experienced this problem before?
> > >
> > In order to investigate we would need a self contained test case (i.e.
> > the definition of PVAR must be included). Also it would be nice if you
> > could try it on a current version of GCC (4.2.1 perhaps).
> >
> > David Daney
> >
>
gcc-4.3-20070810 is now available
Snapshot gcc-4.3-20070810 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.3-20070810/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.3 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/trunk revision 127352

You'll find:

gcc-4.3-20070810.tar.bz2            Complete GCC (includes all of below)
gcc-core-4.3-20070810.tar.bz2       C front end and core compiler
gcc-ada-4.3-20070810.tar.bz2        Ada front end and runtime
gcc-fortran-4.3-20070810.tar.bz2    Fortran front end and runtime
gcc-g++-4.3-20070810.tar.bz2        C++ front end and runtime
gcc-java-4.3-20070810.tar.bz2       Java front end and runtime
gcc-objc-4.3-20070810.tar.bz2       Objective-C front end and runtime
gcc-testsuite-4.3-20070810.tar.bz2  The GCC testsuite

Diffs from 4.3-20070803 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.3
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.
Re: reload question
Pat Haugen <[EMAIL PROTECTED]> writes: > I'm looking into a few cases where we're still getting the base/index > operand ordering wrong on PowerPC for an indexed load/store instruction, > even after the PTR_PLUS merge and fix for PR28690. One of the cases I > observed was caused by reload picking r0 to use for the base reg opnd as a > result of spilling. Since r0 is not a valid register for the base reg > position, we end up switching the order of the operands before emitting the > instruction which then causes the performance hit on Power6. r0 is not a > valid BASE_REG_CLASS register, only INDEX_REG_CLASS, but the following > section of code from reload.c:find_reloads_address_1() dealing with > PLUS(REG REG) may try assigning the base reg opnd to the INDEX_REG class in > a couple situations. This then allows r0 to be picked for the base reg > opnd. Is this being done on purpose (going on assumption that operands are > commutative), such as to allow more opportunities for a successful > allocation with reduced spill? If it's not wise for me to modify this > code, possibly due to effect on other architectures, what are some other > options (maybe introduce a new HONOR_BASE_INDEX_ORDER target macro)? I'm not entirely clear: how do you propose changing the code? Ian
RFC: Simplify rules for ctz/clz patterns and RTL
During development of the patch I just posted for double-word clz, I went through all the back ends and audited their use of the bit-scan named patterns and RTL. It appears to me that our current handling of C[LT]Z_DEFINED_VALUE_AT_ZERO is much more complicated than it needs to be, and also that between my patch and Sandra's earlier patch for synthetic ctz/ffs, we have an opportunity to delete a bunch of code from the back ends.

In this message, I'll use the word "instruction" when I am talking about an actual hardware operation on a particular architecture; the word "pattern" when I am talking about a named define_insn or define_expand in a machine description; and the word "expression" when I am talking about RTL. The word "port" refers to the GCC back-end for a particular CPU architecture.

There are eleven ports that make use of a clz instruction. That use is not necessarily in a clz pattern or with clz expressions - some only define ffs patterns, and some use UNSPECs. This is mostly irrelevant to what I want to talk about, though.

    alpha arm i386 m68k mips rs6000 s390 score sh sparc xtensa

Of these, the majority have instructions that, when the input is zero, write to the output a value equal to the number of bits in the input (i.e. GET_MODE_BITSIZE of the mode of the input). I'll refer to this as canonical behavior. Furthermore, these ports set CLZ_DEFINED_VALUE_AT_ZERO to reflect that fact.

    alpha arm m68k mips rs6000 s390 xtensa

The score, sh and sparc instructions may or may not display canonical behavior; their ports do not define CLZ_DEFINED_VALUE_AT_ZERO and I was not able to find documentation of the relevant instruction. i386, as is well known, has a clz instruction that does not write a predictable value to the output when the input is zero, and so correctly does not define CLZ_DEFINED_VALUE_AT_ZERO. (Actually, when TARGET_ABM is true, we are using a new instruction that *does* display canonical behavior, and my aforementioned patch sets CLZ_DEFINED_VALUE_AT_ZERO to reflect that; but again this is mostly irrelevant.)

No port needs CLZ_DEFINED_VALUE_AT_ZERO to be a tristate. Either both or neither of the clz pattern and the clz expression produce a defined value at zero. No port defines CLZ_DEFINED_VALUE_AT_ZERO to set the 'val' argument to anything other than GET_MODE_BITSIZE (mode). [Some of them hardcode the constant instead of using that expression.]

There are two ports that make use of a ctz instruction:

    alpha i386

alpha's instruction displays canonical behavior; i386's instruction does not write a predictable value to the output when the input is zero (TARGET_ABM does not help here). Both ports have correct definitions or non-definitions of CTZ_DEFINED_VALUE_AT_ZERO.

In addition, four ports define ctz patterns that expand to multi-instruction sequences.

    arm ia64 rs6000 xtensa

Of these, all except ia64 are presently redundant with the generic expander Sandra added to optabs.c. ia64 generates a different sequence involving a popcount instruction; it would be easy enough to add that to optabs.c. CTZ_DEFINED_VALUE_AT_ZERO is not defined by all four ports, but could be. Those that define it, do so correctly.

No port needs CTZ_DEFINED_VALUE_AT_ZERO to be a tristate. There are three cases: both the pattern and the expression have a defined value at zero (alpha); neither the pattern nor the expression has a defined value at zero (i386); the pattern has a defined value at zero and the expression is never emitted so its value at zero is moot (arm, rs6000, xtensa, ia64).

If optabs.c were taught to synthesize ctz in terms of popcount, the arm, rs6000, xtensa, and ia64 definitions of ctz patterns could all be removed. There would then be no port that defined CTZ_DEFINED_VALUE_AT_ZERO to set 'val' to anything other than GET_MODE_BITSIZE (mode).
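
To make "canonical behavior" concrete, here is a small portable C sketch, offered as an illustration rather than as anything GCC emits. It models clz and ctz returning the bit width for a zero input and shows one well-known way to get ctz from popcount, using the identity ctz(x) = popcount((x & -x) - 1). That identity is an assumption for the example and is not necessarily the sequence the ia64 port uses. The names UINT_BITS, popcount_u, clz_canonical, ctz_via_popcount, ffs_via_ctz and check_relations are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Bit width of unsigned int; plays the role of GET_MODE_BITSIZE (mode). */
#define UINT_BITS ((int)(sizeof(unsigned int) * CHAR_BIT))

/* Portable population count: number of set bits in x. */
static int popcount_u(unsigned int x)
{
    int n = 0;
    while (x != 0) {
        x &= x - 1;          /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Count leading zeros, returning UINT_BITS for x == 0 (canonical). */
static int clz_canonical(unsigned int x)
{
    int n = 0;
    unsigned int mask;

    if (x == 0)
        return UINT_BITS;
    for (mask = 1u << (UINT_BITS - 1); (x & mask) == 0; mask >>= 1)
        n++;
    return n;
}

/* Count trailing zeros via popcount.  x & -x isolates the lowest set bit;
   subtracting 1 turns it into a mask of the trailing zero positions.
   For x == 0 the mask is all ones, so the result is UINT_BITS. */
static int ctz_via_popcount(unsigned int x)
{
    return popcount_u((x & (0u - x)) - 1u);
}

/* ffs is always defined at zero: 0 for x == 0, otherwise ctz + 1. */
static int ffs_via_ctz(unsigned int x)
{
    return x == 0 ? 0 : ctz_via_popcount(x) + 1;
}

/* Sanity relations between the three operations for a given input. */
static void check_relations(unsigned int x)
{
    unsigned int low = x & (0u - x);   /* lowest set bit, or 0 */

    if (x != 0) {
        assert(ffs_via_ctz(x) == ctz_via_popcount(x) + 1);
        assert(clz_canonical(low) + ctz_via_popcount(low) == UINT_BITS - 1);
    } else {
        assert(clz_canonical(x) == UINT_BITS);
        assert(ctz_via_popcount(x) == UINT_BITS);
    }
}
```

Note that the popcount form yields the canonical value at zero without a special case, which is why a synthesized ctz of this shape would let 'val' stay equal to the bit width.
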
[My patch removes the arm ctz pattern. rs6000 and xtensa could be removed now.]

There is no port that makes use of an ffs instruction. However, there are nine architectures that define ffs patterns.

    alpha arm i386 ia64 rs6000 score sh sparc xtensa

All except ia64's are redundant with optabs.c after Sandra's patch plus my patch. ia64's would be redundant if the aforementioned popcount sequence were added to optabs.c.

There is no port that uses the ffs expression. ffs always has a defined value at zero, so there is no FFS_DEFINED_VALUE_AT_ZERO macro nor any need for one.

The machine-independent uses of C[LT]Z_DEFINED_VALUE_AT_ZERO are quite limited:

  * builtins.c (fold_builtin_bitop): Uses them to determine the value of __builtin_clz* and __builtin_ctz* for a zero argument. Interestingly, if the macros are false for a given mode, it folds the builtins as if they displayed canonical behavior.
  * optabs.c: Uses them in strategies for expanding ctz and ffs.
  * rtlanal.c (nonzero_bits1): Uses them to decide what bits can be nonzero in the result of a clz or ctz expression.
  * simplify-rtx.