Re: return void from void function is allowed.
On Oct 31, 2006, at 12:49 PM, Igor Bukanov wrote: -- Forwarded message -- From: Igor Bukanov <[EMAIL PROTECTED]> Date: Oct 31, 2006 9:48 PM Subject: Re: return void from void function is allowed. To: Mike Stump <[EMAIL PROTECTED]>

On 10/31/06, Mike Stump <[EMAIL PROTECTED]> wrote: This is valid in C++.

My copy of the 1997 C++ public draft contains:

6.6.3 The return statement ... 2 A return statement without an expression can be used only in functions that do not return a value, that is, a function with the return value type void, a constructor (_class.ctor_), or a destructor (_class.dtor_). A return statement with an expression can be used only in functions returning a value; the value of the expression is returned to the caller of the function. If required, the expression is implicitly converted to the return type of the function in which it appears. A return statement can involve the construction and copy of a temporary object (_class.temporary_). Flowing off the end of a function is equivalent to a return with no value; this results in undefined behavior in a value-returning function.

My reading of that is that C++ does not allow returning a void expression from a void function. Was it changed later?

Yes, it was: [stmt.return]

6.6.3 The return statement ... 2 A return statement without an expression can be used only in functions that do not return a value, that is, a function with the return type void, a constructor (12.1), or a destructor (12.4). A return statement with an expression of non-void type can be used only in functions returning a value; the value of the expression is returned to the caller of the function. The expression is implicitly converted to the return type of the function in which it appears. A return statement can involve the construction and copy of a temporary object (12.2). Flowing off the end of a function is equivalent to a return with no value; this results in undefined behavior in a value-returning function.
3 A return statement with an expression of type “cv void” can be used only in functions with a return type of cv void; the expression is evaluated just before the function returns to its caller.

And a final thought: wrong mailing list... gcc-help would have been better. I thought bugs in GCC could be discussed here. Sorry if that is a wrong assumption. Regards, Igor
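The practical motivation for the later wording is generic code: "return f();" has to compile even when f returns void. A minimal sketch of why paragraph 3 matters (function names here are invented for illustration):

```cpp
#include <cassert>

static int calls = 0;
static void ping() { ++calls; }          // a void function
static int  answer() { return 42; }      // a value-returning function

// Forwards any call; "return f();" is valid even when f() has type void,
// which is exactly what [stmt.return] paragraph 3 permits.
template <typename F>
auto forward_call(F f) -> decltype(f()) {
    return f();
}
```

Under the 1997 draft wording, forward_call(ping) would have been ill-formed; the revised wording makes the template work uniformly for void and non-void callables.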
Re: Canonical type nodes, or, comptypes considered harmful
On Nov 7, 2006, at 11:47 AM, Douglas Gregor wrote: I just read Nathan's discussion [1] on changing GCC's type system to use canonical type nodes, where the comparison between two types requires only a pointer comparison. Right now, we use "comptypes", which typically needs to do deep structural checks to determine if two types are equivalent, because we often clone _TYPE nodes. One difficulty is that compatibility of types in C is not transitive, especially when you're compiling more than one translation unit at a time. See the thread "IMA vs tree-ssa" in Feb-Mar 2004. Geoff Keating and Joseph Myers give good examples.
Re: Canonical type nodes, or, comptypes considered harmful
On Nov 7, 2006, at 3:05 PM, Gabriel Dos Reis wrote: "Joseph S. Myers" <[EMAIL PROTECTED]> writes: [...] | In C, there are several other sources of intransitivity, such as | incomplete array types (int[10] and int[5] both compatible with int[] but I'm sure we can (and probably must) special-case that. | not with each other), unprototyped functions and enums (two enums | compatible with the same integer type are not compatible with each other). well, we can just have the unified type nodes for the C++ front-end if the C system is determined to be resilient :-) I do understand the advantages of sharing them more. Perhaps some 90% solution could be made to work, with most type nodes being unified and the problem cases (there would not be any in C++, apparently) using the existing inefficient mechanisms.
Re: Canonical type nodes, or, comptypes considered harmful
On Nov 9, 2006, at 4:54 PM, Mike Stump wrote: On Nov 8, 2006, at 7:14 AM, Ian Lance Taylor wrote: The way to canonicalize them is to have all equivalent types point to a single canonical type for the equivalence set. The comparison is one memory dereference and one pointer comparison, not the current procedure of checking for structural equivalence.

Once not equal addresses might mean equal types, you have to do a structure walk to compare types, and you're right back where we started. The only way to save yourself is to be able to say: different addresses _must_ be different types. I had in mind something like

  if (p1 == p2)
    equal
  else if (p1->ptr_equality_suffices_for_this_type
           || p2->ptr_equality_suffices_for_this_type)
    not equal
  else
    tree walk

Don't know how workable that is.
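The scheme Ian describes — every equivalent type points at one canonical node, so equality is a single pointer compare — can be sketched with a simple interning table. This is only an illustration of the idea, not GCC's actual data structures; the names are invented:

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for a type node: "spelling" is a structural key, and "canonical"
// points at the representative node for the whole equivalence class.
struct Type {
    std::string spelling;
    const Type* canonical;
};

// Intern a type by its structural spelling; structurally equivalent types
// always come back as the same node.
const Type* canonicalize(const std::string& spelling) {
    static std::map<std::string, Type> pool;   // map nodes have stable addresses
    Type& t = pool.try_emplace(spelling, Type{spelling, nullptr}).first->second;
    t.canonical = &t;
    return &t;
}

// The fast comparison: one dereference and one pointer compare,
// instead of a deep structural walk.
bool same_type(const Type* a, const Type* b) {
    return a->canonical == b->canonical;
}
```

Mike's fallback corresponds to types whose canonical pointer cannot be trusted (the C intransitivity cases), which would still take the structural-walk path.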
Re: strict aliasing question
On Nov 11, 2006, at 10:45 PM, Howard Chu wrote: Andrew Pinski wrote: On Sat, 2006-11-11 at 22:18 -0800, Ian Lance Taylor wrote: Your code will be safe on all counts if you change buf from int[] to char[]. The language standard grants a special exemption to char* pointers. Without that exemption, it would be impossible to write malloc in C. As I recall, we chose int[] for alignment reasons, figuring we'd have no guarantees on the alignment of a char[]. True, but add __attribute__((aligned(4))) and all is well.
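In standard C++ the same effect is available without the GCC attribute spelling, via alignas. A sketch of the aliasing-safe buffer being discussed (the helper names are invented):

```cpp
#include <cassert>
#include <cstring>

// A char buffer is exempt from strict-aliasing rules, and alignas gives it
// the alignment the int[] version was originally chosen for (alignas is the
// standard C++11 counterpart of __attribute__((aligned))).
alignas(int) static char buf[4 * sizeof(int)];

// memcpy in and out of the char buffer is well-defined on all counts.
void store_int(int index, int value) {
    std::memcpy(buf + index * sizeof(int), &value, sizeof(int));
}

int load_int(int index) {
    int value;
    std::memcpy(&value, buf + index * sizeof(int), sizeof(int));
    return value;
}
```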
Re: 32bit Calling conventions on linux/ppc.
On Dec 12, 2006, at 11:42 AM, David Edelsohn wrote: Joslwah writes: Joslwah> Looking at the Linux 32bit PowerPC ABI spec, it appears to me that Joslwah> floats in excess of those that are passed in registers are supposed to Joslwah> be promoted to doubles and passed on the stack. Examining the resulting Joslwah> stack from a gcc generated C call it appears they are passed as Joslwah> floats. Joslwah> Can someone confirm/refute this, or else point me to an ABI that says Joslwah> that they should be passed as floats.

I have not been able to find any motivation for promoting floats passed on the stack. Does this provide some form of compatibility with SPARC?

It may have been intended to allow the callee to be a K&R-style or varargs function, where all float args get promoted to double. In particular, printf was often called without being declared in K&R-era code. This is one way to make that code work in a C90 environment.
Re: 32bit Calling conventions on linux/ppc.
On Dec 12, 2006, at 12:07 PM, David Edelsohn wrote: Dale Johannesen writes: Dale> It may have been intended to allow the callee to be a K&R-style or Dale> varargs function, where all float args get promoted to double. Dale> In particular, printf was often called without being declared in Dale> K&R-era code. This is one way to make that code work in a C90 environment.

Except that arguments in registers are not promoted, and arguments in registers spilled to the stack for varargs are not promoted. In fact it makes varargs more complicated. And it does not really match K&R promotion rules.

On ppc, floating point regs always contain values in double format, so passing a single value and reading it as double Just Works. To clarify, I am not defending this, just offering a possible explanation. If I'm right, the whole issue is obsolete and there is currently no good reason to do the promotion.
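The promotion under discussion is C's default argument promotion for unprototyped and variadic calls: a float argument always arrives as double, so the callee must read it as such. A small illustration:

```cpp
#include <cassert>
#include <cstdarg>

// Every float passed through "..." undergoes default argument promotion to
// double, so the callee must read it with va_arg(ap, double) -- reading it
// as float would be undefined behavior.
double first_vararg(int count, ...) {
    va_list ap;
    va_start(ap, count);
    double d = va_arg(ap, double);
    va_end(ap);
    return d;
}
```

This is the rule that makes calling printf("%f", some_float) work without a prototype in sight, which is the compatibility motivation suggested above.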
Re: REG_ALLOC_ORDER and Altivec registers
On Mar 1, 2007, at 12:57 AM, Tehila Meyzels wrote: Revital Eres wrote on 01/03/2007 10:37:36: Hello, I wonder why this order (non-consecutive, decreasing) of Altivec registers was chosen when specifying the allocation order in REG_ALLOC_ORDER. (taken from rs6000.h)

   /* AltiVec registers.  */                                  \
   77, 78,                                                    \
   90, 89, 88, 87, 86, 85, 84, 83, 82, 81, 80,                \
   79,                                                        \
   96, 95, 94, 93, 92, 91,                                    \
   108, 107, 106, 105, 104, 103, 102, 101, 100, 99, 98, 97,   \
   109, 110,                                                  \
   111, 112, 113                                              \

I think part of the answer can be found here: http://gcc.gnu.org/ml/gcc/2003-06/msg00902.html "We have found that re-arranging the REG_ALLOC_ORDER in rs6000.h so that all the FP registers come after the integer registers greatly reduces the tendency of the compiler to generate code that moves 8-byte quantities through the FP registers."

I don't think so; the ordering above is in the original Altivec patch here http://gcc.gnu.org/ml/gcc-patches/2001-11/msg00453.html which precedes that discussion by over a year.

Obviously you want to use caller-saved registers before callee-saved ones. The consecutive reverse ordering of the callee-saved registers matches the ordering in the save/restore routines in the Altivec PIM (also found in darwin-vecsave.asm), which is desirable for that mechanism to work well. (The common ordering doesn't logically have to be reversed; I'd guess that was chosen to be analogous to the integer stmw/lmw instructions.)

The ordering of the caller-saved regs looks odd to me. V13..V2 should be used in that order to minimize conflict with parameters and return values. V0 and V1 are preferred to those, and I'd expect V14..V19 to be preferred also, but they aren't. Perhaps to minimize the code that sets up VRsave?
Re: GCC -On optimization passes: flag and doc issues
On Apr 17, 2007, at 4:20 PM, Eric Christopher wrote: increase code size? I feel I must be missing something really obvious... is it just that the other optimisations that become possible on inline code usually compensate? That or the savings from not having to save/restore registers, set up the frame, etc as well. Don't forget the call and its setup. Trivially, inlining an empty function is always a size win. There actually were a couple in Spec95.
Re: Extension compatibility policy
On Feb 27, 2005, at 12:56 PM, Mike Hearn wrote: Are these compatibility patches available in discrete diff form anywhere? No. The branch's name is apple-ppc-branch, and changes are marked as APPLE LOCAL.
Re: Hot and Cold Partitioning (Was: GCC 4.1 Projects)
On Feb 28, 2005, at 4:43 AM, Joern RENNECKE wrote: Dale Johannesen wrote: Well, no, what is supposed to happen (I haven't tried it for a while, so I don't promise this still works) is code like this:

.hotsection:
loop: conditional branch (i==1000) to L2
L1: /* do stuff */
end loop: /* still in hot section */
L2: jmp L3
.coldsection:
L3: i = 0; jmp L1

Well, even then, using of the cold section can increase the hot section size, depending on target, and for some targets the maximum supported distance of the cold section.

Certainly. In general it will make the total size bigger, as does inlining. If you have good information about what's hot and cold, it should reduce the number of pages that actually get swapped in. The information has to be good, though, as a branch from hot<->cold section becomes more expensive. I'd recommend it only if you have profiling data (this is a known winner on Spec in that situation).

Should I do custom basic block reordering in machine_dependent_reorg to clean up the turds of hot and cold partitioning?

No, you should not turn on partitioning in situations where code size is important to you.
Re: Hot and Cold Partitioning (Was: GCC 4.1 Projects)
On Feb 28, 2005, at 10:19 AM, Joern RENNECKE wrote: Dale Johannesen wrote: Certainly. In general it will make the total size bigger, as does inlining. If you have good information about what's hot and cold, it should reduce the number of pages that actually get swapped in. The information has to be good, though, as a branch from hot<->cold section becomes more expensive. I'd recommend it only if you have profiling data (this is a known winner on Spec in that situation). Should I do custom basic block reordering in machine_dependent_reorg to clean up the turds of hot and cold partitioning? No, you should not turn on partitioning in situations where code size is important to you.

You are missing the point. In my example, with perfect profiling data, you still end up with more code in the hot section,

Yes.

i.e. more pages are actually swapped in.

Unless the cross-section branch is actually executed, there's no reason the unconditional jumps should get paged in, so this doesn't follow.

A block should not be put in the cold section unless it is larger than a jump into the cold section.

Worth trying, certainly. My guess is it won't matter much either way.
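When profile data isn't available, the same hot/cold split can be requested by hand with GCC's cold attribute and __builtin_expect (GCC-specific extensions, also honored by Clang; the function names here are invented):

```cpp
#include <cassert>

// GCC places functions marked "cold" (and blocks reached only through
// branches marked unlikely) toward a .text.unlikely section, keeping the
// hot path dense -- a manual approximation of the partitioning above.
__attribute__((cold, noinline)) static int reset_counter() {
    return 0;
}

int step(int i) {
    if (__builtin_expect(i == 1000, 0))   // the rarely taken branch
        return reset_counter();
    return i + 1;
}
```

As in the thread, the hints are only as good as their accuracy: mislabeling a hot path as cold makes the cross-section branch cost recurring rather than one-time.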
Re: Merging calls to `abort'
On Mar 14, 2005, at 10:30 AM, Joe Buck wrote: Steven Bosscher <[EMAIL PROTECTED]> wrote: system.h:#define abort() fancy_abort (__FILE__, __LINE__, __FUNCTION__) I agree that this is the best technical solution, even if cross-jumping were not an issue.

This invokes undefined behavior in a program that includes <stdlib.h>, which some would consider a good reason not to prefer it.

I believe the cross-jumping should definitely be done with -Os; the optimization makes a useful contribution to reducing code size, which the user has told us is important to him. Other than that, I don't care much. (I have debugged problems where the debugger was showing me the wrong abort call, and this was annoying, but not something I couldn't deal with. Typically you just have to stop on the right call to the function that's calling abort.)
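The objection is that redefining a standard function name as a macro is undefined once its header is included; the same per-call-site diagnostic payload can be had with a differently named macro. A sketch under that constraint (all names here are invented, not GCC's):

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Build the message a fancy_abort-style wrapper would print, so each call
// site reports its own file/line/function even if the abort calls themselves
// get cross-jumped into one.
static std::string ice_message(const char* file, int line, const char* func) {
    char buf[256];
    std::snprintf(buf, sizeof buf, "internal error in %s, at %s:%d",
                  func, file, line);
    return buf;
}

[[noreturn]] static void ice_abort(const char* file, int line, const char* func) {
    std::fputs(ice_message(file, line, func).c_str(), stderr);
    std::abort();
}

// Unlike #define abort(), this macro name cannot collide with <stdlib.h>.
#define INTERNAL_ABORT() ice_abort(__FILE__, __LINE__, __func__)
```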
RFC: always-inline vs unit-at-a-time
Consider the following:

static inline int a() __attribute((always_inline));
static inline int b() __attribute((always_inline));
static inline int b() { a(); }
static inline int a() { }
int c() { b(); }

This compiles fine at -O2. At -O0 we get the baffling error

sorry, unimplemented: inlining failed in call to 'a': function not considered for inlining

It seems undesirable for -O options to affect which programs will compile. The obvious thing to do about it is to turn on -funit-at-a-time always, but I'm concerned about the effect on compile speed; has anybody measured it?
Re: RFC: always-inline vs unit-at-a-time
On Mar 15, 2005, at 10:32 AM, Zack Weinberg wrote: Dale Johannesen <[EMAIL PROTECTED]> writes: Consider the following:

static inline int a() __attribute((always_inline));
static inline int b() __attribute((always_inline));
static inline int b() { a(); }
static inline int a() { }
int c() { b(); }

This compiles fine at -O2. At -O0 we get the baffling error "sorry, unimplemented: inlining failed in call to 'a': function not considered for inlining". It seems undesirable for -O options to affect which programs will compile.

Agreed. Perhaps we should run the inliner at -O0 if we see always_inline attributes, just for those functions?

We do; the problem is that it makes only 1 pass, so tries to inline "a" before it has seen the body of "a". If you interchange the definitions of "a" and "b" the inlining is done at all optimization levels.

I think this could be done without turning on -funit-at-a-time, even (the inliner does work in -O2 -fno-unit-at-a-time mode, after all).

That gets the same failure on this example.

The problem is not the effect on compile speed (IIRC Honza had it down to negligible) but the way it breaks assembly hacks such as crtstuff.c. (I would love to see a solution to that.)

I wasn't aware of this problem, can you give me a pointer?
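The workaround follows directly from the diagnosis: define every always_inline function before its first use, so even a single-pass inliner has the body in hand. The original example, reordered (bodies filled in with return values so it runs):

```cpp
#include <cassert>

// Each always_inline function is defined before any call to it, so the
// one-pass -O0 inliner has already seen the body when it needs it.
static inline int a() __attribute__((always_inline));
static inline int a() { return 1; }

static inline int b() __attribute__((always_inline));
static inline int b() { return a() + 1; }

int c() { return b(); }
```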
Re: Do we still need get_callee_fndecl?
On Mar 22, 2005, at 8:14 AM, Kazu Hirata wrote: After all, all we need in get_callee_fndecl seems to be

  addr = TREE_OPERAND (call_expr, 0);
  return ((TREE_CODE (addr) == ADDR_EXPR
           && TREE_CODE (TREE_OPERAND (addr, 0)) == FUNCTION_DECL)
          ? TREE_OPERAND (addr, 0) : NULL_TREE);

Thoughts?

In Objective-C (and ObjC++) it's also a good idea to look under OBJ_TYPE_REF. See this patch, which was deferred to 4.1 and I'm going to resubmit RSN: http://gcc.gnu.org/ml/gcc-patches/2004-12/txt00122.txt
Re: Do we still need get_callee_fndecl?
On Mar 22, 2005, at 10:21 AM, Kazu Hirata wrote: Hi Dale, After all, all we need in get_callee_fndecl seems to be

  addr = TREE_OPERAND (call_expr, 0);
  return ((TREE_CODE (addr) == ADDR_EXPR
           && TREE_CODE (TREE_OPERAND (addr, 0)) == FUNCTION_DECL)
          ? TREE_OPERAND (addr, 0) : NULL_TREE);

Thoughts? In Objective-C (and ObjC++) it's also a good idea to look under OBJ_TYPE_REF. See this patch, which was deferred to 4.1 and I'm going to resubmit RSN: http://gcc.gnu.org/ml/gcc-patches/2004-12/txt00122.txt

Thanks for the information. Does OBJ_TYPE_REF_EXPR only apply to a CALL_EXPR? In other words, are there other forms of constants that are exposed by looking into OBJ_TYPE_REF_EXPR?

I believe the usage here is the only one relevant to ObjC. It is used for other things in C++, but I don't know how.
RFA: PR 19225
I'm interested in fixing this, but could use some help from somebody knowledgeable about how x86 EH is supposed to work. In particular, what's the expected relationship between SP at the point of a throwing call, and when it gets back to the landing pad?
Re: RFA: PR 19225
On Mar 24, 2005, at 12:35 PM, James E Wilson wrote: Dale Johannesen wrote: I'm interested in fixing this, but could use some help from somebody knowledgeable about how x86 EH is supposed to work. In particular, what's the expected relationship between SP at the point of a throwing call, and when it gets back to the landing pad?

There is no direct relationship between the two SP values. If they are different, then there should be unwind info indicating the difference, and the unwinder should be applying those differences while unwinding. There is a statement to this effect in comment #3 from Andrew.

Actually I wrote that comment. While I see that it could be done that way in the unwinder, I found no code that was actually trying to do it. So I was unclear about the intent.

However, looking at this, I am tempted to call it a bug in the defer pop optimization. ...It is probably much easier to fix the defer pop optimization than to fix the unwinder to handle this.

I had tentatively reached this conclusion also, more slowly I'm sure.

Actually, looking at this, I am surprised how many NO_DEFER_POP calls we have without corresponding OK_DEFER_POP calls. I wonder if this optimization is already broken, in the sense that it is being accidentally disabled when it shouldn't be. Or maybe the code is just more obtuse than it needs to be.

No, I think you are right, I'll see if I can clean things up without breaking it. Thanks for your comments.
Re: GCC 4.0 Status Report (2005-03-24)
On Mar 24, 2005, at 3:08 PM, James E Wilson wrote: Richard Henderson wrote: 19255 EH bug on IA32 when using heavy optimization Typo in pr number? I think that is supposed to be 19225, for which I have already suggested a solution though not a patch (disable deferred argument popping when a call can throw). It isn't marked critical though, so I don't know why it is on the list, unless perhaps Mark just changed the status to be not critical. I'm testing a fix for this. Will assign to myself.
Re: bootstrap fails for apple-ppc-darwin
On Mar 31, 2005, at 12:18 PM, Mike Stump wrote: On Mar 31, 2005, at 10:54 AM, Fariborz Jahanian wrote: Today, I tried bootstrapping gcc mainline on/for apple-ppc-darwin. It fails in stage1. I can see the problem also... :-( I doubt if the person that broke it knows about it. It was working just a short time ago (beginning of the week?). My March 26 checkout works fine.
Re: bootstrap fails for apple-ppc-darwin
On Mar 31, 2005, at 12:23 PM, Dale Johannesen wrote: On Mar 31, 2005, at 12:18 PM, Mike Stump wrote: On Mar 31, 2005, at 10:54 AM, Fariborz Jahanian wrote: Today, I tried bootstrapping gcc mainline on/for apple-ppc-darwin. It fails in stage1. I can see the problem also... :-( I doubt if the person that broke it knows about it. It was working just a short time ago (beginning of the week?). My March 26 checkout works fine. ...but it occurs to me I didn't install 8A428 until the 28th or 29th and haven't rebuilt since, so if it's a recently introduced cctools problem, I might not be seeing it. Anybody seeing this on an old cctools?
RFC: #pragma optimization_level
I've currently got the job of implementing pragma(s) to change optimization level in the middle of a file. This has come up a few times before:

http://gcc.gnu.org/ml/gcc/2001-06/msg01275.html
http://gcc.gnu.org/ml/gcc/2002-09/msg01171.html
http://gcc.gnu.org/ml/gcc/2003-01/msg00557.html

and so far nothing has been done, but the users who want this feature have not gone away, so I will be doing it now. The only real opposition to the idea was from Mark Mitchell, in the earliest of these threads: http://gcc.gnu.org/ml/gcc/2001-06/msg01395.html

So I guess question 1 is, Mark, do you feel negatively enough about this feature to block its acceptance in mainline? If so, I'll go do this as a local patch, Geoff will complain a lot, and it will be done 4 times as fast :) Let's assume for the sake of argument that Mark is OK with it. Mark's message also raises some good questions about semantics. My answers are:

- Flags that logically refer to a whole file at once cannot be changed. In this category I know of -funit-at-a-time and -fmerge-constants; there may be others I haven't found.
- When function A is inlined into B, the inlined copy is now part of B, and whatever flags were in effect at the beginning of B apply to it. (The decision whether to inline is also based on the flags in effect at the beginning of B.)
- As a first cut I intend to allow only -O[0123s] to be specified in the pragma, as suggested by Geert Bosch. I don't think there's any reason this couldn't be extended to single flags.

Implementation: the general idea is

- at the beginning of parsing each function, record the flags currently in effect in (or pointed to from) the FUNCTION_DECL node;
- before optimizing/generating code for each function, reset the flags from the stored values.

As a first step I think I'll unify the various flags into a struct; that seems like a good janitor patch anyway. (I should add that a requirement for me is CodeWarrior compatibility.
Their syntax is

#pragma optimization_level [01234]
#pragma optimize_for_size

Functionality doesn't have to match exactly, but ought to be more or less the same. CW's treatment of the interaction with inlining is as described above, and I'd be averse to changing that; their way is reasonable, and there's existing code that depends on it. For mainline I assume we'll need to add "GCC" to the syntax; that local change is small compared to making it work, though.) Comments?
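For reference, the form this facility eventually took in mainline GCC (4.4 and later) is the function-granularity optimize pragma and attribute; a sketch of that syntax:

```cpp
#include <cassert>

// Functions defined while the pragma is in effect get the requested level;
// push_options/pop_options scope the change, matching the per-function
// granularity discussed in the thread.
#pragma GCC push_options
#pragma GCC optimize ("O3")
int sum(const int* p, int n) {
    int s = 0;
    for (int i = 0; i < n; ++i)
        s += p[i];
    return s;
}
#pragma GCC pop_options

// The attribute form applies to a single function.
__attribute__((optimize("Os"))) int add(int a, int b) { return a + b; }
```

Other compilers ignore the unrecognized pragma (as the standard requires), so the code stays portable; only the optimization hint is lost.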
Re: RFC: #pragma optimization_level
On Apr 1, 2005, at 11:24 AM, Mark Mitchell wrote: Dale Johannesen wrote: So I guess question 1 is, Mark, do you feel negatively enough about this feature to block its acceptance in mainline? I'm not sure that I *could* block it, but, no, I don't feel that negatively. Well, in theory nobody can block anything (although some people's posts suggest they don't understand this). In practice if you or another GWM objects to something, nobody else is going to override you and approve it. I tried to address your other questions in my previous message, but: I think that a #pragma (or attribute) that affects only optimization options is less problematic than generic option processing (.e.g, flag_writable_strings, as in the email you reference). I do think that you need to clearly document how inlining plays with this. In particular, if you inline a -O2 function into a -O0 function, what happens? (My suggestion would be that the caller's optimization pragmas win.) Agree. (And documentation will be written.) Also, you should document what set of optimization options can be specified. I think it should only be ones that do not change the ABI; things like -O2, or turning off particular passes are OK, while options that change calling conventions, etc., should be disallowed. Agree. Also, you need to say what happens in the situation where the user has done "-O2 -fno-gcse" and the #pragma now says "-O2". Does that reenable GCSE? (I think it should.) Yes. what's the granularity of this #pragma? Function-level, I hope? That's what I assumed. Anything finer than that is insane. :-) Actually there are cases where it makes sense: you could ask that a particular call be inlined, or a particular loop be unrolled N times. However, I'm not planning to do anything finer-grained than a function at the moment. Certainly for optimizations that treat a whole function at once, which is most of them, it doesn't make sense.
Re: RFC: #pragma optimization_level
On Apr 3, 2005, at 5:31 PM, Geert Bosch wrote: On Apr 1, 2005, at 16:36, Mark Mitchell wrote: In fact, I've long said that GCC had too many knobs. (For example, I just had a discussion with a customer where I explained that the various optimization passes, while theoretically orthogonal, are not entirely orthogonal in practice, and that turning on another pass (GCSE, in this case) avoided other bugs. For that reason, I'm not actually convinced that all the -f options for turning on and off passes are useful for end-users, although they are clearly useful for debugging the compiler itself. I think we might have more satisfied users if we simply had -Os, -O0, ..., -O3. However, many people in the GCC community itself, and in certain other vocal areas of the user base, do not agree.)

Pragmas have even more potential for causing problems than command-line options. People are generally persuaded more easily to change optimization options than to go through hundreds of source files fixing pragmas.

I would hope so. But the reason I'm doing this is that we've got a lot of customer requests for pragma-level control of optimization.

As the average life of a piece of source code is far longer than the life-span of a specific GCC release, users expect to compile unchanged source code with many different compilers. For this reason, I think it is a big mistake to allow pragmas to turn on or off individual passes. The internal structure of the compiler changes all the time, and pragmas written for one version may not make sense for another version. The effect will be that over time, user pragmas are wrong more often than right, and the compiler will often do better when just ignoring them altogether. (This is when people will ask for a -fignore-source-optimization-pragmas flag.) Pressure on GCC developers to maintain compatibility with old flags will increase as well. This is a recipe for disaster.

Certainly problems can arise, but I think you're seriously overstating them.
The C and C++ standards require that unrecognized pragmas be ignored, and the pragmas we're talking about don't affect correctness. So the worst effect you should see is that your code is less efficient than expected. (The changes to disallow nonconforming code which go in with every release, some of which change behavior that's been stable for years, are a much bigger problem for users.)

But doing anything much more elaborate than optimization (off, size, some, all, inlining), corresponding to (-O0, -Os, -O1, -O2, -O3), on a per-function basis seems a bad idea.

Personally I have no strong opinion about this either way.
Re: bootstrap compare failure in ada/targparm.o on i686-pc-linux-gnu?
On Apr 4, 2005, at 2:32 PM, Alexandre Oliva wrote: On Mar 26, 2005, Graham Stott <[EMAIL PROTECTED]> wrote: I do regular bootstraps of mainline all languages on FC3 i686-pc-linux-gnu and haven't seen any problems up to Friday. I'm using --enable-checking=tree,misc,rtl,rtlflag which might make a difference.

I'm still observing this problem every now and then. It's not consistent or easily reproducible, unfortunately. I suspect we're using pointers somewhere, and that stack/mmap/whatever address randomization is causing different results. I'm looking into it.

I've found 2 bugs over the last 6 months where the problem is exposed only if two pointers happen to hash to the same bucket. It's occurred to me that doing a bootstrap with all hashtable sizes set to 1 might be a good idea.
Re: bootstrap compare failure in ada/targparm.o on i686-pc-linux-gnu?
On Apr 4, 2005, at 3:21 PM, Alexandre Oliva wrote: On Apr 4, 2005, Dale Johannesen <[EMAIL PROTECTED]> wrote: On Apr 4, 2005, at 2:32 PM, Alexandre Oliva wrote: On Mar 26, 2005, Graham Stott <[EMAIL PROTECTED]> wrote: I do regular bootstraps of mainline all languages on FC3 i686-pc-linux-gnu and haven't seen any problems up to Friday. I'm using --enable-checking=tree,misc,rtl,rtlflag which might make a difference.

I'm still observing this problem every now and then. It's not consistent or easily reproducible, unfortunately. I suspect we're using pointers somewhere, and that stack/mmap/whatever address randomization is causing different results. I'm looking into it.

I've found 2 bugs over the last 6 months where the problem is exposed only if two pointers happen to hash to the same bucket. It's occurred to me that doing a bootstrap with all hashtable sizes set to 1 might be a good idea.

Perhaps. But the fundamental problem is that we shouldn't be hashing on pointers, and tree-eh.c does just that for finally_tree and throw_stmt_table.

Hmm. Of the earlier bugs, in http://gcc.gnu.org/ml/gcc-patches/2004-12/msg01760.html the hash table in question is built by DOM, and in http://gcc.gnu.org/ml/gcc-patches/2005-03/msg01810.html it's built by PRE (VN). I don't think there's general agreement that "we shouldn't be hashing on pointers".
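A common fix for this class of bug is to hash a stable ID stored in each node rather than the node's address, so bucket assignment no longer depends on where the allocator (or address-space randomization) placed things. A sketch of the idea, with invented names:

```cpp
#include <cassert>
#include <functional>
#include <unordered_set>

// Hashing the address makes bucket order vary run to run under address
// randomization; hashing a monotonically assigned uid gives the same
// buckets -- and thus the same traversal-dependent output -- every time.
struct Node {
    unsigned uid;
    explicit Node(unsigned u) : uid(u) {}
};

struct NodeHash {
    size_t operator()(const Node* n) const {
        return std::hash<unsigned>()(n->uid);
    }
};

using NodeSet = std::unordered_set<const Node*, NodeHash>;
```

The bootstrap-with-tiny-hashtables trick above is complementary: forcing everything into one bucket makes any order dependence show up deterministically instead of once in a blue moon.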
Re: ERROR : pls help
On Apr 7, 2005, at 11:58 AM, Virender Kashyap wrote: hi, i made some changes in gcc code. when i try to compile it using make, i get the following error (last few lines from output). Please help me in removing this error.

The command line you show is the built compiler trying to build gcc's library. It doesn't work, which means there is a bug in your changes.
Re: Q: C++ FE emitting assignments to global read-only symbols?
On Apr 8, 2005, at 4:40 PM, Mark Mitchell wrote: Daniel Berlin wrote: Your transform is correct. The FE is not. The variable is not read only. It is write once, then read-only. Diego, your analysis is exactly correct about what is happening.

I agree, in principle. The C++ FE should not set TREE_READONLY on variables that require dynamic initialization. Until now, that's not been a problem, and it does result in better code. But, it's now becoming a problem, and we have other ways to get good code coming down the pipe. I do think the C++ FE needs fixing before Diego's change gets merged, though. I can make the change, but not instantly. If someone files a PR, and assigns it to me, I'll get to it at some not-too-distant point.

It would be good to have a way to mark things as "write once, then read-only" IMO. It's very common, and you can do some of the same optimizations on such things that you can do on true read-only objects.
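The distinction at issue is between a true compile-time constant and a const object that needs dynamic initialization; only the latter is "write once, then read-only". A minimal illustration:

```cpp
#include <cassert>

static int compute() { return 6 * 7; }   // not a constant expression here

// Initialized at runtime (before main), read-only afterwards. Marking this
// TREE_READONLY from the start would let an optimizer wrongly treat reads
// as foldable before the write that initializes it has happened.
const int dynamic_const = compute();

// By contrast, this is a genuine compile-time constant and can be folded
// freely from the beginning.
const int static_const = 6 * 7;
```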
Re: unreducable cp_tree_equal ICE in gcc-4.0.0-20050410
On Apr 14, 2005, at 7:14 AM, Andrew Pinski wrote: Does this bug look familiar? 20629 is ICEing in the same spot, but it looks like theirs was reproducible after preprocessing. Is there any more information that I can provide that would be helpful? I've attached the command line, specs and a stacktrace from cc1plus.

I think this was fixed on the mainline by: 2005-03-18 Dale Johannesen <[EMAIL PROTECTED]> * cp/tree.c (cp_tree_equal): Handle SSA_NAME.

Yep, and I didn't put it in the release branch. Bad Dale. OK to do that?

If this is the same problem, changing the VN hashtable size to 1 should make it show up reproducibly.
Re: struct __attribute((packed));
On Apr 15, 2005, at 8:27 AM, E. Weddington wrote: Ralf Corsepius wrote: Hi, I just tripped over this snippet below in a piece of code I didn't write and which I don't understand:

...
struct somestruct {
  struct entrystruct *e1 __attribute__ ((packed));
  struct entrystruct *e2 __attribute__ ((packed));
};
...

Is this meaningful?

I guess the author wanted e1 and e2 to point to a "packed struct entrystruct", but this doesn't seem to be how GCC interprets this code.

There is no reason a definition of "struct entrystruct" should even be visible at this point, so that doesn't seem like a very reasonable interpretation.
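The placement of the attribute is what matters: attached to a member, packed affects that field's layout within the containing struct; to pack the pointed-to struct itself, the attribute must appear on that struct's own definition. A sketch (GCC/Clang extension; the sizes assume a typical target where int is 4 bytes):

```cpp
#include <cassert>

struct unpacked_def { char c; int i; };   // padded: typically sizeof == 8

// packed on the definition removes padding between the fields.
struct __attribute__((packed)) packed_def { char c; int i; };

// packed on the members affects the layout of "outer" itself; it says
// nothing about the layout of what e1 and e2 point to -- which is the
// misunderstanding in the snippet above.
struct outer {
    char tag;
    struct packed_def* e1 __attribute__((packed));
    struct packed_def* e2 __attribute__((packed));
};
```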
Re: how small can gcc get?
On Apr 24, 2005, at 6:43 AM, Mike Stump wrote: On Saturday, April 23, 2005, at 05:05 PM, Philip George wrote: What's the smallest size I can squeeze gcc down to and how would I go about compiling it in such a way? My take: #define optimize 0

"optimize" is a variable, and "int 0" won't parse, so that won't come close. What did you really mean? Turning off optimization is not going to get you the smallest code size, since many optimizations reduce it... the option intended to produce the smallest code is -Os. Configuring with --disable-checking is also important.

and then rebuild with dead code stripping. :-) You'd be the first to do this that I know of, so it won't necessarily be easy, but it might be a bit smaller than you'd get otherwise.
Re: volatile semantics
On May 3, 2005, at 7:41 AM, Nathan Sidwell wrote: Mike Stump wrote: int avail; int main() { while (*(volatile int *)&avail == 0) continue; return 0; } Ok, so, the question is, should gcc produce code that infinitely loops, or should it be obligated to actually fetch from memory? Hint, 3.3 fetched. I believe the compiler is so licensed. [5.1.2.3/2] talks about accessing a volatile object. If the compiler can determine the actual object being accessed through a series of pointer and volatile cast conversions, then I see nothing in the std saying it must behave as-if the object were volatile when it is not. This is correct; the standard consistently talks about the type of the object, not the type of the lvalue, when describing volatile. However, as a QOI issue, I believe the compiler should treat the reference as volatile if either the object or the lvalue is volatile. That is obviously the user's intent.
Re: volatile semantics
On May 3, 2005, at 11:03 AM, Nathan Sidwell wrote: Dale Johannesen wrote: However, as a QOI issue, I believe the compiler should treat the reference as volatile if either the object or the lvalue is volatile. That is obviously the user's intent. I'm not disagreeing with you, but I wonder at gcc's ability to make good on such a promise. A cast introducing a volatile qualifier will be a NOP_EXPR, and gcc tends to strip those at every opportunity. You may well be right, I haven't tried to implement it (and am not planning to). Also, I wonder about the following example int const avail = int main() { while (*(int *)&avail == Foo ()) do_something(); return 0; } Seeing through the const-stripping cast is a useful optimization. It is? Why would somebody write that? A further pathological case would be, int main() { while (*(int *)(volatile int *)&avail) do_something (); return 0; } What should this do, treat the volatile qualifier as sticky? IMO, no, but surely we don't have to worry about this one. Either way is standard conformant and the user's intent is far from clear, so whatever we do should be OK.
Re: volatile semantics
On May 3, 2005, at 11:21 AM, Paul Koning wrote: This change bothers me a lot. It seems likely that this will break existing code possibly in subtle ways. It did, that is why Mike is asking about it. :)
Re: volatile semantics
On May 3, 2005, at 11:52 AM, Nathan Sidwell wrote: Dale Johannesen wrote: On May 3, 2005, at 11:03 AM, Nathan Sidwell wrote: Seeing through the const-stripping cast is a useful optimization. It is? Why would somebody write that? perhaps a function, which returned a non-const reference that happened to be bound to a constant, has been inlined. OK, I agree. IMO, no, but surely we don't have to worry about this one. Either way is standard conformant and the user's intent is far from clear, so whatever we do should be OK. If we guarantee one to work and not the other, we need to have a clear specification of how they differ. What if intermediate variables -- either explicit in the program, or implicitly during the optimization -- get introduced? My guess is that the wording of the standard might be the best that could be achieved in this regard. It would be nice to have some clear wording indicating that Mike's example will work, but some other, possibly closely related, example will not. It's not that bad; the type of an lvalue is already well defined (it is "int" in your last example, and "volatile int" in Mike's). We just take this type into account in determining whether a reference is to be treated as volatile. (Which means we need to keep track of, or at least be able to find, both the type of the lvalue and the type of the underlying object. As you say, gcc may have some implementation issues with this.) And we don't have to document the behavior at all; it is not documented now.
Re: volatile semantics
On May 4, 2005, at 5:06 AM, Gabriel Dos Reis wrote: Andrew Haley <[EMAIL PROTECTED]> writes: | Nathan Sidwell writes: | > Dale Johannesen wrote: | > | > > And we don't have to document the behavior at all; it is not documented | > > now. | > I disagree. It's not documented explicitly in gcc now, because it is doing | > what the std permits, and so documented there. We should document either | > | > a) that current gcc is not breaking the std, and Mike's example is invalid | > code, if one expects a volatile read. This would be a FAQ like thing. Both behaviors are standard-compliant. Treating a reference as volatile when you don't have to just means strictly following the rules of the abstract machine; it can never break anything. I vote for (a). [...] | This is a bad extension to gcc and will cause much trouble, just like | the old guarantee to preserve empty loops. I see a difference between a documented extension, and quietly choosing from among standard-compliant behaviors the one which is most convenient for users.
Re: volatile semantics
On May 5, 2005, at 5:23 AM, Kai Henningsen wrote: [EMAIL PROTECTED] (Nathan Sidwell) wrote on 03.05.05 in <[EMAIL PROTECTED]>: Mike Stump wrote: int avail; int main() { while (*(volatile int *)&avail == 0) continue; return 0; } Ok, so, the question is, should gcc produce code that infinitely loops, or should it be obligated to actually fetch from memory? Hint, 3.3 fetched. I believe the compiler is so licensed. [5.1.2.3/2] talks about accessing a volatile object. If the compiler can determine the actual object being accessed through a series of pointer and volatile cast conversions, then I see nothing in the std saying it must behave as-if the object were volatile when it is not. This, of course, might not be useful to users :) As a QOI issue, it would be nice if such a situation caused a warning ("ignoring volatile cast ..." or something like that). It's rather dangerous to have the user believe that this worked as intended when it didn't. If we aren't going to make this work as obviously intended, and the sentiment seems to be against it, then this is certainly a good idea.
Re: Proposed resolution to aliasing issue.
On May 11, 2005, at 11:42 AM, Mark Mitchell wrote: Kenny and I had a long conversation about the aliasing issue, and reached the following proposed solution. In short, the issue is, when given the following code: struct A {...}; struct B { ...; struct A a; ...; }; void f() { B b; g(&b.a); } does the compiler have to assume that "g" may access the parts of "b" outside of "a". If the compiler can see the body of "g" then it may be able to figure out that it can't access any other parts, or figure out which parts it can access, and in that case it can of course use that information. The interesting case, therefore, is when the body of "g" is not available, or is insufficient to make a conclusive determination. Our proposed approach is to -- by default -- assume that "g" may access all of "b". However, in the event that the corresponding parameter to "g" has an attribute (name TBD, possibly the same as the one that appears in Danny's recent patch), then we may assume that "g" (and its callees) do not use the pointer to obtain access to any fields of "b". For example: void g(A *p __attribute__((X))); void f() { B b; g(&b.a); /* Compiler may assume the rest of b is not accessed in "g". */ } This approach allows users to annotate code to get better optimization while still preserving the behavior of current, possibly conforming, programs. I assume the type of the field is irrelevant (although you chose a struct for your example)? I assume the attribute has both positive and negative forms? I assume the semantics have nothing to do with B per se, but apply to all possible containing structs? (Mail.app *will* screw this up, please be tolerant): struct B { ...; struct A a1; struct A a2; ... }; struct C { ...; struct A a; ... }; struct D { ... ; struct B b; }; void g(A *p __attribute__((X)), int field_addressed); void f() { B b; C c; D d; g(&b.a1, 1); /* Compiler may assume the rest of b is not accessed in "g". */ g(&b.a2, 2); /* How about now? */ g(&c.a, 3); /* What b?
*/ g(&d.b.a1, 4); /* cannot alter rest of d.b, how about rest of d? */ If, in future, the committee reaches the conclusion that all functions should be treated as if they had the attribute, i.e., that you cannot perform the kinds of operations shown above in the example for "g", then we will modify the compiler so that, by default, the compiler treats all parameters as if they had this attribute. We would then also add a switch to disable the optimization for people who have legacy code, just as we have -fno-strict-aliasing. [ I did not discuss this with Kenny, but another option is to have a -fassume-X switch, off by default, which treats your code as if you had the magic attribute everywhere. ] I'm not so sure an attribute is a good idea. That's definitely a language extension, one way or another; I'm thinking more along the lines of trying to follow the standard, with the problem being that we can't figure out what it says. The flag seems cleaner to me. Also certain optimizations are slightly easier that way, e.g. figuring out whether a field can be kept in a register when another field's address is taken just requires looking at the global flag rather than every call in the function (not a big deal). The attribute might well be unnecessary, and once it's in it's in forever. And I suspect supporting different semantics for different calls will create problems down the line, somehow or other (although I confess I can't think of any offhand).
Re: Compiling GCC with g++: a report
On May 24, 2005, at 9:43 AM, Joe Buck wrote: On Tue, May 24, 2005 at 05:03:27PM +0200, Andreas Schwab wrote: Paul Koning <[EMAIL PROTECTED]> writes: I hope that doesn't require (void *) casts for pointer arguments passed to the likes of memcpy... Only the (void*) -> (any*) direction requires a cast in C++, the other direction is still converted implicitly. For this reason, I always cast the result of malloc to the proper type; it just feels wrong otherwise. Yes, if the cast looks odd to you, you probably don't go back far enough. I've certainly used compilers that warned when you didn't have a cast there.
Re: More front end type mismatch problems
On May 27, 2005, at 11:05 AM, Diego Novillo wrote: This is happening in gcc.dg/tree-ssa/20040121-1.c. The test specifically tests that (p!=0) + (q!=0) should be computed as int: char *foo(char *p, char *q) { int x = (p !=0) + (q != 0); ... } Is this program legal C? != is defined to produce an int result in C. This is valid, and may produce a result of 0, 1, or 2.
Re: Will Apple still support GCC development?
On Jun 6, 2005, at 12:17 PM, Samuel Smythe wrote: It is well-known that Apple has been a significant provider of GCC enhancements. But it is also probably now well-known that they have opted to drop the PPC architecture in favor of an x86-based architecture. Will Apple continue to contribute to the PPC-related componentry of GCC, or will such contributions be phased out as the transition is made to the x86-based systems? In turn, will Apple be providing more x86-related contributions to GCC? Nobody from Apple has yet responded to this because Apple does not generally like its employees to make public statements about future plans. I have been authorized to say this, however: Apple will be using gcc as its development compiler for producing Mac OS X Universal Binaries which target both PowerPC and Intel architectures. We will continue to contribute patches to both efforts.
Re: Can't bootstrap mainline on powerpc64-linux
On Jun 9, 2005, at 12:43 PM, Pat Haugen wrote: cc1: warnings being treated as errors /home/pthaugen/work/src/mainline/gcc/gcc/config/rs6000/rs6000.c:12538: warning: ‘rs6000_invalid_within_doloop’ defined but not used Problem is Adrian changed TARGET_INSN_VALID_WITHIN_DOLOOP to TARGET_INVALID_WITHIN_DOLOOP most places, but not in rs6000.c. I'll commit the following as obvious after bootstrap succeeds. Index: rs6000.c === RCS file: /cvs/gcc/gcc/gcc/config/rs6000/rs6000.c,v retrieving revision 1.838 diff -u -b -r1.838 rs6000.c --- rs6000.c9 Jun 2005 14:23:28 - 1.838 +++ rs6000.c9 Jun 2005 22:46:02 - @@ -906,8 +906,8 @@ #undef TARGET_FUNCTION_OK_FOR_SIBCALL #define TARGET_FUNCTION_OK_FOR_SIBCALL rs6000_function_ok_for_sibcall -#undef TARGET_INSN_VALID_WITHIN_DOLOOP -#define TARGET_INSN_VALID_WITHIN_DOLOOP rs6000_invalid_within_doloop +#undef TARGET_INVALID_WITHIN_DOLOOP +#define TARGET_INVALID_WITHIN_DOLOOP rs6000_invalid_within_doloop #undef TARGET_RTX_COSTS #define TARGET_RTX_COSTS rs6000_rtx_costs
Re: basic VRP min/max range overflow question
On Jun 17, 2005, at 5:59 PM, Paul Schlie wrote: From: Andrew Pinski <[EMAIL PROTECTED]> On Jun 17, 2005, at 8:20 PM, Paul Schlie wrote: ["undefined" only provides liberties within the constraints of what is specifically specified as being undefined, but none beyond that.] That is not true. Undefined means it can run "rm /" if you ever invoke the undefined code. - If the semantics of an operation are "undefined", I'd agree; but if control is returned to the program, the program's remaining specified semantics must be correspondingly obeyed, including those which may utilize the resulting value of the "undefined" operation. - If the result value is "undefined", just the value is undefined. (Unless one advocates that any undefined result implies undefined semantics, which enables anything to occur, including the arbitrary corruption of the remaining program's otherwise well defined semantics; in which case any invocation of implementation specific behavior may then validly result in arbitrary remaining program behavior.) Which I'd hope isn't advocated. You are wrong, and this really isn't a matter of opinion. The standard defines exactly what it means by "undefined behavior": 3.4.3 1 undefined behavior behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements 2 NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
Re: Scheduler questions (related to PR17808)
On Jun 29, 2005, at 3:46 PM, Steven Bosscher wrote: I have a question about the scheduler. Forgive me if I'm totally missing the point here, this scheduling business is not my thing ;-) Consider the following snippet that I've derived from PR17808 with a few hacks in the compiler to renumber insns and dump RTL with all the dependencies before scheduling. There is a predicate register that gets set, then a few cond_exec insns, then a jump, and finally a set using some of the registers that may be set by the cond_exec insns. This is the RTL before scheduling: Notice how the conditional sets of r14 and r17 in insns 9 and 10 have been moved past insn 14, which uses these registers. Shouldn't there be true dependencies on insns 9 and 10 for insn 14? I think so. This is figured out in sched_analyze_insn in sched-deps.c, I'd suggest stepping through there.
RFA: -mfpmath=sse -fpic vs double constants
Compiling a simple function like double foo(double x) { return x+1.0; } on x86 with -O2 -march=pentium4 -mtune=prescott -mfpmath=sse -fpic, the load of 1.0 is done as cvtss2sd[EMAIL PROTECTED](%ecx), %xmm0 (this is Linux, the same happens on Darwin). This is not really a good idea, as movsd of a double-precision 1.0 is faster. The change from double to single precision is done in compress_float_constant, and there's no cost computation there; presumably the RTL optimizers are expected to change it back if that's beneficial. Without -fpic, this does happen in cse_insn. (mem/u/i:SF (symbol_ref/u:SI ("*.LC0") gets run through fold_rtx, which recognizes this as a pool constant. This causes the known equivalent CONST_DOUBLE 1.0 to be run through force_const_mem, producing (mem/u/i:DF (symbol_ref/u:SI ("*.LC1"). Which is then tried in place of the FLOAT_EXTEND, and selected as valid and cheaper. This all seems to be working as expected. With -fpic, first, fold_rtx doesn't recognize the PIC form as representing a constant, so cse_insn never tries forcing the CONST_DOUBLE into memory. Hacking around that doesn't help, because force_const_mem doesn't produce the PIC form of constant reference, even though we're in PIC mode; we get the same (mem/u/i:DF (symbol_ref/u:SI ("*.LC1"), which doesn't test as valid in PIC mode (correctly). At this point I'm wondering if this is the right place to be attacking the problem at all. Advice? Thanks.
Re: isinf
On Jul 13, 2005, at 4:29 PM, Joe Buck wrote: On Thu, Jul 14, 2005 at 08:16:07AM +0900, Hiroshi Fujishima wrote: Eric Botcazou <[EMAIL PROTECTED]> writes: The configure script which is included in rrdtool[1] checks whether the system has isinf() as below. #include <math.h> int main () { float f = 0.0; isinf(f) ; return 0; } The test is clearly fragile. Assigning the return value of isinf to a variable should be sufficient for 4.0.x at -O0. Yes, I will contact the rrdtool maintainer. Thank you. Best to make it a global variable, to guard against dead code elimination. Volatile would be even better. It's valid to eliminate stores into globals if you can determine the value isn't used thereafter, which we can here, at least theoretically.
gcc vs Darwin memcmp
Darwin's memcmp has semantics that are an extension of the language standard: The memcmp() function returns zero if the two strings are identical, otherwise returns the difference between the first two differing bytes (treated as unsigned char values, so that `\200' is greater than `\0', for example). gcc's x86 inline expansion of memcmp doesn't do this, so I need to fix it. Is there interest in having this in mainline, and if so how would you like it controlled?
Re: volatile semantics
On Jul 16, 2005, at 10:34 AM, Andrew Haley wrote: 6.3.2.1: when an object is said to have a particular type, the type is specified by the lvalue used to designate the object. I don't have a standard here, but I will point out that if this sentence is interpreted to mean the type of an object changes depending on how it is accessed, this also makes nonsense of gcc's implementation of type-based aliasing rules. *((int *)&x) = 3 would then be valid whatever the type of x.
-malign-double vs __alignof__(double)
While fighting with the x86-darwin alignment rules, I noticed that -malign-double doesn't seem to affect __alignof__(double). This seems like a bug, but the alignof doc has so many qualifications I'm not sure exactly what it's supposed to do. Is this broken? Thanks.
Re: splitting load immediates using high and lo_sum
On Jul 21, 2005, at 4:36 PM, Tabony, Charles wrote: Hi, I am working on a port for a processor that has 32 bit registers but can only load 16 bit immediates. "" "%0.h = #HI(%1)") What are the semantics of this? Low bits zeroed, or untouched? If the former, your semantics are identical to Sparc; look at that.
RFA: Darwin x86 alignment
On x86 currently the alignments of double and long long are linked: they are either 4 or 8 depending on whether -malign-double is set. This follows the documentation of -malign-double. But it's wrong for what we want the Darwin ABI to be: the default should be that double is 4 bytes and long long is 8 bytes. So I can do that, but what should -malign-double do? - Control double but not long long; add -malign-long-long (at least if somebody asks for it; probably it wouldn't be used) - Have flags work as now: -malign-double makes both 8, -mno-align-double makes both 4. Problem with that is the default is neither of these, and this doesn't fit neatly into gcc's model of two-valued flags; it's also a bit tricky to implement for the same reason. - something else? thanks.
Re: splitting load immediates using high and lo_sum
On Jul 21, 2005, at 5:04 PM, Tabony, Charles wrote: From: Dale Johannesen [mailto:[EMAIL PROTECTED] On Jul 21, 2005, at 4:36 PM, Tabony, Charles wrote: Hi, I am working on a port for a processor that has 32 bit registers but can only load 16 bit immediates. "" "%0.h = #HI(%1)") What are the semantics of this? Low bits zeroed, or untouched? If the former, your semantics are identical to Sparc; look at that. The low bits are untouched. However, I would expect the compiler to always follow setting the high bits with setting the low bits. OK, if you're willing to accept that limitation (your architecture could handle putting the LO first, which Sparc can't) then Sparc is still a good model to look at. What it does should work for you.
Re: RFA: Darwin x86 alignment
On Jul 21, 2005, at 5:00 PM, Richard Henderson wrote: On Thu, Jul 21, 2005 at 04:56:01PM -0700, Dale Johannesen wrote: - Have flags work as now: -malign-double makes both 8, -mno-align-double makes both 4. Problem with that is the default is neither of these, and this doesn't fit neatly into gcc's model of two-valued flags; it's also a bit tricky to implement for the same reason. Nah, you just remove it from target_flags, and control the two new variables from ix86_handle_option. OK. Think that's the better approach? Why do you want to make these sort of arbitrary changes to your ABI? I can't see what you win... The compiler people are not driving this. Of course, 4-byte alignment subjects you to a penalty for misaligned loads and stores, and 8-byte alignment subjects you to a size penalty for extra holes. People have been making measurements about the issue and this is what they've come up with; I don't know details. What I wrote isn't necessarily the final change, either.
Re: Minimum target alignment for a datatype
On Jul 22, 2005, at 11:07 AM, Chris Lattner wrote: Hi All, I'm trying to determine (in target-independent code) what the *minimum* target alignment of a type is. For example, on darwin, double's are normally 4-byte aligned, but are 8-byte aligned in some cases (e.g. when they are the first element of a struct). TYPE_ALIGN on a double returns 8 bytes, is there any way to find out that they may end up being aligned to a 4-byte boundary? #pragma pack or attribute((__aligned__)) can result in arbitrary misalignments for any type.
Re: RFA: Darwin x86 alignment
On Jul 23, 2005, at 6:40 AM, Tobias Schlüter wrote: I have a strong suspicion there is a reason why the two are linked, and that that reason is FORTRAN. A lot of FORTRAN code assumes EQUIVALENCE of floating-point and integer types of equal size. Such code will in all likelihood break if those types have different alignment. For x86 this means that int/float and long long/double will have to have the same alignment. This might indeed be a problem, as the alignments not only have to be the same if they appear in an equivalence, but also in arrays or when using the TRANSFER intrinsic. Out of the types discussed, the standard only specifies this for default INTEGERs (=int in C) and default REALs (=float in C), but users do expect this to consistently extend to bigger types, otherwise they consider the compiler buggy instead of their code. Thanks for bringing this up. It's probably true that nobody has thought about Fortran, but so far I'm not convinced it would actually be a problem. Can somebody provide an example that would break? More precisely, the standard says this: a scalar variable of a certain type occupies a certain number of "storage units". Default INTEGERs and REALs take one storage unit, default COMPLEX and DOUBLE PRECISION (= REAL*8 = double in C) take two storage units. Finally, arrays of these types take a sequence of contiguous storage units. I know. These rules aren't affected by target alignments, though, and I would not expect the Fortran FE to be affected by alignments when doing layout. If it is, why? The compiler already has to deal with misaligned data in Fortran: INTEGER I(3) DOUBLE PRECISION A,B EQUIVALENCE (A,I(1)), (B,I(2)) not to mention the user-specified alignment extensions in C, so I wouldn't expect the optimizers to break or anything like that.
Re: [BUG] gcc-3.4.5-20050531 (i386): __FUNCTION__ as a part of the printf's format argument
On Jul 25, 2005, at 1:58 AM, Paolo Carlini wrote: Richard Guenther wrote: Btw, this list is for the development _of_ gcc, not with gcc. Use gcc-help for that. By the way, since we have to point out that *so often*, maybe there is something wrong on our part: I wonder whether changing the names of those lists would help!?!? I don't know: gcc-development, gcc-users, ... Perhaps adding something similar to the above to the description of the gcc list on the web page would help. What's there seems clear enough to me, but perhaps a bigger hammer would help other people.
Re: gcc 3.3.6 - stack corruption questions
On Jul 25, 2005, at 3:50 PM, Robert Dewar wrote: The unoptimized version completed a 401,900 transaction test with no problem. All day, I've been playing with different things; there are many bugs, most notably uninitialized vars, that show up only when you turn on optimization. Also violations of strict aliasing rules are common. -Wuninitialized -fno-strict-aliasing [after the -O] will exercise those two. Also, mixed builds with some -O0 and some -O3 files should narrow it down.
rfa (x86): 387<=>sse moves
With -march=pentium4 -mfpmath=sse -O2, we get an extra move for code like double d = atof(foo); int i = d; call atof fstpl -8(%ebp) movsd -8(%ebp), %xmm0 cvttsd2si %xmm0, %eax (This is Linux, Darwin is similar.) I think the difficulty is that for (set (reg/v:DF 58 [ d ]) (reg:DF 8 st)) 64 {*movdf_nointeger} regclass decides SSE_REGS is a zero-cost choice for 58. Which looks wrong, as that requires a store and load from memory. In fact, memory is the cheapest overall choice for 58 (taking its use into account also), and gcc will figure that out correctly if a more reasonable assessment is given to SSE_REGS. The immediate cause is the #Y's in the constraint: "=f#Y,m ,f#Y,*r ,o ,Y*x#f,Y*x#f,Y*x#f ,m " and there's probably a simple fix, but it eludes me. Advice? Thanks.
Re: rfa (x86): 387<=>sse moves
On Jul 26, 2005, at 12:51 AM, Paolo Bonzini wrote: Dale Johannesen wrote: With -march=pentium4 -mfpmath=sse -O2, we get an extra move for code like double d = atof(foo); int i = d; call atof fstpl -8(%ebp) movsd -8(%ebp), %xmm0 cvttsd2si %xmm0, %eax (This is Linux, Darwin is similar.) I think the difficulty is that for (set (reg/v:DF 58 [ d ]) (reg:DF 8 st)) 64 {*movdf_nointeger} Try the attached patch. It gave a 3% speedup on -mfpmath=sse for tramp3d. Richard Henderson asked for SPEC testing, then it may go in. Thanks. That's progress; the cost computation in regclass now figures out that memory is the fastest place to put R58: Register 58 costs: AD_REGS:87000 Q_REGS:87000 NON_Q_REGS:87000 INDEX_REGS:87000 LEGACY_REGS:87000 GENERAL_REGS:87000 FP_TOP_REG:49000 FP_SECOND_REG:5 FLOAT_REGS:5 SSE_REGS:5 FP_TOP_SSE_REGS:75000 FP_SECOND_SSE_REGS:75000 FLOAT_SSE_REGS:75000 FLOAT_INT_REGS:87000 INT_SSE_REGS:91000 FLOAT_INT_SSE_REGS:91000 ALL_REGS:91000 MEM:4 Unfortunately local-alloc insists on putting it in a register anyway (ST(0) instead of an XMM, but the end codegen is unchanged): ;; Register 58 in 8. I think the RA may be missing the concept that memory might be faster than any possible register. Will dig further.
Re: rfa (x86): 387<=>sse moves
On Jul 26, 2005, at 3:34 PM, Dale Johannesen wrote: I think the RA may be missing the concept that memory might be faster than any possible register will dig further. Yes, it is. The following fixes my problem, and causes a couple of 3DNow-specific regressions in the testsuite which I need to look at, but nothing serious; I think it's gotten far enough to post for opinions. This is intended to go on top of Paolo's patch http://gcc.gnu.org/ml/gcc-patches/2005-07/msg01044.html It may, of course, run afoul of inaccuracies in the patterns on various targets, haven't tried any performance testing yet. Index: regclass.c === RCS file: /cvs/gcc/gcc/gcc/regclass.c,v retrieving revision 1.206 diff -u -b -r1.206 regclass.c --- regclass.c 25 Jun 2005 02:00:52 - 1.206 +++ regclass.c 27 Jul 2005 06:04:40 - @@ -838,7 +838,8 @@ /* Structure used to record preferences of given pseudo. */ struct reg_pref { - /* (enum reg_class) prefclass is the preferred class. */ + /* (enum reg_class) prefclass is the preferred class. May be + NO_REGS if no class is better than memory. */ char prefclass; /* altclass is a register class that we should use for allocating @@ -1321,6 +1322,10 @@ best = reg_class_subunion[(int) best][class]; } + /* If no register class is better than memory, use memory. */ + if (p->mem_cost < best_cost) + best = NO_REGS; + /* Record the alternate register class; i.e., a class for which every register in it is better than using memory. If adding a class would make a smaller class (i.e., no union of just those @@ -1528,7 +1533,7 @@ to what we would add if this register were not in the appropriate class. */ - if (reg_pref) + if (reg_pref && reg_pref[REGNO (op)].prefclass != NO_REGS) alt_cost += (may_move_in_cost[mode] [(unsigned char) reg_pref[REGNO (op)].prefclass] @@ -1754,7 +1759,7 @@ to what we would add if this register were not in the appropriate class. 
*/ - if (reg_pref) + if (reg_pref && reg_pref[REGNO (op)].prefclass != NO_REGS) alt_cost += (may_move_in_cost[mode] [(unsigned char) reg_pref[REGNO (op)].prefclass] @@ -1840,7 +1845,8 @@ int class; unsigned int nr; - if (regno >= FIRST_PSEUDO_REGISTER && reg_pref != 0) + if (regno >= FIRST_PSEUDO_REGISTER && reg_pref != 0 + && reg_pref[regno].prefclass != NO_REGS) { enum reg_class pref = reg_pref[regno].prefclass;
Re: rfa (x86): 387<=>sse moves
On Jul 27, 2005, at 2:18 PM, Richard Henderson wrote: On Tue, Jul 26, 2005 at 11:10:56PM -0700, Dale Johannesen wrote: Yes, it is. The following fixes my problem, and causes a couple of 3DNow-specific regressions in the testsuite which I need to look at, but nothing serious; I think it's gotten far enough to post for opinions. This is intended to go on top of Paolo's patch http://gcc.gnu.org/ml/gcc-patches/2005-07/msg01044.html It may, of course, run afoul of inaccuracies in the patterns on various targets, haven't tried any performance testing yet. Looks plausible. Let us know what you wind up with wrt those regressions and testing. With the latest version of Paolo's patch (in PR 19653) the regressions are gone. Spec is going to take a bit longer; I haven't gotten GMP to build yet on x86 Darwin. Since the FP benchmarks are the interesting ones for this, I should work through it.
Re: rfa (x86): 387<=>sse moves
On Jul 27, 2005, at 2:18 PM, Richard Henderson wrote: On Tue, Jul 26, 2005 at 11:10:56PM -0700, Dale Johannesen wrote: Yes, it is. The following fixes my problem, and causes a couple of 3DNow-specific regressions in the testsuite which I need to look at, but nothing serious; I think it's gotten far enough to post for opinions. This is intended to go on top of Paolo's patch http://gcc.gnu.org/ml/gcc-patches/2005-07/msg01044.html It may, of course, run afoul of inaccuracies in the patterns on various targets, haven't tried any performance testing yet. Looks plausible. Let us know what you wind up with wrt those regressions and testing. OK, I've tested this on darwin x86 (both patches together). No regressions. I don't think I ought to publish absolute Spec numbers for this machine, but I get +1% on FP and +1/2% on Int. Wins: applu +3%, lucas +10%, eon +3%. Losses: apsi -9%. All other changes under 2%. This looks OK to me, though I'll be investigating apsi. (Paolo and Richard Guenther are doing this for Linux.)
Re: rfa (x86): 387<=>sse moves
On Jul 31, 2005, at 9:51 AM, Uros Bizjak wrote: Hello! With -march=pentium4 -mfpmath=sse -O2, we get an extra move for code like double d = atof(foo); int i = d; call atof fstpl -8(%ebp) movsd -8(%ebp), %xmm0 cvttsd2si %xmm0, %eax (This is Linux, Darwin is similar.) I think the difficulty is that for This problem is similar to the problem described in PR target/19398. There is another testcase and a small analysis in the PR that might help with this problem. Thanks, that does seem relevant. The patches so far don't fix this case; I've commented the PR explaining why.
Re: [RFC] - Regression exposed by recent change to compress_float_constant
On Aug 10, 2005, at 12:43 PM, Fariborz Jahanian wrote: Following patch has exposed an optimization shortcoming: 2005-07-12 Dale Johannesen <[EMAIL PROTECTED]> * expr.c (compress_float_constant): Add cost check. * config/rs6000.c (rs6000_rtx_cost): Adjust FLOAT_EXTEND cost. This patch results in generating worse code for the following test case: 1) Test case: struct S { float d1, d2, d3; I believe you mean double not float; the RTL snippets you give indicate this. (insn 12 7 13 0 (set (reg:SF 59) (mem/u/i:SF (symbol_ref/u:SI ("*LC0") [flags 0x2]) [0 S4 A32])) -1 (nil) (nil)) (insn 13 12 14 0 (set (mem/s/j:DF (reg/f:SI 58 [ D.1929 ]) [0 .d1+0 S8 A32]) (float_extend:DF (reg:SF 59))) -1 (nil) (nil)) However, if you try your example with float as given, you see it does not do a direct store of constant 0 with or without the compress_float patch. IMO the compress_float patch does not really have anything to do with this problem; before this patch the double case was working well by accident, my patch exposed a problem farther downstream, which was always there for the float case. When I put that patch in, rth remarked: While I certainly wouldn't expect fold_rtx to find out about this all by itself, I'd have thought that there would have been a REG_EQUIV or REG_EQUAL note that indicates that the end result is the constant (const_double:DF 1.0), and use that in any simplification. Indeed there is no such note, and I suspect adding it somewhere (expand?) would fix this.
Re: [RFC] - Regression exposed by recent change to compress_float_constant
Fariborz is having trouble with his mailer and has asked me to forward his response. On Aug 10, 2005, at 2:35 PM, Dale Johannesen wrote: On Aug 10, 2005, at 12:43 PM, Fariborz Jahanian wrote: Following patch has exposed an optimization shortcoming: 2005-07-12 Dale Johannesen <[EMAIL PROTECTED]> * expr.c (compress_float_constant): Add cost check. * config/rs6000.c (rs6000_rtx_cost): Adjust FLOAT_EXTEND cost. This patch results in generating worse code for the following test case: 1) Test case: struct S { float d1, d2, d3; I believe you mean double not float; the RTL snippets you give indicate this. Yes, it is double. Copied the wrong test. (insn 12 7 13 0 (set (reg:SF 59) (mem/u/i:SF (symbol_ref/u:SI ("*LC0") [flags 0x2]) [0 S4 A32])) -1 (nil) (nil)) (insn 13 12 14 0 (set (mem/s/j:DF (reg/f:SI 58 [ D.1929 ]) [0 .d1+0 S8 A32]) (float_extend:DF (reg:SF 59))) -1 (nil) (nil)) However, if you try your example with float as given, you see it does not do a direct store of constant 0 with or without the compress_float patch. IMO the compress_float patch does not really have anything to do with this problem; Yes. Title says Regression 'exposed' by But as my email pointed out, float_extend is substituted in cse. So, this is another case of change in rtl pattern breaks an optimization down the road. I don't know if this is a regression or exposition of a lurking bug. before this patch the double case was working well by accident, my patch exposed a problem farther downstream, which was always there for the float case. Yes. I mentioned that in my email. - fariborz When I put that patch in, rth remarked: While I certainly wouldn't expect fold_rtx to find out about this all by itself, I'd have thought that there would have been a REG_EQUIV or REG_EQUAL note that indicates that the end result is the constant (const_double:DF 1.0), and use that in any simplification. Indeed there is no such note, and I suspect adding it somewhere (expand?) would fix this.
Inlining vs the stack
We had a situation come up here where things are like this (simplified, obviously): c() { char x[100]; } a() { b(); c(); } b() { a(); c(); } c() is a leaf. Without inlining, no problem. With c() inlined into a() and/or b(), a few mutually recursive calls to a() and b() blow out the stack. It's not clear the inliner should try to do anything about this, but I think it's worth discussing. The inliner can't detect the recursive loop in the general case, since it might be split across files, so the thing to do would be to put some (target-OS-dependent) limit on local stack usage of the inlinee. Right now there's no such check.
Re: Inlining vs the stack
On Aug 12, 2005, at 12:25 PM, Paul Koning wrote: "Mike" == Mike Stump <[EMAIL PROTECTED]> writes: Mike> On Aug 12, 2005, at 10:39 AM, Dale Johannesen wrote: We had a situation come up here where things are like this (simplified, obviously): c() { char x[100]; } Mike> I think we should turn off inlining for functions > 100k stack Mike> size. (Or maybe 500k, if you want). Why should stack size be a consideration? Code size I understand, but stack size doesn't seem to matter. Sometimes it matters, as in the original example: c() { char x[100]; } a() { b(); c(); } b() { a(); c(); }
Fwd: [RFC] - Regression exposed by recent change to compress_float_constant
Fariborz is still having problems with his mailer and has asked me to forward this. On Aug 10, 2005, at 2:35 PM, Dale Johannesen wrote: On Aug 10, 2005, at 12:43 PM, Fariborz Jahanian wrote: Following patch has exposed an optimization shortcoming: 2005-07-12 Dale Johannesen <[EMAIL PROTECTED]> * expr.c (compress_float_constant): Add cost check. * config/rs6000.c (rs6000_rtx_cost): Adjust FLOAT_EXTEND cost. This patch results in generating worse code for the following test case: 1) Test case: struct S { float d1, d2, d3; I believe you mean double not float; the RTL snippets you give indicate this. (insn 12 7 13 0 (set (reg:SF 59) (mem/u/i:SF (symbol_ref/u:SI ("*LC0") [flags 0x2]) [0 S4 A32])) -1 (nil) (nil)) (insn 13 12 14 0 (set (mem/s/j:DF (reg/f:SI 58 [ D.1929 ]) [0 .d1+0 S8 A32]) (float_extend:DF (reg:SF 59))) -1 (nil) (nil)) However, if you try your example with float as given, you see it does not do a direct store of constant 0 with or without the compress_float patch. IMO the compress_float patch does not really have anything to do with this problem; before this patch the double case was working well by accident, my patch exposed a problem farther downstream, which was always there for the float case. When I put that patch in, rth remarked: While I certainly wouldn't expect fold_rtx to find out about this all by itself, I'd have thought that there would have been a REG_EQUIV or REG_EQUAL note that indicates that the end result is the constant (const_double:DF 1.0), and use that in any simplification. Indeed there is no such note, and I suspect adding it somewhere (expand?) would fix this. It turned out that cse does put REG_EQUIV on the insn which sets load of "LC0" to the register. So, no need to do this. It also tells me that cse is expected to use this information to do the constant propagation (which in the example test case is the next few insns). Attached patch accomplishes this task. It is against apple local branch. 
It has been bootstrapped and dejagnu tested on x86-darwin, ppc-darwin. Note that the patch is similar to the code right before it (which is also shown in this patch), so there is precedent for this type of fix. If this looks reasonable, I will prepare an FSF patch. ChangeLog: 2005-08-19 Fariborz Jahanian <[EMAIL PROTECTED]> * cse.c (cse_insn): Use the constant to propagate into the rhs of a set insn which is a register. This is cheaper. Index: cse.c === RCS file: /cvs/gcc/gcc/gcc/cse.c,v retrieving revision 1.342.4.3 diff -c -p -r1.342.4.3 cse.c *** cse.c 5 Jul 2005 23:21:50 - 1.342.4.3 --- cse.c 19 Aug 2005 18:21:56 - *** cse_insn (rtx insn, rtx libcall_insn) *** 5455,5460 --- 5455,5469 if (dest == pc_rtx && src_const && GET_CODE (src_const) == LABEL_REF) src_folded = src_const, src_folded_cost = src_folded_regcost = -1; + /* APPLE LOCAL begin radar 4153339 */ + if (n_sets == 1 && GET_CODE (sets[i].src) == REG + && src_const && GET_CODE (src_const) == CONST_DOUBLE) + { + src_folded = src_const; + src_folded_cost = src_folded_regcost = -1; + } + /* APPLE LOCAL end radar 4153339 */ + /* Terminate loop when replacement made. This must terminate since the current contents will be tested and will always be valid. */ while (1) Index: testsuite/ChangeLog.apple-ppc === RCS file: /cvs/gcc/gcc/gcc/testsuite/Attic/ChangeLog.apple-ppc,v retrieving revision 1.1.4.88 diff -c -p -r1.1.4.88 ChangeLog.apple-ppc *** testsuite/ChangeLog.apple-ppc 15 Aug 2005 21:02:26 - 1.1.4.88 --- testsuite/ChangeLog.apple-ppc 19 Aug 2005 18:21:59 - *** *** 1,3 --- 1,8 + 2005-08-18 Fariborz Jahanian <[EMAIL PROTECTED]> + + Radar 4153339 + * gcc.dg/i386-movl-float.c: New. 
+ 2005-08-15 Devang Patel <[EMAIL PROTECTED]> Radar 4209318 Index: testsuite/gcc.dg/i386-movl-float.c === RCS file: testsuite/gcc.dg/i386-movl-float.c diff -N testsuite/gcc.dg/i386-movl-float.c *** /dev/null 1 Jan 1970 00:00:00 - --- testsuite/gcc.dg/i386-movl-float.c 19 Aug 2005 18:22:03 - *** *** 0 --- 1,15 + /* APPLE LOCAL begin radar 4153339 */ + /* { dg-do compile { target i?86-*-* x86_64-*-* } } */ + /* { dg-options "-O1 -mdynamic-no-pic -march=pentium4 -mtune=prescott" } */ + /* { dg-final { scan-assembler-times "movl\[^\\n\]*" 8} } */ + + struct S { + double d1, d2, d3; + }; + + struct S ms() + { + struct S s = {0,0,0}; + return s; + } + /* APPLE LOCAL end radar 4153339 */
Bug in builtin_floor optimization
There is some clever code in convert_to_real that converts double d; (float)floor(d) to floorf((float)d) (on targets where floor and floorf are considered builtins). This is wrong, because the (float)d conversion normally uses round-to-nearest and can round up to the next integer. For example: double d = 1024.0 - 1.0 / 32768.0; extern double floor(double); extern float floorf(float); extern int printf(const char*, ...); int main() { double df = floor(d); float f1 = (float)floor(d); printf("floor(%f) = %f\n", d, df); printf("(float)floor(%f) = %f\n", d, f1); return 0; } Compile with -O2 and the two printfs disagree. The transformation is also done for ceil, round, rint, trunc and nearbyint. I'm not a math guru, but it looks like ceil, rint, trunc and nearbyint are also unsafe for this transformation; round may be salvageable. Comments? Should I preserve the buggy behavior with -ffast-math?
Re: Bug in builtin_floor optimization
On Aug 23, 2005, at 9:53 AM, Richard Henderson wrote: On Tue, Aug 23, 2005 at 09:28:50AM -0600, Roger Sayle wrote: Good catch. This is indeed a -ffast-math (or more precisely a flag_unsafe_math_optimizations) transformation. I'd prefer to keep these transformations with -ffast-math, as Jan described them as significantly helping SPEC's mesa when they were added. Are you sure it was "(float)floor(d)"->"floorf((float)d)" that helped mesa and not "(float)floor((double)f)"->"floorf(f)" ? All the floor calls in mesa seem to be of the form (int)floor((double)f) or (f - floor((double)f)). (The casts to double are implicit, actually.) It wouldn't bother me if the first transformation went away even for -ffast-math. It seems egregiously wrong. I think I'd prefer this, given that it is not useful in mesa. Will put together a patch.
RFC: bug in combine
The following demonstrates a bug in combine (x86 -mtune=pentiumpro -O2): struct Flags { int filler[18]; unsigned int a:14; unsigned int b:14; unsigned int c:1; unsigned int d:1; unsigned int e:1; unsigned int f:1; }; extern int bar(int), baz(); int foo (struct Flags *f) { if (f->b > 0) return bar(f->d); return baz(); } The test of f->b comes out as testl $1048512, 73(%eax) This is wrong, because 4 bytes starting at 73 goes outside the original object and can cause a page fault. The change from referencing a word at offset 72 to offset 73 happens in make_extraction in combine, and I propose to fix it thus: Index: combine.c === RCS file: /cvs/gcc/gcc/gcc/combine.c,v retrieving revision 1.502 diff -u -b -c -3 -p -r1.502 combine.c cvs diff: conflicting specifications of output style *** combine.c 8 Aug 2005 18:30:09 - 1.502 --- combine.c 25 Aug 2005 17:57:21 - *** make_extraction (enum machine_mode mode, *** 6484,6491 && GET_MODE_SIZE (inner_mode) < GET_MODE_SIZE (is_mode)) offset -= GET_MODE_SIZE (is_mode) - GET_MODE_SIZE (inner_mode); ! /* If this is a constant position, we can move to the desired byte. */ ! if (pos_rtx == 0) { offset += pos / BITS_PER_UNIT; pos %= GET_MODE_BITSIZE (wanted_inner_mode); --- 6484,6493 && GET_MODE_SIZE (inner_mode) < GET_MODE_SIZE (is_mode)) offset -= GET_MODE_SIZE (is_mode) - GET_MODE_SIZE (inner_mode); ! /* If this is a constant position, we can move to the desired byte. !This is unsafe for memory objects; it might result in accesses !outside the original object. */ ! if (pos_rtx == 0 && !MEM_P (inner)) { offset += pos / BITS_PER_UNIT; pos %= GET_MODE_BITSIZE (wanted_inner_mode); Still testing, but I'm a bit concerned this is overkill. Are there targets/situations where this transformation is useful or even necessary? Comments?
doloop-opt deficiency
We noticed that the simple loop here extern int a[]; int foo(int w) { int n = w; while (n >= 512) { a[n] = 42; n -= 256; } } was being treated as ineligible for the doloop modification. I think this is a simple pasto; this code was evidently copied from the previous block: Index: loop-iv.c === RCS file: /cvs/gcc/gcc/gcc/loop-iv.c,v retrieving revision 2.35 diff -u -b -c -p -r2.35 loop-iv.c cvs diff: conflicting specifications of output style *** loop-iv.c 21 Jul 2005 07:24:07 - 2.35 --- loop-iv.c 29 Aug 2005 23:34:12 - *** iv_number_of_iterations (struct loop *lo *** 2417,2423 tmp0 = lowpart_subreg (mode, iv0.base, comp_mode); tmp1 = lowpart_subreg (mode, iv1.base, comp_mode); ! bound = simplify_gen_binary (MINUS, mode, mode_mmin, lowpart_subreg (mode, step, comp_mode)); if (step_is_pow2) { --- 2417,2423 tmp0 = lowpart_subreg (mode, iv0.base, comp_mode); tmp1 = lowpart_subreg (mode, iv1.base, comp_mode); ! bound = simplify_gen_binary (PLUS, mode, mode_mmin, lowpart_subreg (mode, step, comp_mode)); if (step_is_pow2) { The code as it was computed -2147483648-256 which overflows. Still testing, but is there anything obvious wrong with this?
Re: doloop-opt deficiency
extern int a[]; int foo(int w) { int n = w; while (n >= 512) { a[n] = 42; n -= 256; } } On Aug 30, 2005, at 9:25 AM, Sebastian Pop wrote: Thanks for looking at this. But... Dale Johannesen wrote: I think this is a simple pasto; this code was evidently copied from the previous block: I don't think that this was a simple pasto. The code looks correct. We have the same code in tree-ssa-loop-niter.c around line 436, since we inherited this code from the rtl-level. No, look closer. The version in loop-iv.c does a NEG of 'step' just before what's shown here. The version in tree-ssa-loop-niter.c doesn't. Reversing the operator does make them do the same thing. As a sanity check, try the same loop going the other direction: extern int a[]; int foo(int w) { int n = w; while (n <= 512) { a[n] = 42; n += 256; } } and you'll see it does do the doloop transformation.
Re: rtl line no
On Sep 11, 2005, at 8:09 AM, shreyas krishnan wrote: Hi, Can anyone tell me if there is a way to find out roughly the source line no of a particular rtl instruction (if there is ) ? I believe tree has a link to the source line no, in which case how do I find out the source tree node for a particular rtl stmt ? See INSN_LOCATOR and locator_line().
RFA: pervasive SSE codegen inefficiency
Consider the following SSE code (-march=pentium4 -mtune=prescott -O2 -mfpmath=sse -msse2) #include <emmintrin.h> __m128i foo3(__m128i z, __m128i a, int N) { int i; for (i=0; i The first inner loop compiles to paddq %xmm0, %xmm1 Good. The second compiles to movdqa %xmm2, %xmm0 paddw %xmm1, %xmm0 movdqa %xmm0, %xmm1 when it could be using a single paddw. The basic problem is that our approach defines __m128i to be V2DI even though all the operations on the object are V4SI, so there are a lot of subreg's that don't need to generate code. I'd like to fix this, but am not sure how to go about it. The pattern-matching and RTL optimizers seem quite hostile to mismatched mode operations. If I were starting from scratch I'd define a single V128I mode and distinguish paddw and paddq by operation codes, or possibly by using subreg:SSEMODEI throughout the patterns. Any less intrusive ideas? Thanks. (ISTR some earlier discussion about this but can't find it; apologies if I'm reopening something that shouldn't be:)
Re: RFA: pervasive SSE codegen inefficiency
On Sep 14, 2005, at 9:50 PM, Andrew Pinski wrote: On Sep 14, 2005, at 9:21 PM, Dale Johannesen wrote: Consider the following SSE code (-march=pentium4 -mtune=prescott -O2 -mfpmath=sse -msse2) <4256776a.c> The first inner loop compiles to paddq %xmm0, %xmm1 Good. The second compiles to movdqa %xmm2, %xmm0 paddw %xmm1, %xmm0 movdqa %xmm0, %xmm1 when it could be using a single paddw. The basic problem is that our approach defines __m128i to be V2DI even though all the operations on the object are V4SI, so there are a lot of subreg's that don't need to generate code. I'd like to fix this, but am not sure how to go about it. From the looks of this, it seems more like a register allocation issue and nothing to do with subregs at all, except subregs being there. That's kind of an overstatement; obviously getting rid of the subregs would solve the problem as you can see from the first function. I think you're right that If we allocated 64 and 63 as the same register, it would have worked correctly. (you mean 64 and 66) would fix this example; I'll look at that. Having a more uniform representation for operations on __m128i objects would simplify things all over the place, though.
Re: RFA: pervasive SSE codegen inefficiency
Just to review, the second function here was the problem: (-march=pentium4 -mtune=prescott -O2 -mfpmath=sse -msse2) #include <emmintrin.h> __m128i foo3(__m128i z, __m128i a, int N) { int i; for (i=0; i where the inner loop compiles to movdqa %xmm2, %xmm0 paddw %xmm1, %xmm0 movdqa %xmm0, %xmm1 instead of a single paddw. Response was that I should look at the register allocator. OK. Rtl coming in looks like: R70:v8hi <- R59:v8hi + subreg:v8hi (R66:v2di) R66:v2di <- subreg:v2di(R70:v8hi) where R70 is used only in these 2 insns, and R66 is live on entry and exit to the loop. First, local-alloc picks a hard reg (R21) for R70. Global has some code that tries to assign R66 to the same hard regs as things that R66 is copied to (copy_preference); that code doesn't look under subregs, so isn't triggered in this rtl. It's straightforward to extend this code to look under subregs, and that works for this example. (Although just which subregs are safe to look under will require more attention than I've given it, if we want this in.) However, that's not the whole problem. When we have two accumulators in the loop: #include <emmintrin.h> __m128i foo1(__m128i z, __m128i a, __m128i b, int N) { int i; for (i=0; i R70:v8hi <- R59:v8hi + subreg:v8hi (R66:v2di) R66:v2di <- subreg:v2di(R70:v8hi) R72:v8hi <- R61:v8hi + subreg:v8hi (R68:v2di) R68:v2di <- subreg:v2di(R72:v8hi) local-alloc assigns the same reg (R21) to R70 and R72. This means R21 conflicts with both R66 and R68, so is not considered for either of them, and the copy_preference optimization isn't invoked. I don't see a way to fix that in global. Doing round-robin allocation in local-alloc would alleviate that...for a while, until the block gets big enough that registers are reused; that's not a complete solution. Really I don't think this is an RA problem at all. We ought to be able to combine these patterns no matter what the RA does. 
The following pattern makes combine do it: (define_insn "*addmixed3" [(set (match_operand:V2DI 0 "register_operand" "=x") (subreg:V2DI (plus:SSEMODE124 (match_operand:SSEMODE124 2 "nonimmediate_operand" "xm") (subreg:SSEMODE124 (match_operand:V2DI 1 "nonimmediate_operand" "%0") 0)) 0))] "TARGET_SSE2 && ix86_binary_operator_ok (PLUS, mode, operands)" "padd\t{%2, %0|%0, %2}" [(set_attr "type" "sseiadd") (set_attr "mode" "TI")]) I'm not very happy about this because it's really not an x86 problem either, at least in theory, but flushing the problem down to the RA doesn't look profitable. Comments?
Re: RFA: pervasive SSE codegen inefficiency
On Sep 19, 2005, at 5:30 PM, Richard Henderson wrote: (define_insn "*addmixed3" [(set (match_operand:V2DI 0 "register_operand" "=x") (subreg:V2DI (plus:SSEMODE124 (match_operand:SSEMODE124 2 "nonimmediate_operand" "xm") (subreg:SSEMODE124 (match_operand:V2DI 1 "nonimmediate_operand" "%0") 0)) 0))] I absolutely will not allow you do add 5000 of these patterns. Which is what you'll need if you think you'll be able to solve the problem this way. Do you have any constructive suggestions for how the RA might be fixed, then?
Re: RFA: pervasive SSE codegen inefficiency
On Sep 19, 2005, at 9:15 PM, Richard Henderson wrote: On Mon, Sep 19, 2005 at 05:33:54PM -0700, Dale Johannesen wrote: Do you have any constructive suggestions for how the RA might be fixed, then? Short term? No. But I don't see this as a short term problem. OK. Unfortunately, it is a short term problem for Apple. I don't know how to fix it in the RA and it looks like nobody else does either, so I'll have to do something local, I guess. (Thanks Daniel and Giovanni, suggestions for incremental updates that don't address this problem are not really what I was looking for here.)
x86 SSE constants
The C constraint on x86 is defined, in both the doc and the comments, as "constant that can be easily constructed in SSE register without loading from memory". Currently the only one handled is 0, but there is at least one more, all 1 bits, which is constructed by pcmpeqd %xmm, %xmm Unfortunately there are quite a few places in the patterns that assume C means zero, and generate pxor or something like that. What would be the preferred way to fix this, new constraint or change the existing patterns?
Re: x86 SSE constants
On Sep 30, 2005, at 4:17 PM, Jan Hubicka wrote: The C constraint on x86 is defined, in both the doc and the comments, as "constant that can be easily constructed in SSE register without loading from memory". Currently the only one handled is 0, but there is at least one more, all 1 bits, which is constructed by pcmpeqd %xmm, %xmm Unfortunately there are quite a few places in the patterns that assume C means zero, and generate pxor or something like that. What would be the preferred way to fix this, new constraint or change the existing patterns? My original plan was to add pcmpeqd by extending the 'C' constraint and the patterns where pxor/xorp? is currently generated unconditionally. This is pretty similar to what we do to i387 constants as well. I never actually got to realizing this (for the scalar FP work I was mostly interested in at the time it was not all that interesting), but I think there is nothing in md file preventing it (or I just missed it when it was added :)... No, there isn't, but it might be a smaller change to add a new constraint; having constraints tied to specific constants is pretty ugly, and so is having (if (constant value==0)) in a lot of patterns...
RFC: redundant stores in C++
In C++, when we have an automatic array with variable initializers: void bar(char[4]); void foo(char a, char b, char c, char d) { char x[4] = { a, b, c, d }; bar(x); } the C++ FE generates 32-bit store(s) of 0 for the entire array, followed by stores of the individual elements. In the case above, where the elements are not 32-bits, the optimizers do not figure out they can eliminate the redundant store(s) of 0. The C FE does not generate that to begin with, and the C++ FE should not either. This is not my native habitat, but I think this is the right general idea: *** typeck2.c Thu Aug 4 17:52:43 2005 --- /Network/Servers/harris/Volumes/haus/johannes/temp/typeck2.cSat Oct 1 14:44:46 2005 *** split_nonconstant_init (tree dest, tree *** 534,540 code = push_stmt_list (); split_nonconstant_init_1 (dest, init); code = pop_stmt_list (code); ! DECL_INITIAL (dest) = init; TREE_READONLY (dest) = 0; } else --- 534,551 code = push_stmt_list (); split_nonconstant_init_1 (dest, init); code = pop_stmt_list (code); ! /* APPLE LOCAL begin */ ! /* If the constructor now doesn't construct anything, that !means constant 0 for the entire object. We don't need !to do this for non-statically-allocated objects. !Functionally it is harmless, but leads to inferior code !in cases where the optimizers don't get rid of the !redundant stores of 0. */ ! if (TREE_CODE (dest) != VAR_DECL ! || TREE_STATIC (dest) ! || CONSTRUCTOR_ELTS (init) != 0) ! DECL_INITIAL (dest) = init; ! /* APPLE LOCAL end */ TREE_READONLY (dest) = 0; } else Testsuite passes with this but I can believe improvements are possible; comments?
Re: RFC: redundant stores in C++
On Oct 1, 2005, at 7:29 PM, Andrew Pinski wrote: I don't think this will work for the following code: void foo(char a, char b) { char x[4] = { a, b }; if (x[3] != 0) abort (); } Duh. I thought that was too easy. But a better fix would be to not call split_nonconstant_init_1 for local decls and have the front-end produce a CONSTRUCTOR which is just like what the C front-end produces. I'll try it.
Re: Should -msse3 enable fisttp
On Oct 3, 2005, at 3:49 PM, Andrew Pinski wrote: On Oct 3, 2005, at 6:41 PM, Evan Cheng wrote: But according to the manual -msse3 does not turn on generation of SSE3 instructions: The manual is semi-confusing I had forgot about that. There is a bug about the issue recorded as PR 23809: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23809 I'm a little disappointed the behavior of the compiler was changed without addressing this. Maybe somebody could review the patch in that radar?
Re: RFC: redundant stores in C++
On Oct 1, 2005, at 8:41 PM, Andrew Pinski wrote: On Oct 1, 2005, at 11:10 PM, Dale Johannesen wrote: But better fix would be not call split_nonconstant_init_1 for local decls and have the front-end produce a CONSTRUCTOR which is just like what the C front-end produces. I'll try it. This patch should fix the problem and also fixes FSF PR 8045 at the same time. FSF PR 8045 is about an missing unused variable causes by this code. This patch causes us to be more similar with the C front-end. It should also cause us save us some compile time issue when gimplifing and memory too. Note I have not tested this yet by either looking at the code gen or even compiling it. I will be doing a bootstrap/test of this right now. -- Pinski Index: typeck2.c === RCS file: /cvs/gcc/gcc/gcc/cp/typeck2.c,v retrieving revision 1.192 diff -u -p -r1.192 typeck2.c --- typeck2.c 1 Aug 2005 04:02:26 - 1.192 +++ typeck2.c 2 Oct 2005 03:36:41 - @@ -613,10 +613,13 @@ store_init_value (tree decl, tree init) value = digest_init (type, init); /* If the initializer is not a constant, fill in DECL_INITIAL with the bits that are constant, and then return an expression that - will perform the dynamic initialization. */ + will perform the dynamic initialization. We don't have to do this + for local variables either. */ if (value != error_mark_node && (TREE_SIDE_EFFECTS (value) - || ! initializer_constant_valid_p (value, TREE_TYPE (value + || ! initializer_constant_valid_p (value, TREE_TYPE (value))) + && (TREE_CODE (decl) != VAR_DECL + || TREE_STATIC (dest))) return split_nonconstant_init (decl, value); /* If the value is a constant, just put it in DECL_INITIAL. If DECL is an automatic variable, the middle end will turn this into a Thanks. The last line of this patch should use "decl", not "dest". With that obvious change it tests OK in apple's branch. (I cannot build mainline on darwin x86 at the moment for unrelated reasons - nothing that means there is a problem in mainline, I don't think.)
Re: Need advice: x86 redudant compare to zero
My question is: where and how would you suggest we do this optimization? With peephole2? Or in combine? In i386.md, I see pattern *subsi_2 looks like what I'd like to combine these two insns into: (define_insn "*subsi_2" [(set (reg FLAGS_REG) (compare (minus:SI (match_operand:SI 1 "nonimmediate_operand" "0,0") (match_operand:SI 2 "general_operand" "ri,rm")) (const_int 0))) (set (match_operand:SI 0 "nonimmediate_operand" "=rm,r") (minus:SI (match_dup 1) (match_dup 2)))] "ix86_match_ccmode (insn, CCGOCmode) && ix86_binary_operator_ok (MINUS, SImode, operands)" "sub{l}\t{%2, %0|%0, %2}" [(set_attr "type" "alu") (set_attr "mode" "SI")]) That's quite similar to several PPC patterns for andi., and they work. If you've got two other insns that look like those sets, I'd expect combine to merge them, and would look to see why it doesn't.
Re: backslash whitespace newline
On Oct 25, 2005, at 5:40 PM, Joe Buck wrote: The problem, I think, is that the behavior of both GCC *and* the other compilers does not serve the users. The reason is that there simply isn't any reason why a user would use a backslash to continue a C++ comment on purpose, and plenty of reason why she might do it by accident. ...users think they can put anything in a comment. A backslash at the end is likely to be an accident, since just starting the next line with a // is easy enough. Yes. From the user's point of view, the best thing appears to be treating backslashes in C++ comments as part of the comment, regardless of what follows them; that seems to follow the principle of least surprise. That's not standard conforming, and therefore I'm not advocating it for gcc, but it probably wouldn't break anything outside compiler testsuites. Maybe this treatment should be made standard conforming...?
Re: Link-time optimzation
On Nov 17, 2005, at 3:09 PM, Robert Dewar wrote: Richard Earnshaw wrote: We spend a lot of time printing out the results of compilation as assembly language, only to have to parse it all again in the assembler. I never like arguments which have loaded words like "lot" without quantification. Just how long *is* spent in this step, is it really significant? When I arrived at Apple around 5 years ago, I was told of some recent measurements that showed the assembler took around 5% of the time. Don't know if that's still accurate. Of course the speed of the assembler is also relevant, and our stubs and lazy pointers probably mean Apple's .s files are bigger than other people's.
Re: identifying c++ aliasing violations
On Dec 5, 2005, at 12:03 AM, Giovanni Bajo wrote: Jack Howarth <[EMAIL PROTECTED]> wrote: What exactly is the implication of having a hundred or more of this in an application being built with gcc/g++ 4.x at -O3? Does it only risk random crashes in the generated code or does it also impact the quality of the generated code in terms of execution speed? The main problem is wrong-code generation. Assuming the warning is right and does not mark false positives, you should have those fixed. I don't think quality of the generated code would be better with this change. However, it's pretty strange that C++ code generation is worse with GCC 4: I saw many C++ programs which actually got much faster due to higher level optimizations (such as SRA). You should really try and identify inner loops which might have been slowed down and submit those as bug reports in our Bugzilla. Could also be inlining differences, and you might check out whether -fno-threadsafe-statics is applicable; that can make a big difference. Bottom line, you're going to have to do some analysis to figure out why it got slower. (It sounds like you're on a MacOSX system, in which case Shark is a good tool for this.)
Re: Performance comparison of gcc releases
On Dec 16, 2005, at 10:31 AM, Dan Kegel wrote: Ronny Peine wrote: -ftree-loop-linear is removed from the testing flags in gcc-4.0.2 because it leads to an endless loop in neural net in nbench. Could you file a bug report for this one? Done. This is probably the same as 20256. Your PR is a bit short on details. For instance, it'd be nice to include a link to the source for nbench, so people don't have to guess what version you're using. Was it http://www.tux.org/~mayer/linux/nbench-byte-2.2.2.tar.gz ? It'd be even more helpful if you included a recipe a sleepy person could use to reproduce the problem. In this case, something like wget http://www.tux.org/~mayer/linux/nbench-byte-2.2.2.tar.gz tar -xzvf nbench-byte-2.2.2.tar.gz cd nbench-byte-2.2.2 make CC=gcc-4.0.1 CFLAGS="-ftree-loop-linear" Unfortunately, I couldn't reproduce your problem with that command. Can you give me any tips? Finally, it's helpful when replying to the list about filing a PR to include the PR number or a link to the PR. The shortest link is just gcc.gnu.org/PR%d, e.g. http://gcc.gnu.org/PR25449 - Dan -- Wine for Windows ISVs: http://kegel.com/wine/isv
Re: Corrupted Profile Information
On Jan 26, 2006, at 4:05 PM, [EMAIL PROTECTED] wrote: I really need correct profile information before PRE. By moving rest_of_handle_branch_prob() just before rest_of_handle_gcse() have I violated some critical assumptions which is causing the profile information to be occasionally corrupted ? Yes; various CFG transformations before the profiling phase don't maintain the profiling info, because there isn't any. In gcc-4 the profiling phase has been moved much earlier and this information is maintained by the later transformations. Backporting all that logic to 3.4 might be possible, but is not easy. You're better off using gcc-4.
Re: x86-64, I definitely can't make sense out of that
On Feb 4, 2006, at 7:06 AM, Andrew Pinski wrote: signs_all[4] = { !(sx > 0), !(sy > 0), !(sz > 0), 0 }, C++ front-end produces: <<< Unknown tree: expr_stmt signs_all[0] = (int) sx <= 0 >>>; <<< Unknown tree: expr_stmt signs_all[1] = (int) sy <= 0 >>>; <<< Unknown tree: expr_stmt signs_all[2] = (int) sz <= 0 >>>; While the C front-end is producing: const int signs_all[4] = {(int) sx <= 0, (int) sy <= 0, (int) sz <= 0, 0}; Dale Johannesen and I came up with a patch to the C++ front-end for this, except it did not work with some C++ cases. Yes, we had it in Apple's branch for a while and had to back it out. The place to look is split_nonconstant_init in cp/typeck2.c if you want to try. The tricky part is making sure the entire object is initialized in all cases when only a partial initializer is specified.
Re: x86 -ffast-math problem on SPEC CPU 2K
On Feb 23, 2006, at 8:54 AM, H. J. Lu wrote: When I use -O2 -mtune=pentium4 -ffast-math on SPEC CPU 2K on Linux/x86 with gcc 4.2, I get *** Miscompare of 200.s, see /export/spec/src/2000/spec/benchspec/CINT2000/176.gcc/run/0004/ 200.s.mis *** Miscompare of scilab.s, see /export/spec/src/2000/spec/benchspec/CINT2000/176.gcc/run/0004/ scilab.s.mis Is that a known issue? This is what you get if the benchmark source thinks the host is of the wrong endianness. Do you have -DHOST_WORDS_BIG_ENDIAN in your config file perhaps?
Re: documentation on inlining model needed
On Mar 7, 2006, at 12:28 AM, Yang Yang wrote: Recently, I've become very interested in the inlining model of GCC. I need detailed documentation describing how inlining is implemented in gcc 4.0. Anybody who has been or is working on it, please send me documentation. I'd really appreciate your help. There is no such documentation; you're going to have to look at the source. The mechanism of actually duplicating a function body and substituting it for a call is in tree-inline.c. The decision about which calls to expand inline is made in cgraph.c and cgraphunit.c.
Re: "Experimental" features in releases
On Apr 17, 2006, at 11:52 AM, Mark Mitchell wrote: Dan Berlin and I exchanged some email about PR 26435, which concerns a bug in -ftree-loop-linear, and we now think it would make sense to have a broader discussion. The PR in question is about an ice-on-valid regression in 4.1, when using -O1 -ftree-loop-linear. Dan notes that this optimization option is "experimental", but I didn't see that reflected in the documentation, which says: @item -ftree-loop-linear Perform linear loop transformations on tree. This flag can improve cache performance and allow further loop optimizations to take place. I wasn't aware that it was supposed to be experimental either, and it wasn't explained that way when it went in (Sep 2004). (Incomplete or buggy would not be surprising, but it sounds now like we're talking about fatally flawed design, which is different.) In any case, the broader question is: to what extent should we have experimental options in releases, and how should we warn users of their experimental nature? In general I would agree in principle with Diego that such features don't belong in releases, but this isn't the first time features have been found to be buggy after they've gone in. -frename-registers comes to mind; in that case, the bugginess was documented for several releases, and that warning has recently been removed as the bugs are believed to be fixed. This optimization is worth about a 5x speedup on one of the SPECmarks (see discussion in archives), so IMO we should consider carefully before removing it. It was in 4.0 and 4.1 releases. My suggestion is that features that are clearly experimental (like this one) should be (a) documented as such, and (b) should generate a warning, like: warning: -ftree-loop-linear is an experimental feature and is not recommended for production use Looks good to me.
Re: "Experimental" features in releases
On Apr 17, 2006, at 2:31 PM, Richard Guenther wrote: On 4/18/06, Ivan Novick <[EMAIL PROTECTED]> wrote: I am a gcc user at a financial institution and IMHO it would not be a good idea to have non-production-ready functionality in gcc. We are trying to use gcc for mission-critical functionality. It has always been the case that additional options not enabled at any regular -O level get less testing and are more likely to have bugs. So for mission-critical functionality I would strongly suggest staying with -O2 and not relying on less thoroughly tested combinations of optimization options. I'd go further: you should not be trusting a compiler (gcc or any other) to be correct in "mission critical" situations. Finding a compiler without bugs is not a realistic expectation. Every compiler release I'm familiar with has had bugs. So from my point of view, the situation with -ftree-loop-linear is fine - it's ICEing, after all, not producing silently wrong code. For experimental options (where I would include all options not enabled by -O[123s]) known wrong-code bugs should be fixed. The case of this in 20256 did produce silent bad code when it was reported, but that seems to have changed.
Re: "Experimental" features in releases
On Apr 19, 2006, at 12:04 AM, Kai Henningsen wrote: [EMAIL PROTECTED] (Daniel Berlin) wrote on 18.04.06 in <[EMAIL PROTECTED]>: This is in fact, not terribly surprising, since the algorithm used was the result of Sebastian and I sitting at my whiteboard for 30 minutes trying to figure out what we'd need to do to make swim happy :). This would leave -ftree-loop-linear in 4.2, but make it not useful for increasing SPEC scores. So is this an object lesson for why optimizing for benchmarks is a bad idea? If you're inclined to believe this, you could find a confirming instance here, but there are other lessons that could be drawn. If you go back to the original thread, you'll see this from Toon Moene: http://gcc.gnu.org/ml/gcc-patches/2004-09/msg00256.html It didn't have to be a benchmark-only optimization.
Re: "Experimental" features in releases
On Apr 19, 2006, at 11:52 AM, Daniel Berlin wrote: So is this an object lesson for why optimizing for benchmarks is a bad idea? If you're inclined to believe this, you could find a confirming instance here, but there are other lessons that could be drawn. If you go back to the original thread, you'll see this from Toon Moene: http://gcc.gnu.org/ml/gcc-patches/2004-09/msg00256.html It didn't have to be a benchmark-only optimization. It isn't a benchmark-only optimization. Only the perfect nest conversion was targeted for the benchmarks, because it was necessary. The rest uses standard spatial optimality metrics to decide whether it makes sense to interchange loops or not, and *that* works great on Fortran applications (except for a few other random bugs). OK, I didn't get that.
Re: address order and BB numbering
On May 19, 2006, at 12:48 PM, sean yang wrote: Although the "BASIC_BLOCK array contains BBs in an unspecified order", as the GCC internals doc says, can I assume that the final virtual address of an instruction in BB_m is always higher than the virtual address of an instruction in BB_n when m < n? (Let's assume the linker for the target machine produces code from low addresses to high addresses.) Definitely not. Various phases that need to know the order of insns produce a CUID for that phase, but it is not maintained globally.