Abt RTL expression
Hello all,

While going through the RTL dumps, I noticed a few things which I need to get clarified. Below is the extract in which I have the doubt.

(insn 106 36 107 6 (set (reg:SI 13 a5)
        (const_int -20 [0xffec])) 17 {movsi_short_const} (nil)
    (nil))

(insn 107 106 108 6 (parallel [
            (set (reg:SI 13 a5)
                (plus:SI (reg:SI 13 a5)
                    (reg/f:SI 14 a6)))
            (clobber (reg:CC 21 cc))
        ]) 29 {addsi3} (nil)
    (expr_list:REG_EQUIV (plus:SI (reg/f:SI 14 a6)
            (const_int -20 [0xffec]))
        (nil)))

(insn 108 107 38 6 (set (reg:SI 13 a5)
        (mem/c:SI (reg:SI 13 a5) [0 S4 A32])) 15 {movsi_load} (nil)
    (nil))

My deductions:

1. In insn 106, we are storing -20 into register 13 (a5).
2. In insn 107, we are taking the value from register 14 (a6), which is a pointer, subtracting 20 from it and storing the result in a5. Now a6 contains the stack pointer, therefore a5 now contains SP-20.
3. In insn 108, we are storing the value pointed to by register a5 into a5.

Is my deduction for insn 108 right? If it is right, shouldn't the expression be like this:

(mem/c:SI (reg/f:SI 13 a5) [0 S4 A32])) 15 {movsi_load} (nil)

If I am wrong, can anyone tell me what insn 108 actually means?

Regards,
Rohit
strict aliasing question
I see a lot of APIs (e.g. Cyrus SASL) that have accessor functions returning values through a void ** argument. As far as I can tell, this doesn't actually cause any problems, but gcc 4.1 with -Wstrict-aliasing will complain. For example, take these two separate source files:

alias1.c:

#include <stdio.h>

extern void getit( void **arg );

main() {
    int *foo;
    getit( (void **)&foo );
    printf("foo: %x\n", *foo);
}

alias2.c:

static short x[] = {16,16};

void getit( void **arg ) {
    *arg = x;
}

gcc -O3 -fstrict-aliasing -Wstrict-aliasing *.c -o alias

The program prints the expected result with both strict-aliasing and no-strict-aliasing on my x86_64 box. As such, when/why would I need to worry about this warning?

--
Howard Chu
Chief Architect, Symas Corp.  http://www.symas.com
Director, Highland Sun        http://highlandsun.com/hyc
OpenLDAP Core Team            http://www.openldap.org/project/
Re: strict aliasing question
On 11/10/06, Howard Chu <[EMAIL PROTECTED]> wrote:
> I see a lot of APIs (e.g. Cyrus SASL) that have accessor functions
> returning values through a void ** argument. As far as I can tell, this
> doesn't actually cause any problems, but gcc 4.1 with -Wstrict-aliasing
> will complain. For example, take these two separate source files:
>
> alias1.c:
>
> #include <stdio.h>
>
> extern void getit( void **arg );
>
> main() {
>     int *foo;
>     getit( (void **)&foo );
>     printf("foo: %x\n", *foo);
> }
>
> alias2.c:
>
> static short x[] = {16,16};
>
> void getit( void **arg ) {
>     *arg = x;
> }
>
> gcc -O3 -fstrict-aliasing -Wstrict-aliasing *.c -o alias
>
> The program prints the expected result with both strict-aliasing and
> no-strict-aliasing on my x86_64 box. As such, when/why would I need to
> worry about this warning?

If you compile with -O3 -combine *.c -o alias it will break.

Richard.
RE: [m32c-elf] losing track of register lifetime in combine?
On 10 November 2006 07:13, Ian Lance Taylor wrote:
> DJ Delorie <[EMAIL PROTECTED]> writes:
>
>> I compared the generated code with an equivalent explicit test,
>> and discovered that gcc uses a separate rtx for the intermediate:
>>
>>   i = 0xf;
>>   if (j >= 16)
>>     {
>>       int i2;
>>       i2 = i >> 8;
>>       i = i2 >> 8;
>>       j -= 16;
>>     }
>>
>> This seems to avoid the combiner problem, because you don't have the
>> same register being set and being used in one insn. Does this explain
>> why combine was having a problem, or was this a legitimate thing to do
>> and the combiner is still wrong? Using a temp in the expander works
>> around the problem.
>
> Interesting. Using a temporary is the natural way to implement this
> code. But not using a temporary should be valid. So I think there is
> a bug in combine.

Doesn't this just suggest that there's a '+' constraint modifier missing from an operand in a pattern in the md file somewhere, such as the one that expands the conditional in the first place?

cheers,
DaveK
--
Can't think of a witty .sigline today
RE: How to create both -option-name-* and -option-name=* options?
On 10 November 2006 07:34, Brooks Moses wrote:
> The Fortran front end currently has a lang.opt entry of the following form:
>
>    ffixed-line-length-
>    Fortran RejectNegative Joined UInteger
>
> I would like to add to this the following option which differs in the
> last character, but should be treated identically:
>
>    ffixed-line-length=
>    Fortran RejectNegative Joined UInteger
>
>    In file included from tm.h:7,
>                     from ../../svn-source/gcc/genconstants.c:32:
>    options.h:659: error: redefinition of `OPT_ffixed_line_length_'
>    options.h:657: error: `OPT_ffixed_line_length_' previously defined here
>
> This is because both the '=' and the '-' in the option name reduce to a
> '_' in the enumeration name, which of course causes the enumerator to
> get defined twice -- and that's a problem, even though I'm quite happy
> for the options to both be treated identically.
>
> There's not really any good way around this problem, is there?

It may seem a bit radical, but is there any reason not to modify the option-parsing machinery so that either '-' or '=' are treated interchangeably for /all/ options with joined arguments? That is, whichever is specified in the .opt file, the parser accepts either?

cheers,
DaveK
--
Can't think of a witty .sigline today
Re: Abt RTL expression - combining instruction
Hi all,

Finally got the combined compare_and_branch instruction to work. But it has some side effects while testing other files:

20010129-1.s: Assembler messages:
20010129-1.s:46: Error: Value of 0x88 too large for 7-bit relative instruction offset

I designed my compare-and-branch insn as given below:

(define_insn "compare_and_branch_insn"
  [(set (pc)
        (if_then_else
          (match_operator 3 "comparison_operator"
            [(match_operand:SI 1 "register_operand" "r,r")
             (match_operand:SI 2 "nonmemory_operand" "O,r")])
          (label_ref (match_operand 0 "" ""))
          (pc)))]
  ""
  "*
   output_asm_insn (\"cmp\\t%2, %1\", operands);
   /* Body of branch insn */
  "
  [(set (attr "length")
        (if_then_else
          (ltu (plus (minus (match_dup 0) (pc)) (const_int 128))
               (const_int 250))
          (const_int 4)
          (if_then_else
            (ltu (plus (minus (match_dup 0) (pc)) (const_int 65536))
                 (const_int 131072))
            (if_then_else (eq_attr "align_lbranch" "true")
              (const_int 6)
              (const_int 5))
            (if_then_else (eq_attr "call_type" "short")
              (const_int 8)
              (const_int 16)))))
   (set_attr "delay_type" "delayed")
   (set_attr "type" "compare,branch")])

1. Does the length attribute affect the calculation of the offset?
2. What other factors do I have to take into consideration while combining a compare and a branch instruction?

Regards,
Rohit

On 08 Nov 2006 07:00:29 -0800, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:
> "Rohit Arul Raj" <[EMAIL PROTECTED]> writes:
>
> > I have used the cbranchmode4 instruction to generate a combined compare
> > and branch instruction.
> >
> > (define_insn "cbranchmode4"
> >   [(set (pc) (if_then_else
> >                (match_operator:CC 0 "comparison_operator"
> >                  [(match_operand:SI 1 "register_operand" "r,r")
> >                   (match_operand:SI 2 "nonmemory_operand" "O,r")])
> >                (label_ref (match_operand 3 "" ""))
> >                (pc)))]
> >
> > This pattern matches if the code is of the form
> >
> >   if (h == 1)
> >     p = 0;
> >
> > If the code is of the form
> >
> >   if (h), if (h >= 0)
> >     p = 0;
> >
> > then it matches the separate compare and branch instructions and not
> > the cbranch instruction.
> >
> > Can anyone point out where I am going wrong?

If you have a cbranch insn, and you want that one to always be recognized, then why do you also have separate compare and branch insns?

Ian
Question on tree-nested.c:convert_nl_goto_reference
I have a test case (involving lots of new code that's not checked in yet) that's blowing up with a nonlocal goto and I'm wondering how it ever worked because it certainly appears to me that DECL_CONTEXT has to be copied from label to new_label. But it isn't. So how are nonlocal gotos working?
Re: Abt long long support
On 11/10/06, Mike Stump <[EMAIL PROTECTED]> wrote:
> On Nov 9, 2006, at 6:39 AM, Mohamed Shafi wrote:
> > When i diff the rtl dumps for programs passing negative value with and
> > without frame pointer i find changes from file.greg .
>
> A quick glance at the rtl shows that insn 95 tries to use [a4+4] but
> insn 94 clobbered a4 already, also d3 is used by insn 93, but there
> isn't a set for it.

The following part of the RTL dump of the greg pass is the one which is giving the wrong output:

(insn 90 29 91 6 (set (reg:SI 12 a4)
        (const_int -16 [0xfff0])) 17 {movsi_short_const} (nil)
    (nil))

(insn 91 90 94 6 (parallel [
            (set (reg:SI 12 a4)
                (plus:SI (reg:SI 12 a4)
                    (reg/f:SI 14 a6)))
            (clobber (reg:CC 21 cc))
        ]) 29 {addsi3} (nil)
    (expr_list:REG_EQUIV (plus:SI (reg/f:SI 14 a6)
            (const_int -16 [0xfff0]))
        (nil)))

(insn 94 91 95 6 (set (reg:SI 12 a4)
        (mem/c:SI (reg:SI 12 a4) [0 D.1863+0 S4 A32])) 15 {movsi_load} (nil)
    (nil))

(insn 95 94 31 6 (set (reg:SI 13 a5 [orig:12+4 ] [12])
        (mem/c:SI (plus:SI (reg:SI 12 a4)
                (const_int 4 [0x4])) [0 D.1863+4 S4 A32])) 15 {movsi_load} (nil)
    (nil))

(insn 31 95 87 6 (parallel [
            (set (reg:DI 2 d2)
                (minus:DI (reg:DI 0 d0 [34])
                    (reg:DI 12 a4)))
            (clobber (reg:CC 21 cc))
        ]) 33 {subdi3} (nil)
    (nil))

The setting of register d3 is actually done in insn 31:

(set (reg:DI 2 d2) ...)

Since this is in DI mode, it uses d2 and d3. Similarly, d0 and a4 are accessed in DI mode, so d1 and a5 are also used in this insn. Hence the negation is proper.

Just as Mike pointed out, insn 95 tries to use [a4+4], but insn 94 has clobbered a4 already. The compiler should actually generate insns similar to insns 91 and 92 between insns 94 and 95, either not using a4 or after saving a4. This is not happening. Insns 90 to 94 are emitted only from the greg pass onwards. When I inserted the necessary assembly instructions corresponding to movsi_short_const and addsi3 between insns 91 and 92 in the assembly file, the program worked fine.

There is spill code for insn 31 in the beginning of the .greg file, but I can't understand anything of it:

Spilling for insn 31.
Using reg 2 for reload 2
Using reg 12 for reload 3
Using reg 13 for reload 0
Using reg 13 for reload 1

The same program works with the gcc 3.2 and gcc 3.4.6 ports of the same private target. I am not sure whether this is because of the reload pass or global register allocation.

1. What could be the reason for this behavior?
2. How can I overcome this type of behavior?

Regards,
Shafi
Re: Canonical type nodes, or, comptypes considered harmful
Mike Stump wrote:
> Now, what are the benefits and weaknesses between mine and yours: you
> don't have to carry around type_context the way mine would, that's a
> big win. You don't have to do anything special to move a reference to
> a type around, that's a big win. You have to do a structural walk if
> there are any bits that are used for type equality.

No, these bits can be placed together - a structural walk is only necessary when (some of) these bits themselves need more scrutiny - i.e. on at least one of the sides some of the constituent parts is partially incomplete.

> And I can't see how you can avoid that complexity. In my scheme, I
> don't have to. I just have a vector of items; they are right next to
> each other, in the same cache line.

Again, the equality of the items might not be trivial.
Re: [m32c-elf] losing track of register lifetime in combine?
"Dave Korn" <[EMAIL PROTECTED]> writes:

> On 10 November 2006 07:13, Ian Lance Taylor wrote:
>
> > DJ Delorie <[EMAIL PROTECTED]> writes:
> >
> >> I compared the generated code with an equivalent explicit test,
> >> and discovered that gcc uses a separate rtx for the intermediate:
> >>
> >>   i = 0xf;
> >>   if (j >= 16)
> >>     {
> >>       int i2;
> >>       i2 = i >> 8;
> >>       i = i2 >> 8;
> >>       j -= 16;
> >>     }
> >>
> >> This seems to avoid the combiner problem, because you don't have the
> >> same register being set and being used in one insn. Does this explain
> >> why combine was having a problem, or was this a legitimate thing to do
> >> and the combiner is still wrong? Using a temp in the expander works
> >> around the problem.
> >
> > Interesting. Using a temporary is the natural way to implement this
> > code. But not using a temporary should be valid. So I think there is
> > a bug in combine.
>
> Doesn't this just suggest that there's a '+' constraint modifier missing
> from an operand in a pattern in the md file somewhere, such as the one
> that expands the conditional in the first place?

Not necessarily. I would guess that it's a define_expand which generates a pseudo-register and uses it as

(set (reg) (ashiftrt (reg) (const_int 8)))

That is OK.

In any case a '+' constraint doesn't make any difference this early in the RTL passes. combine doesn't look at constraints.

Ian
Re: Abt RTL expression - combining instruction
"Rohit Arul Raj" <[EMAIL PROTECTED]> writes:

> 1. Does attribute length affect the calculation of offset?

It does if you tell it to. The "length" attribute must be managed entirely by your backend. Most backends with variable size branches use the length attribute to select which branch insn to generate. The usual pattern is to call get_attr_length and use that to pick the assembler instruction. For example, jump_compact in sh.md.

Who wrote the backend that you are modifying? Why can't you ask them?

Ian
Re: Canonical type nodes, or, comptypes considered harmful
On 11/9/06, Mike Stump <[EMAIL PROTECTED]> wrote:
> On Nov 8, 2006, at 5:59 AM, Doug Gregor wrote:
> > However, this approach could have some odd side effects when there are
> > multiple mappings within one context. For instance, we could have
> > something like:
> >
> >   typedef int foo_t;
> >   typedef int bar_t;
> >
> >   foo_t* x = strlen("oops");
>
> x is a decl, the decl has a type, the context of that instance of the
> type is x. map(int,x) == foo_t. It is this, because we know that foo_t
> was used to create x and we set map(int,x) equal to foo_t as it is
> created.

Ah, I understand now. When you wrote "context", I was thinking of some coarse-grained approach to context, e.g., a scope or a block. With variable-level granularity, your idea certainly works.

> > The error message that pops out would likely reference "bar_t *"
>
> map(int,x) doesn't yield bar_t.

Right, got it.

> > We can't literally combine T and U into a single canonical
> > type node, because they start out as different types.
>
> ?

To get a truly canonical type node, whenever we create a new type that may be equivalent to an existing type, we need to find that existing type node at the time that we create the new type, e.g.,

  typedef int foo_t;

When we create the decl for "foo_t", its TREE_TYPE will be "int" (the canonical type node), and with its context we know the name the user wrote ("foo_t"). When we create "foo_t*", the idea is the same: find the canonical type node (int*), and its context will tell us the actual type written ("foo_t *"). All the time, we're finding the canonical type node for a particular type ("foo_t *") before we go and create a new type node.

With concepts, there are cases where we end up creating two different type nodes that we later find out should have been the same type node. Here's an example:

  template<typename T, typename U>
  where LessThanComparable<T> && SameType<T, U>
  const T& weird_min(const T& t, const U& u) {
    return t < u ? t : u;
  }

When we parse the template header, we create two different type nodes for T and U. Then we start parsing the where clause, and create a type node for LessThanComparable<T>. Now we hit the SameType requirement, which says that T and U are the same type. It's a little late to make T and U actually have the same type node (unless we want to re-parse the template or re-write the AST).

> > Granted, we could layer a union-find implementation (that better
> > supports concepts) on top of this approach.
>
> Ah, but once you break the fundamental quality that different addresses
> implies different types, you limit things to structural equality and
> that is slow.

Not necessarily. If you have an efficient way to map from a type to its canonical type node, then you pay for that mapping but not for a structural equality check. In a union-find data structure, the mapping amounts to a bit of pointer chasing... but in most cases it's only one pointer hop. Actually, without concepts we can guarantee that it's only one pointer hop... with concepts, we need to keep a little more information around in the AST and we sometimes have more than one pointer hop to find the answer.

I already use a union-find data structure inside ConceptGCC, because I don't have the option to map to a canonical type when type nodes are initially created. But, since there are no canonical type nodes in GCC now, I have to use a hash table that hashes based on structural properties to keep track of the canonical type nodes. *Any* system that gives us canonical type nodes in GCC would be a huge benefit for ConceptGCC.

Cheers,
Doug
RE: [m32c-elf] losing track of register lifetime in combine?
On 10 November 2006 15:01, Ian Lance Taylor wrote: > In any case a '+' constraint doesn't make any difference this early in > the RTL passes. combine doesn't look at constraints. bah, of course! Ignore me, I'll just go sit in the dunce's corner for a while :) cheers, DaveK -- Can't think of a witty .sigline today
Re: Abt long long support
"Mohamed Shafi" <[EMAIL PROTECTED]> writes:

> (insn 94 91 95 6 (set (reg:SI 12 a4)
>         (mem/c:SI (reg:SI 12 a4) [0 D.1863+0 S4 A32])) 15 {movsi_load} (nil)
>     (nil))
>
> (insn 95 94 31 6 (set (reg:SI 13 a5 [orig:12+4 ] [12])
>         (mem/c:SI (plus:SI (reg:SI 12 a4)
>                 (const_int 4 [0x4])) [0 D.1863+4 S4 A32])) 15 {movsi_load}
>     (nil)
>     (nil))
>
> I am not sure whether this is because of reload pass or global
> register allocation.

If those two instructions appear for the first time in the .greg dump file, then they have been created by reload.

> 1. What could be the reason for this behavior?

I'm really shooting in the dark here, but my guess is that you have a define_expand for movdi that is not reload safe. You can do this operation correctly, you just have to reverse the instructions: load a5 from (a4 + 4) before you load a4 from (a4). See, e.g., mips_split_64bit_move in mips.c and note the use of reg_overlap_mentioned_p.

Ian
Re: Abt RTL expression
"Rohit Arul Raj" <[EMAIL PROTECTED]> writes:

> (insn 106 36 107 6 (set (reg:SI 13 a5)
>         (const_int -20 [0xffec])) 17 {movsi_short_const} (nil)
>     (nil))
>
> (insn 107 106 108 6 (parallel [
>             (set (reg:SI 13 a5)
>                 (plus:SI (reg:SI 13 a5)
>                     (reg/f:SI 14 a6)))
>             (clobber (reg:CC 21 cc))
>         ]) 29 {addsi3} (nil)
>     (expr_list:REG_EQUIV (plus:SI (reg/f:SI 14 a6)
>             (const_int -20 [0xffec]))
>         (nil)))
>
> (insn 108 107 38 6 (set (reg:SI 13 a5)
>         (mem/c:SI (reg:SI 13 a5) [0 S4 A32])) 15 {movsi_load} (nil)
>     (nil))
>
> My deductions:
> 1. In insn 106, we are storing -20 into register 13 (a5).

Yes.

> 2. In insn 107, we are taking the value from register 14 (a6), which is
> a pointer, subtracting 20 from it and storing the result in a5.

Yes.

> Now a6 contains the stack pointer. Therefore a5 now contains SP-20.
>
> 3. In insn 108, we are storing the value pointed to by register a5 into a5.

I would describe it as a load from memory, but, yes.

> Is my deduction for insn 108 right?
> If it is right, shouldn't the expression be like this:
>
> (mem/c:SI (reg/f:SI 13 a5) [0 S4 A32])) 15 {movsi_load} (nil)

Yes, probably it should. You neglected to say which dump you are looking at. REG_POINTER, which is the flag that generates the /f, is not reliable after reload.

Does it matter? In a memory load, the register has to hold a pointer value anyhow, so I don't see how it could matter for code generation. REG_POINTER exists because on the PA, addresses which use two registers need to know which one is the pointer and which is the offset, for some hideous reason which I hope I never learn. In a memory address with only one register, REG_POINTER doesn't seem like an interesting flag.

Ian
Re: Abt long long support
On Thu, Nov 09, 2006 at 11:52:02AM -0800, Mike Stump wrote: > The way the instructions are numbered suggests that the code went > wrong before this point. You have to read and understand all the > dumps, whether they are right or wrong and why, track down the code > in the compiler that is creating the wrong code and then see if you > can guess why. I usually diff the dump files. If your screen is wide enough, a side by side diff can help you a lot for small functions. -- Rask Ingemann Lambertsen
Re: Question on tree-nested.c:convert_nl_goto_reference
[EMAIL PROTECTED] (Richard Kenner) writes:

> I have a test case (involving lots of new code that's not checked in yet)
> that's blowing up with a nonlocal goto and I'm wondering how it ever worked
> because it certainly appears to me that DECL_CONTEXT has to be copied
> from label to new_label. But it isn't. So how are nonlocal gotos
> working?

I think they mostly work because the DECL_CONTEXT of a label isn't very important. As far as I know we only use it to make sure the label is emitted.

But I do get a failure in verify_flow_info with the appended test case. verify_flow_info is only used when checking is enabled, so maybe that is why people aren't seeing it? Maybe we just need to add this test case to the testsuite?

Ian

int
main ()
{
  int
  f1 ()
  {
    __label__ lab;
    int
    f2 ()
    {
      goto lab;
    }
    return f2 () + f2 ();
  lab:
    return 2;
  }
  if (f1 () != 2)
    abort ();
  exit (0);
}
Re: Planned LTO driver work
Mark Mitchell <[EMAIL PROTECTED]> writes:

> Though, if we *are* doing the template-repository dance, we'll have to
> do that for a while, declare victory, then invoke the LTO front end,
> and, finally, the actual linker, which will be a bit complicated. It
> might be that we should move the invocation of the real linker back into
> gcc.c, so that collect2's job just becomes generating the right pile of
> object files via template instantiation and static
> constructor/destructor generation?

For most targets we don't need to invoke collect2 at all anyhow, unless the user is using -frepo. It's somewhat wasteful that we always run it.

Moving the invocation of the linker into the gcc driver makes sense to me, especially if we can skip invoking collect2 entirely. Note that on some targets, ones which do not use GNU ld, collect2 does provide the feature of demangling the ld error output. That facility would have to be moved into the gcc driver as well.

Ian
Re: Planned LTO driver work
Ian Lance Taylor wrote:
> Mark Mitchell <[EMAIL PROTECTED]> writes:
>
>> Though, if we *are* doing the template-repository dance, we'll have to
>> do that for a while, declare victory, then invoke the LTO front end,
>> and, finally, the actual linker, which will be a bit complicated. It
>> might be that we should move the invocation of the real linker back into
>> gcc.c, so that collect2's job just becomes generating the right pile of
>> object files via template instantiation and static
>> constructor/destructor generation?
>
> For most targets we don't need to invoke collect2 at all anyhow,
> unless the user is using -frepo. It's somewhat wasteful that we
> always run it.
>
> Moving the invocation of the linker into the gcc driver makes sense to
> me, especially if we can skip invoking collect2 entirely. Note
> that on some targets, ones which do not use GNU ld, collect2 does
> provide the feature of demangling the ld error output. That facility
> would have to be moved into the gcc driver as well.

I agree that this sounds like the best long-term plan. I'll try to work out whether it's actually a short-term win for me to do anything to collect2 at all; if not, then I'll just put stuff straight into the driver, since that's what we really want anyhow.

Thanks for the feedback!

--
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713
expanding __attribute__((format,..))
Hi,

I've been thinking that it would be a good idea to extend the current __attribute__((format,..)) to use an arbitrary user callback. I searched the mailing list archives and I found some references to similar ideas. So do you think this is feasible?

It would allow specifying arbitrary char codes and also an arbitrary number of required arguments. It would be nice if it could also import the attributes from other defined callbacks. E.g.:

#define my_format_callback(x,params) (import printf), ("%v", zval**, size_t), ("%foo", void*)

int my_printf(char *format, ...) __attribute__((format,("my_format_callback")))

Thanks in advance,
Nuno
Re: expanding __attribute__((format,..))
"Nuno Lopes" <[EMAIL PROTECTED]> writes:

> I've been thinking that it would be a good idea to extend the current
> __attribute__((format,..)) to use an arbitrary user callback.
> I searched the mailing list archives and I found some references to
> similar ideas. So do you think this is feasible?

I think it would be nice. We usually founder on trying to provide a facility which can replace the builtin printf support, since printf is very complicated.

I kind of liked this idea:
http://gcc.gnu.org/ml/gcc-patches/2005-07/msg00797.html
but of course it was insane.

And then there was this idea, which I think was almost workable:
http://gcc.gnu.org/ml/gcc/2005-08/msg00469.html
But nobody really liked it.

So you need to find something which is on the one hand very simple and on the other hand able to support the complexity which people need in practice.

Ian
Re: expanding __attribute__((format,..))
On Fri, 10 Nov 2006, Ian Lance Taylor wrote:

> I kind of liked this idea:
> http://gcc.gnu.org/ml/gcc-patches/2005-07/msg00797.html
> but of course it was insane.

I still think a higher level state machine as described in the followups is how things should be done. The first step (or the first hundred steps) would need to be a series of small incremental patches moving all the existing logic about format string structure from the code into more generic datastructures.

In so doing you need to consider how xgettext could be made to extract a superset of the possible diagnostic sentences so that i18n for the format checking messages can work properly again (which requires that full sentences be passed to xgettext and be known by the translators, while maintainability of the format descriptions requires that the information about valid combinations be maintained at a different level, more like that used by the present datastructures). Once the datastructures are suitably general, interfaces to them can be considered.

--
Joseph S. Myers
[EMAIL PROTECTED]
PATCH: wwwdocs: Update Intel64 and IA32 SDM website
Intel has published the Core 2 Duo Optimization Reference Manual. I will check in this patch to update wwwdocs.

H.J.

2006-11-10  H.J. Lu  <[EMAIL PROTECTED]>

	* readings.html: Update Intel64 and IA32 SDM website.

Index: readings.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/readings.html,v
retrieving revision 1.160
diff -u -p -r1.160 readings.html
--- readings.html	23 Oct 2006 16:17:21 -	1.160
+++ readings.html	10 Nov 2006 17:32:21 -
@@ -133,8 +133,8 @@ names.
 i386 (i486, i586, i686, i786)
 Manufacturer: Intel
- http://developer.intel.com/design/pentium4/manuals/index_new.htm";>
- IA-32 Intel Architecture Software Developer's Manuals
+ http://developer.intel.com/products/processor/manuals/index.htm";>
+Intel®64 and IA-32 Architectures Software Developer's Manuals
 Some information about optimizing for x86 processors, links to x86
 manuals and documentation:
Re: strict aliasing question
Richard Guenther wrote:
> On 11/10/06, Howard Chu <[EMAIL PROTECTED]> wrote:
>> The program prints the expected result with both strict-aliasing and
>> no-strict-aliasing on my x86_64 box. As such, when/why would I need to
>> worry about this warning?
>
> If you compile with -O3 -combine *.c -o alias it will break.

Thanks for pointing that out. But that's not a realistic danger for the actual application. The accessor function is always going to be in a library compiled at a separate time. The call will always be from a program built at a separate time, so -combine isn't a factor.

--
Howard Chu
Chief Architect, Symas Corp.  http://www.symas.com
Director, Highland Sun        http://highlandsun.com/hyc
OpenLDAP Core Team            http://www.openldap.org/project/
Re: expanding __attribute__((format,..))
>> I've been thinking that it would be a good idea to extend the current
>> __attribute__((format,..)) to use an arbitrary user callback.
>> I searched the mailing list archives and I found some references to
>> similar ideas. So do you think this is feasible?
>
> I think it would be nice. We usually founder on trying to provide a
> facility which can replace the builtin printf support, since printf is
> very complicated.

Thanks for your quick answer. I'm glad that you are receptive to this problem.

> I kind of liked this idea:
> http://gcc.gnu.org/ml/gcc-patches/2005-07/msg00797.html
> but of course it was insane. And then there was this idea, which I
> think was almost workable:
> http://gcc.gnu.org/ml/gcc/2005-08/msg00469.html
> But nobody really liked it.

IMHO, those two seem way too difficult to use...

> So you need to find something which is on the one hand very simple and
> on the other hand able to support the complexity which people need in
> practice.

I sent you some kind of proposal in the last e-mail (very LISP-like, unfortunately), but it seems somewhat good.

I'm a PHP developer and this feature would be really great for us, as currently I maintain a PHP script (regex-based) to do this kind of verification (and a few more, like checking for use of uninitialized vars). But "parsing" C with regexes is kinda painful and error-prone.

Nuno
Re: Abt long long support
On Fri, Nov 10, 2006 at 07:17:29AM -0800, Ian Lance Taylor wrote:
> "Mohamed Shafi" <[EMAIL PROTECTED]> writes:
>
> > 1. What could be the reason for this behavior?
>
> I'm really shooting in the dark here, but my guess is that you have a
> define_expand for movdi that is not reload safe.

And in case the target doesn't have registers capable of holding DImode values, consider deleting the movdi pattern.

--
Rask Ingemann Lambertsen
Core 2 Duo Optimization Reference Manual is available
On Fri, Nov 10, 2006 at 09:36:59AM -0800, H. J. Lu wrote:
> Intel has published Core 2 Duo Optimization Reference Manual. I will
> check in this patch to update wwwdocs.

I checked it in. You can find the Core 2 Duo Optimization Reference Manual at

http://developer.intel.com/products/processor/manuals/index.htm

H.J.
Re: Canonical type nodes, or, comptypes considered harmful
On 08 Nov 2006 03:45:26 +0100, Gabriel Dos Reis <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] (Richard Kenner) writes:
>
> | > Like when int and long have the same range on a platform?
> | > The answer is they are different, even when they imply the same object
> | > representation.
> | >
> | > The notion of unified type nodes is closer to syntax than semantics.
> |
> | I'm more than a little confused, then, as to what we are talking about
> | canonicalizing. We already have only one pointer to each type, for
> | example.
>
> yes, but it is not done systematically. Furthermore, we don't unify
> function types -- because for some reasons, it was decided they would
> hold default arguments.

... and exception specifications, and some attributes that are really meant to go on the declaration. So, until we bring our types into line with C++'s type system, we're going to have to retain some level of structural checking.

Based on Dale's suggestion, I'm inclined to add a new flag "requires_structural_comparison" to every type. This will only be set true for cases where either GCC's internal representation or the language forces us into structural equality testing. For C++, I think we're only forced into structural equality testing where GCC's internal representation doesn't match C++'s view of type equality. For C, it looks like array types like int[10] require structural equality testing (but my understanding of the C type system is rather weak).

Cheers,
Doug
Re: Canonical type nodes, or, comptypes considered harmful
Ian Lance Taylor <[EMAIL PROTECTED]> writes:

[...]

| I meant something very simple: for every type, there is a
| TYPE_CANONICAL field. This is how you tell whether two types are
| equivalent:
|
|     TYPE_CANONICAL (a) == TYPE_CANONICAL (b)
|
| That is what I mean when I say one memory dereference and one pointer
| comparison.

That certainly matches my understanding and implementation in the Pivot.

--
Gaby
Re: Canonical type nodes, or, comptypes considered harmful
"Doug Gregor" <[EMAIL PROTECTED]> writes: [...] | With concepts, there are cases where we end up creating two different | type nodes that we later find out should have been the same type node. | Here's an example: | | template<typename T, typename U> | where LessThanComparable<T> && SameType<T, U> | const T& weird_min(const T& t, const U& u) { | return t < u? t : u; | } | | When we parse the template header, we create two different type nodes | for T and U. Then we start parsing the where clause, and create a type | node for LessThanComparable<T>. Now we hit the SameType<T, U> requirement, | which says that T and U are the same type. It's a little late to make | T and U actually have the same type node (unless we want to re-parse | the template or re-write the AST). I don't think that implies rewriting the AST or reparsing. The same-type constraint reads to me as "it is assumed T and U have the same canonical type", e.g. the predicate SameType<T, U> translates to the constraint TYPE_CANONICAL(T) == TYPE_CANONICAL(U) and this equation can be added to the constraint set without reparsing (it is a semantic constraint). -- Gaby
Re: Canonical type nodes, or, comptypes considered harmful
On 10 Nov 2006 19:15:45 +0100, Gabriel Dos Reis <[EMAIL PROTECTED]> wrote: "Doug Gregor" <[EMAIL PROTECTED]> writes: | With concepts, there are cases where we end up creating two different | type nodes that we later find out should have been the same type node. | Here's an example: | | template<typename T, typename U> | where LessThanComparable<T> && SameType<T, U> | const T& weird_min(const T& t, const U& u) { | return t < u? t : u; | } | | When we parse the template header, we create two different type nodes | for T and U. Then we start parsing the where clause, and create a type | node for LessThanComparable<T>. Now we hit the SameType<T, U> requirement, | which says that T and U are the same type. It's a little late to make | T and U actually have the same type node (unless we want to re-parse | the template or re-write the AST). I don't think that implies rewriting the AST or reparsing. The same-type constraint reads to me as "it is assumed T and U have the same canonical type", e.g. the predicate SameType<T, U> translates to the constraint TYPE_CANONICAL(T) == TYPE_CANONICAL(U) and this equation can be added to the constraint set without reparsing (it is a semantic constraint). Yes, but there are types built from 'T' and 'U' that also need to be "merged" in this way. For instance, say we have built the types T* and U* before seeing that same-type constraint. Now, we also need TYPE_CANONICAL (T*) == TYPE_CANONICAL (U*). And TYPE_CANONICAL(LessThanComparable<T>) == TYPE_CANONICAL(LessThanComparable<U>). If you know about all of these other types that have been built from T and U, you can use Nelson and Oppen's algorithm to update the TYPE_CANONICAL information relatively quickly. If you don't have that information... you're stuck with structural checks or rewriting the AST to eliminate the duplicated nodes. Cheers, Doug
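The union-find machinery Doug alludes to is standard; a tiny sketch (over hypothetical integer type ids rather than real type nodes) shows how a SameType constraint merges equivalence classes, and why derived types such as T* and U* need their own explicit merge unless you track which types were built from T and U:

```c
#define MAX_TYPES 64

/* Hypothetical union-find over type ids: canon[i] points toward the
   representative of i's equivalence class.  */
static int canon[MAX_TYPES];

static void uf_init(int n)
{
    for (int i = 0; i < n; i++)
        canon[i] = i;                /* every type starts in its own class */
}

static int uf_find(int t)
{
    while (canon[t] != t) {
        canon[t] = canon[canon[t]];  /* path halving keeps chains short */
        t = canon[t];
    }
    return t;
}

/* A same-type constraint merges the classes of two types.  */
static void uf_union(int t, int u)
{
    canon[uf_find(t)] = uf_find(u);
}
```

Merging T and U says nothing about T* and U*; those have to be unified separately, which is exactly the bookkeeping problem Doug raises.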
RE: Abt long long support
On 10 November 2006 17:55, Rask Ingemann Lambertsen wrote: > On Fri, Nov 10, 2006 at 07:17:29AM -0800, Ian Lance Taylor wrote: >> "Mohamed Shafi" <[EMAIL PROTECTED]> writes: >> >>> 1. What could be the reason for this behavior? >> >> I'm really shooting in the dark here, but my guess is that you have a >> define_expand for movdi that is not reload safe. > >And in case the target doesn't have registers capable of holding DImode > values, consider deleting the movdi pattern. No, surely you don't want to do that! You really need a movdi pattern - even more so if there are no natural DImode-sized registers, as gcse can get terribly confused by bad reg_equal notes if you don't. See e.g.: http://gcc.gnu.org/ml/gcc/2003-04/msg01397.html http://gcc.gnu.org/ml/gcc/2004-06/msg00993.html cheers, DaveK -- Can't think of a witty .sigline today
Re: Canonical type nodes, or, comptypes considered harmful
"Doug Gregor" <[EMAIL PROTECTED]> writes: | On 10 Nov 2006 19:15:45 +0100, Gabriel Dos Reis | <[EMAIL PROTECTED]> wrote: | > "Doug Gregor" <[EMAIL PROTECTED]> writes: | > | With concepts, there are cases where we end up creating two different | > | type nodes that we later find out should have been the same type node. | > | Here's an example: | > | | > | template | > | where LessThanComparable && SameType | > | const T& weird_min(const T& t, const U& u) { | > | return t < u? t : u; | > | } | > | | > | When we parse the template header, we create two different type nodes | > | for T and U. Then we start parsing the where clause, and create a type | > | node for LessThanComparable. Now we hit the SameType requirement, | > | which says that T and U are the same type. It's a little late to make | > | T and U actually have the same type node (unless we want to re-parse | > | the template or re-write the AST). | > | > I don't think that implies rewriting the AST or reparsing. The | > same-type constraints reads to me that "it is assume T and U have the | > same canonical type", e.g. the predicate SameType translates to | > the constraints | > | > TYPE_CANONICAL(T) == TYPE_CANONICAL(U) | > | > this equation can be added the constraint set without reparsing (it is | > a semantic constraint). | | Yes, but there are types built from 'T' and 'U' that also need to be | "merged" in this way. I don't see why you need to merge the types, as opposed to setting their canonical types. | For instance, say we have built the types T* and | U* before seeing that same-type constraint. Now, we also need | TYPE_CANONICAL (T*) == TYPE_CANONICAL (U*). | And TYPE_CANONICAL(LessThanComparable) == | TYPE_CANONICAL(LessThanComparable). | If you know about all of these other types that have been built from T | and U, you can use Nelson and Oppen's algorithm to update the | TYPE_CANONICAL information relatively quickly. If you don't have that | information... 
In a template definition, one has that information. -- Gaby
subreg transformation causes incorrect post_inc
Hi, My port, based on (GCC) 4.2.0 20061002 (experimental), is producing incorrect code for the following test case: int f(short *p){ int sum, i; sum = 0; for(i = 0; i < 256; i++){ sum += *p++ & 0xFF; } return sum; } The RTL snippet of interest, before combine, is, (insn 23 22 24 3 (set (reg:SI 96) (sign_extend:SI (mem:HI (post_inc:SI (reg/v/f:SI 89 [ p.38 ])) [2 S2 A16]))) 178 {extendhisi2} (nil) (expr_list:REG_INC (reg/v/f:SI 89 [ p.38 ]) (nil))) (insn 24 23 25 3 (set (reg:SI 98) (const_int 255 [0xff])) 12 {movsi_real} (nil) (nil)) (insn 25 24 26 3 (set (reg:SI 97) (and:SI (reg:SI 96) (reg:SI 98))) 81 {andsi3} (insn_list:REG_DEP_TRUE 23 (insn_list:REG_DEP_TRUE 24 (nil))) (expr_list:REG_DEAD (reg:SI 96) (expr_list:REG_DEAD (reg:SI 98) (expr_list:REG_EQUAL (and:SI (reg:SI 96) (const_int 255 [0xff])) (nil) Combine combines that into the following and it remains that way until greg: (insn 25 24 26 3 (set (reg:SI 97) (zero_extend:SI (subreg:QI (mem:HI (post_inc:SI (reg/v/f:SI 89 [ p.38 ])) [2 S2 A16]) 0))) 181 {zero_extendqisi (expr_list:REG_INC (reg/v/f:SI 89 [ p.38 ]) (nil))) After greg, the insn becomes, (insn:HI 25 29 59 3 (set (reg:SI 0 r0 [97]) (zero_extend:SI (mem:QI (post_inc:SI (reg/v/f:SI 4 r4 [orig:89 p.38 ] [89])) [2 S1 A16]))) 181 {zero_extendqisi (expr_list:REG_INC (reg/v/f:SI 4 r4 [orig:89 p.38 ] [89]) (nil))) The problem, as I see it, is that "(subreg:QI (mem:HI (post_inc" becomes "(mem:QI (post_inc". post_inc has changed meaning. Experience has taught me that almost all such problems are the fault of the backend, but I am stumped as to what part of the backend could cause this problem. Any ideas? Does anyone know of any recent 4.2 bug fixes that could fix this? If this is a bug in the middle end, then where do you think the bug is? Is it combine's, greg's, or some other pass's responsibility to ensure correct code in this case? Ideally, the post_inc would be replaced with a +2 post_modify, which this target has. 
For targets without post_modify, either the post_inc could be dropped and a separate add emitted or the mem could remain "(mem:HI (post_inc" followed by a zero_extend. Thanks in advance for your help, Charles J. Tabony
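To visualize the fix suggested above: with the +2 post_modify that this target has, the combined insn could legitimately narrow the access while keeping the pointer adjustment explicit. A rough, unverified sketch of what the post-reload insn would then look like:

```
(insn 25 ... (set (reg:SI 97)
        (zero_extend:SI
            (mem:QI (post_modify:SI (reg:SI 89)
                        (plus:SI (reg:SI 89)
                                 (const_int 2))) [2 S1 A16]))))
```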
Re: Canonical type nodes, or, comptypes considered harmful
"Doug Gregor" <[EMAIL PROTECTED]> writes: | On 08 Nov 2006 03:45:26 +0100, Gabriel Dos Reis | <[EMAIL PROTECTED]> wrote: | > [EMAIL PROTECTED] (Richard Kenner) writes: | > | > | > Like when int and long have the same range on a platform? | > | > The answer is they are different, even when they imply the same object | > | > representation. | > | > | > | > The notion of unified type nodes is closer to syntax than semantics. | > | | > | I'm more than a little confused, then, as to what we are talking about | > | canonicalizing. We already have only one pointer to each type, for example. | > | > yes, but it is not done systematically. Furthermore, we don't unify | > function types -- because for some reasons, it was decided they would | > hold default arguments. | | ... and exception specifications, and some attributes that are really | meant to go on the declaration. | | So, until we bring our types into line with C++'s type system, we're | going to have to retain some level of structural checking. Based on | Dale's suggestion, I'm inclined to add a new flag | "requires_structural_comparison" to every type. I hope that is a short-term work-around. | This will only be set | true for cases where either GCC's internal representation or the | language forces us into structural equality testing. For function types, all C++ front-end maintainers agreed (some time ago) that the front-end should move to a state where default arguments and the like are moved out of the type nodes, thereby enabling more sharing. I think Kazu did some good preliminary work there. | For C++, I think | we're only forced into structural equality testing where GCC's | internal representation doesn't match C++'s view of type equality. For | C, it looks like array types like int[10] require structural equality | testing (but my understanding of the C type system is rather weak). I'm not worried about the C type system :-/ -- Gaby
Re: subreg transformation causes incorrect post_inc
[EMAIL PROTECTED] wrote: My port, based on (GCC) 4.2.0 20061002 (experimental), is producing incorrect code for the following test case: [snip] I've only had a very quick look at your code, but I have a feeling that this is an instance of the kind of slip-up with GO_IF_MODE_DEPENDENT_ADDRESS that my patch at http://gcc.gnu.org/ml/gcc-patches/2006-08/msg00858.html is aimed at preventing. (This patch is currently only applied to addrmodes branch.) Mark
Re: Canonical type nodes, or, comptypes considered harmful
On 10 Nov 2006 20:12:27 +0100, Gabriel Dos Reis <[EMAIL PROTECTED]> wrote: "Doug Gregor" <[EMAIL PROTECTED]> writes: I don't see why you need to merge the types, as opposed to setting their canonical types. I have union-find on the mind, so I'm using the terms interchangeably. Setting their canonical types to the same value merges the equivalence classes of types. | For instance, say we have built the types T* and | U* before seeing that same-type constraint. Now, we also need | TYPE_CANONICAL (T*) == TYPE_CANONICAL (U*). | And TYPE_CANONICAL(LessThanComparable<T>) == | TYPE_CANONICAL(LessThanComparable<U>). | If you know about all of these other types that have been built from T | and U, you can use Nelson and Oppen's algorithm to update the | TYPE_CANONICAL information relatively quickly. If you don't have that | information... In a template definition, one has that information. ? Our same-type constraint says SameType<T, U>. We can easily set TYPE_CANONICAL for T and U. We also need to set the TYPE_CANONICAL fields of LessThanComparable<T> and LessThanComparable<U> to the same value. How do we get from 'T' to 'LessThanComparable<T>'? Cheers, Doug
Re: Canonical type nodes, or, comptypes considered harmful
"Doug Gregor" <[EMAIL PROTECTED]> writes: [...] | > | For instance, say we have built the types T* and | > | U* before seeing that same-type constraint. Now, we also need | > | TYPE_CANONICAL (T*) == TYPE_CANONICAL (U*). | > | And TYPE_CANONICAL(LessThanComparable) == | > | TYPE_CANONICAL(LessThanComparable). | > | If you know about all of these other types that have been built from T | > | and U, you can use Nelson and Oppen's algorithm to update the | > | TYPE_CANONICAL information relatively quickly. If you don't have that | > | information... | > | > In a template definition, one has that information. | | ? | | Our same-type constraint says SameType. | | We can easily set TYPE_CANONICAL for T and U. | | We also need to set the TYPE_CANONICAL fields of LessThanComparable | and LessThanComparable to the same value. | | How do we get from 'T' to 'LessThanComparable'? Delay semantics processing (that is canonical types, etc.) until you have build all of the constraint set -- that is basically what we do for C++98 and C++03 templates. At the end of the definition declaration, you have enough needed to unify the canonical type nodes for T*, U*, LessThanComparable, etc. In a sense that is very similar to what we do for definitions that appear lexically in class definitions. -- Gaby
Configuration question
I have run into a libstdc++ configuration issue and was wondering if it is a known issue or not. My build failed because the compiler I am using to build GCC and libstdc++ does not have wchar support and does not define mbstate_t. The compiler (and library) that I am creating, however, do support wchar and do define mbstate_t. Both compilers are GCC; the old one does not include a -D that the new one does. mbstate_t (defined in the system header files) is only seen when this define is set. The problem is that the libstdc++ configure script is using the original GCC to check for the existence of mbstate_t (it doesn't find it) and using that information to say that it needs to define mbstate_t when compiling libstdc++, but libstdc++ is compiled with the newly built GCC, which does have an mbstate_t from the system header files. Shouldn't the libstdc++ configure script use the new GCC when checking things with AC_TRY_COMPILE? Or is this just not possible? Is this why some tests don't use AC_TRY_COMPILE but say "Fake what AC_TRY_COMPILE does"? See acinclude.m4 for these comments; there is no explanation of why it is faking what AC_TRY_COMPILE does. Steve Ellcey [EMAIL PROTECTED]
Re: subreg transformation causes incorrect post_inc
From: Mark Shinwell <[EMAIL PROTECTED]> > [EMAIL PROTECTED] wrote: > > My port, based on (GCC) 4.2.0 20061002 (experimental), is producing > > incorrect code for the following test case: > [snip] > > I've only had a very quick look at your code, but I have a feeling > that this is an instance of the kind of slip-up with > GO_IF_MODE_DEPENDENT_ADDRESS that my patch at > http://gcc.gnu.org/ml/gcc-patches/2006-08/msg00858.html is > aimed at preventing. (This patch is currently only applied to > the addrmodes branch.) > > Mark I will give it a try. Thanks! Charles J. Tabony
Re: Canonical type nodes, or, comptypes considered harmful
On Nov 9, 2006, at 11:09 PM, Ian Lance Taylor wrote: I meant something very simple: for every type, there is a TYPE_CANONICAL field. This is how you tell whether two types are equivalent: TYPE_CANONICAL (a) == TYPE_CANONICAL (b) Ah, yes, that would work. Hum, so simple, why was I thinking something was not going to work about it. There are advantages to real-time conversations... anyway, can't think of any down sides right now except for the obvious, this is gonna eat 1 extra pointer per type. In my scheme, one would have to collect stats on the sizes to figure out if there are enough types that don't have typedefs to pay for the data structure for those that do. I think mine would need less storage, but, your scheme is so much easier to implement and transition to, that, I think it is preferable over an along side datatype. Thanks for bearing with me.
Re: How to create both -option-name-* and -option-name=* options?
Dave Korn wrote: > It may seem a bit radical, but is there any reason not to modify the > option-parsing machinery so that either '-' or '=' are treated interchangeably > for /all/ options with joined arguments? That is, whichever is specified in > the .opt file, the parser accepts either? I like that idea. -- Mark Mitchell CodeSourcery [EMAIL PROTECTED] (650) 331-3385 x713
Re: expanding __attribute__((format,..))
On Fri, 10 Nov 2006, Ian Lance Taylor wrote: I kind of liked this idea: http://gcc.gnu.org/ml/gcc-patches/2005-07/msg00797.html but of course it was insane. I still think a higher level state machine as described in the followups is how things should be done. Wouldn't that be killing a mosquito with a bomb? :) (unless of course we can find a simple description language) The first step (or the first hundred steps) would need to be a series of small incremental patches moving all the existing logic about format string structure from the code into more generic datastructures. In so doing you need to consider how xgettext could be made to extract a superset of the possible diagnostic sentences so that i18n for the format checking messages can work properly again (which requires that full sentences be passed to xgettext and be known by the translators, while maintainability of the format descriptions requires that the information about valid combinations be maintained at a different level, more like that used by the present datastructures). Once the datastructures are suitably general, then interfaces to them can be considered. Can I do anything to help? I mean, can you point me to the files and tell me what I should do in order to move this forward? (The most I've done so far is a few little patches to make a customized cross-compiler for a mips robot, so I'm not very familiar with the code...) Regards, Nuno
Re: Canonical type nodes, or, comptypes considered harmful
Ian Lance Taylor wrote: > This assumes, of course, that we can build an equivalence set for > types. I think that we need to make that work in the middle-end, and > force the front-ends to conform. As someone else mentioned, there are > horrific cases in C like a[] being compatible with both a[5] and a[10] > but a[5] and a[10] not being compatible with each other, and similarly > f() is compatible with f(int) and f(float) but the latter two are not > compatible with each other. I don't think these cases are serious problems; they're compatible types, not equivalent types. You don't need to check compatibility as often as equivalence. Certainly, in the big C++ test cases, Mike is right that templates are the killer, and they're you're generally testing equivalence. So, if you separate same_type_p from compatible_type_p, and make same_type_p fast, then that's still a big win. -- Mark Mitchell CodeSourcery [EMAIL PROTECTED] (650) 331-3385 x713
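For readers less familiar with the C rules Ian mentioned, the array case can be shown in a few lines of plain ISO C (the function case, f() being compatible with both f(int) and f(float), behaves analogously):

```c
/* C compatibility is not transitive: an incomplete array type is
   compatible with any completed length, but int[5] and int[10] are
   not compatible with each other.  Picking one completion is fine:  */
extern int a[];    /* compatible with both int[5] and int[10]... */
extern int a[10];  /* ...and this redeclaration completes it to int[10] */
```

Equivalence (same_type_p) never has to accept int[] == int[10]; only the rarer compatibility check (compatible_type_p) does, which is Mark's point.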
Re: Canonical type nodes, or, comptypes considered harmful
Mike Stump <[EMAIL PROTECTED]> writes: | On Nov 9, 2006, at 11:09 PM, Ian Lance Taylor wrote: | > I meant something very simple: for every type, there is a | > TYPE_CANONICAL field. This is how you tell whether two types are | > equivalent: | > TYPE_CANONICAL (a) == TYPE_CANONICAL (b) | | Ah, yes, that would work. Hum, so simple, why was I thinking | something was not going to work about it. There are advantages to | real-time conversations... anyway, can't think of any down sides | right now except for the obvious, this is gonna eat 1 extra pointer | per type. That is what we use in our representation, so if you find something seriously wrong with it I'm highly interested. As for the extra pointer: we use C++ to represent all of this, and it uses conventional object orientation combined with non-conventional C++ templates. Consequently we do not always store the pointers for the canonical types. For example, built-in types are their own canonical types, so we just return "*this". For typedefs (and general aliases, e.g. namespace aliases), we store pointers to the canonical type of the aliasee. For classes and enums, we return the pointer to the "class expression" (when present). etc. -- Gaby
Re: subreg transformation causes incorrect post_inc
From: Mark Shinwell <[EMAIL PROTECTED]> > [EMAIL PROTECTED] wrote: > > My port, based on (GCC) 4.2.0 20061002 (experimental), is producing > > incorrect code for the following test case: > [snip] > > I've only had a very quick look at your code, but I have a feeling > that this is an instance of the kind of slip-up with > GO_IF_MODE_DEPENDENT_ADDRESS that my patch at > http://gcc.gnu.org/ml/gcc-patches/2006-08/msg00858.html is > aimed at preventing. (This patch is currently only applied to > the addrmodes branch.) > > Mark Hhmm. Is the intent of your patch simply to prevent the mistake of backends not defining GO_IF_MODE_DEPENDENT_ADDRESS properly? My backend checks for POST_INC and POST_DEC in GO_IF_MODE_DEPENDENT_ADDRESS. Charles J. Tabony
Threading the compiler
We're going to have to think seriously about threading the compiler. Intel predicts 80 cores in the near future (5 years). http://hardware.slashdot.org/article.pl?sid=06/09/26/1937237&from=rss To use this many cores for a single compile, we have to find ways to split the work. The best way, of course, is to have make -j80 do that for us; this usually results in excellent efficiencies and an ability to use as many cores as there are jobs to run. However, for the edit, compile, debug cycle of development, utilizing many cores is harder. To get compile speed in this type of case, we will need to start thinking about segregating data and work out into hunks; today, I already have a need for 4-8 hunks. That puts me 4x to 8x slower than I'd like to be. 8x slower, well, just hurts. The competition is already starting to make progress in this area. I think it is time to start thinking about it for gcc. We don't want to spend time in locks or spinning, and we don't want to litter our code with such things, so if we form areas that are fairly well isolated and independent and then have a manager manage the compilation process, we can have just it know about and deal with such issues. The rules would be something like: while working in a hunk, you'd only have access to data from your own hunk and to global shared read-only data. The hope is that we can farm compilation of different functions out into different cores. All global state updates would be fed back to the manager, and then the manager could farm out the results into hunks, and so on until done. I think we can also split lexing out into a hunk. We can have the lexer give hunks of tokens to the manager to feed on to the parser. We can have the parser feed hunks of work to do on to the manager, and so on. How many hunks do we need? Well, today I want 8 for 4.2 and 16 for mainline; each release, just 2x more. I'm assuming nice, equal-sized hunks. For larger variations in hunk size, I'd need even more hunks.
Or, so that is just an off the cuff proposal to get the discussion started. Thoughts?
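As a rough illustration of the proposed model (purely a sketch with made-up names; real compilation state is vastly more entangled), a manager can hand independent hunks to a small worker pool: each worker writes only into its own result slot, and the only synchronization is the manager's work queue, so the code doing the actual work never deals with locks:

```c
#include <pthread.h>

#define NHUNKS 16
#define NWORKERS 4

static int next_hunk;                /* manager-owned queue cursor */
static int results[NHUNKS];          /* one isolated result slot per hunk */
static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for compiling one function's worth of work.  */
static int compile_one(int fn) { return fn * fn; }

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&qlock);
        int hunk = next_hunk < NHUNKS ? next_hunk++ : -1;
        pthread_mutex_unlock(&qlock);
        if (hunk < 0)
            return NULL;             /* no hunks left */
        results[hunk] = compile_one(hunk);  /* touches only its own slot */
    }
}

static void compile_all(void)
{
    pthread_t tids[NWORKERS];
    next_hunk = 0;
    for (int i = 0; i < NWORKERS; i++)
        pthread_create(&tids[i], NULL, worker, NULL);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(tids[i], NULL);
}
```

The single lock lives entirely inside the "manager" layer; workers follow the simple rule of reading shared data and writing only their own hunk, which is the discipline the proposal asks for.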
Re: Threading the compiler
On Fri, Nov 10, 2006 at 12:38:07PM -0800, Mike Stump wrote: > How many hunks do we need, well, today I want 8 for 4.2 and 16 for > mainline, each release, just 2x more. I'm assuming nice, equal sized > hunks. For larger variations in hunk size, I'd need even more hunks. > > Or, so that is just an off the cuff proposal to get the discussion > started. > > Thoughts? Will use C++ help or hurt compiler parallelism? Does it really matter? H.J.
RE: How to create both -option-name-* and -option-name=* options?
On 10 November 2006 20:06, Mark Mitchell wrote: > Dave Korn wrote: > >> It may seem a bit radical, but is there any reason not to modify the >> option-parsing machinery so that either '-' or '=' are treated >> interchangeably for /all/ options with joined arguments? That is, >> whichever is specified in the .opt file, the parser accepts either? > > I like that idea. Would it be a suitable solution to just provide a specialised wrapper around the two strncmp invocations in find_opt? It seems ok to me; we only want this change to affect comparisons, we call whichever form is listed in the .opts file the canonical form and just don't worry if the (canonical) way a flag is reported in an error message doesn't quite match when the non-canonical form was used on the command line? (I'm not even going to mention the 'limitation' that we are now no longer free to create -fFLAG=VALUE and -fFLAG-VALUE options with different meanings!) cheers, DaveK -- Can't think of a witty .sigline today
Re: How to create both -option-name-* and -option-name=* options?
Dave Korn wrote: > On 10 November 2006 20:06, Mark Mitchell wrote: > >> Dave Korn wrote: >> >>> It may seem a bit radical, but is there any reason not to modify the >>> option-parsing machinery so that either '-' or '=' are treated >>> interchangeably for /all/ options with joined arguments? That is, >>> whichever is specified in the .opt file, the parser accepts either? >> I like that idea. > > > Would it be a suitable solution to just provide a specialised wrapper around > the two strncmp invocations in find_opt? FWIW, that seems reasonable to me, but I've not looked hard at the code to be sure that's technically 100% correct. It certainly seems like the right idea. -- Mark Mitchell CodeSourcery [EMAIL PROTECTED] (650) 331-3385 x713
9 Nov 06 notes from GCC improvement for Itanium conference call
ON THE CALL: Kenny Zadack (Natural Bridge), Diego Novillo (Red Hat), Vladimir Makarov (Red Hat), Mark Smith (Gelato), Bob Kidd (UIUC), Andrey Belevantsev (RAS), Arutyun Avetisyan (RAS), Mark Davis (Intel), Sebastian Pop (Ecole des Mines de Paris) Agenda: 1) Gelato ICE April GCC track proposed content (Mark S.) 2) GCC4.2 & GCC 4.3 update, alias analysis update (Diego) 3) Scheduler work update, potential new software pipelining project (Andrey) 4) LTO update (Kenny) 5) Superblock work update (Bob) ## 1) Gelato ICE April GCC track proposed content (Mark S.) The content for the GCC track at the upcoming Gelato ICE San Jose (April 15-18) conference was proposed (still need to confirm with several speakers): - ISP-Russian Academy of Sciences: update on scheduler work, discuss progress on new software pipelining work - Martin Michlmayr: compiling Debian using GCC 4.2 and Osprey - Shin-ming Liu: HP GCC and Osprey update - Kenny Zadack - update on LTO - Bob Kidd - update on superblock work - Zdenek Dvorak - update on prefetching work - Diego Novillo - update on alias analysis work - Matthieu Delahaye - update on Gelato GCC build farm - Dan Berlin - GPL2 and GPL3 presentation ## 2) GCC4.2 & GCC 4.3 update, alias analysis update (Diego) GCC 4.2 & 4.3 update: - 4.2 has branched. Likely release in early 2007. Many major pieces of work being scheduled for GCC 4.3 (SSA across the callgraph, overhaul dataflow in the backend, overhaul SSA form for memory, reduce memory footprint in the IL, autoparallelization, new vectorization, new interprocedural optimizations, etc). The full list is at http://gcc.gnu.org/wiki/GCC_4.3_Release_Planning Alias analysis update: -- no changes to analysis, representation of aliasing is being modified for 4.3 ## 3) Scheduler work update, potential new software pipelining project (Andrey) We have merged all major features of selective scheduling and are tuning it for Itanium. We use a set of small benchmarks to analyze the performance of the scheduler. 
At the moment we are neutral on half of the benchmarks; we get a 3% speedup on linpack and a 5% speedup on mgrid. We have fixed all of the >1% regressions except dhrystone, which regresses by 4% due to alignment issues. Most of the bugs we've fixed arose because the bundling and instruction-choosing mechanisms are tightly coupled with the Haifa scheduler, and we need to support both schedulers at the same time. We plan to proceed with tuning and implement the driver for software pipelining over the next month. We also plan to fix swing modulo scheduling to make it work on ia64 and improve it by propagating data dependency information to RTL. We plan to discuss this project on the GCC mailing list in a few weeks. Comments by Vladimir: - About software pipelining for Itanium: It is completely broken; more accurately, it never worked for Itanium. Because it is a very important (probably the most important) optimization for Itanium, after making SP work it should be switched on by default, at least for Itanium, to keep it working. GCC has a lot of optimizations which are not on by default, and they have a tendency to get broken at some point. - About insn bundling and insn scheduler hooks for Itanium: Usually the scheduler hooks are very few lines. This is not the case for Itanium. I believe there is potential for generating better quality code by improving the bundling and hooks. Unfortunately, the code is "spaghetti" code which is hard to understand.
This will still leave many problems open for others to help with, including more aggressive optimizations, as well as providing some mechanism for distributing/parallelizing the compilation. ## 5) Superblock work update (Bob) I'm merging mainline into the ia64-improvements branch. As soon as that is finished, I will run a regression on the Superblock pass and prepare a patch to submit to gcc-patches. This work has been on the back burner for the past couple of weeks due to an upcoming paper deadline. I'm writing a paper documenting the changes we made to IMPACT's intermediate representation to allow interprocedural analysis to be performed more easily. We were able to extend IMPACT's IR from one stored completely in memory to one that can be stored partially in memory and partially on disk. This allows us to reduce the memory requirements for the compiler when processing modern, large programs. This paper will lik
Re: How to create both -option-name-* and -option-name=* options?
Dave Korn wrote: On 10 November 2006 20:06, Mark Mitchell wrote: Dave Korn wrote: It may seem a bit radical, but is there any reason not to modify the option-parsing machinery so that either '-' or '=' are treated interchangeably for /all/ options with joined arguments? That is, whichever is specified in the .opt file, the parser accepts either? I like that idea. Would it be a suitable solution to just provide a specialised wrapper around the two strncmp invocations in find_opt? It seems ok to me; we only want this change to affect comparisons, we call whichever form is listed in the .opts file the canonical form and just don't worry if the (canonical) way a flag is reported in an error message doesn't quite match when the non-canonical form was used on the command line? I would think that would be suitable, certainly. Having the error message report the canonical form would, to me, just be a beneficial small reminder to people to use the canonical form. (I'm not even going to mention the 'limitation' that we are now no longer free to create -fFLAG=VALUE and -fFLAG-VALUE options with different meanings!) But that's already not possible -- that's essentially how I got into this problem in the first place. If one tries to define both of those, the declaration of the enumeration-type holding the option flags breaks, so you can't do that. (Well, you could hack that to make it work; define -fFLAG as the option name, so that the '-' or '=' is the first character of the argument. That will still work, but it's a pain if VALUE is otherwise a UInteger.) This does raise a point about how the options are compared, though -- to be useful, this needs to also handle cases where a Joined option is emulated by a "normal" option. 
For instance, Fortran's lang.opt contains something like:

-ffixed-line-length-none
Fortran

-ffixed-line-length-
Fortran Joined

We would also want "-ffixed-line-length=none" to be handled appropriately, which makes this a bit trickier than just handling the last character of Joined options. Are there any meaningful downsides to just having the option-matcher treat all '-' and '=' values in the option name as equivalent? It would mean that we'd also match "-ffixed=line=length-none", for instance, but I don't think that causes any real harm. An alternative would be to specify that an '=' in the name in the .opt file will match either '=' or '-' on the command line. This does require that the canonical form be the one with '=' in it, and means that things with '-' in them need to be changed in the .opt file to accept both, but the benefit is that it can accept pseudo-Joined options in either form without accepting all sorts of weird things with random '='s in them. - Brooks
Re: Threading the compiler
On Nov 10, 2006, at 12:46 PM, H. J. Lu wrote: Will use C++ help or hurt compiler parallelism? Does it really matter? I'm not an expert, but, in the simple world I want, I want it to not matter in the least. For the people writing most code in the compiler, I want clear simple rules for them to follow. For example, google uses mapreduce http://labs.google.com/papers/mapreduce.html as a primitive, and there are a few experts that manage that code, and everyone else just mindlessly uses it. The rules are explained to them, and they just follow the rules and it just works. No locking, no atomic, no volatile, no clever lock-free code, no algorithmic changes (other than decomposing into isolated composable parts). I'd like something similar for us.
Re: Threading the compiler
On Fri, 2006-11-10 at 12:46 -0800, H. J. Lu wrote: > On Fri, Nov 10, 2006 at 12:38:07PM -0800, Mike Stump wrote: > > How many hunks do we need, well, today I want 8 for 4.2 and 16 for > > mainline, each release, just 2x more. I'm assuming nice, equal sized > > hunks. For larger variations in hunk size, I'd need even more hunks. > > > > Or, so that is just an off the cuff proposal to get the discussion > > started. > > > > Thoughts? > > Will use C++ help or hurt compiler parallelism? Does it really matter? My 2c. I don't think it can possibly hurt as long as people follow normal C++ coding rules. The main issue is not really language choice though. The main issues would likely be defining data to be isolated enough to be useful to do work in parallel. Lots of threads communicating a lot would be bad. Sohail
Re: Threading the compiler
On Fri, 2006-11-10 at 13:31 -0800, Mike Stump wrote: > On Nov 10, 2006, at 12:46 PM, H. J. Lu wrote: > > Will use C++ help or hurt compiler parallelism? Does it really matter? > > I'm not an expert, but, in the simple world I want, I want it to not > matter in the least. For the people writing most code in the > compiler, I want clear simple rules for them to follow. > > For example, google uses mapreduce http://labs.google.com/papers/ > mapreduce.html as a primitive, and there are a few experts that > manage that code, and everyone else just mindlessly uses it. The > rules are explained to them, and they just follow the rules and it > just works. No locking, no atomic, no volatile, no cleaver lock free > code, no algorithmic changes (other than decomposing into isolated > composable parts) . I'd like something similar for us. What parts could be done in parallel besides things that can be done by make -j80? I would guess that certain tree transformations could be run in parallel. Do threads help for io-bound apps? Thanks, Sohail
Re: Threading the compiler
On 2006-11-10, at 21:46, H. J. Lu wrote: On Fri, Nov 10, 2006 at 12:38:07PM -0800, Mike Stump wrote: How many hunks do we need, well, today I want 8 for 4.2 and 16 for mainline, each release, just 2x more. I'm assuming nice, equal sized hunks. For larger variations in hunk size, I'd need even more hunks. Or, so that is just an off the cuff proposal to get the discussion started. Thoughts? Will use C++ help or hurt compiler parallelism? Does it really matter? It should be helpful, because it seriously helps in keeping the semantic scope of data items at bay. Marcin Dalecki
Re: strict aliasing question
On Nov 10, 2006, at 9:48 AM, Howard Chu wrote: Richard Guenther wrote: If you compile with -O3 -combine *.c -o alias it will break. Thanks for pointing that out. But that's not a realistic danger for the actual application. The accessor function is always going to be in a library compiled at a separate time. The call will always be from a program built at a separate time, so -combine isn't a factor. We are building a compiler to outsmart you. We are presently working on technology (google "LTO") to break your code. :-) Don't cry when we turn it on by default and it does. I'd recommend understanding the rules and following them.
Re: Threading the compiler
On 2006-11-10, at 22:33, Sohail Somani wrote: On Fri, 2006-11-10 at 12:46 -0800, H. J. Lu wrote: On Fri, Nov 10, 2006 at 12:38:07PM -0800, Mike Stump wrote: How many hunks do we need, well, today I want 8 for 4.2 and 16 for mainline, each release, just 2x more. I'm assuming nice, equal sized hunks. For larger variations in hunk size, I'd need even more hunks. Or, so that is just an off the cuff proposal to get the discussion started. Thoughts? Will use C++ help or hurt compiler parallelism? Does it really matter? My 2c. I don't think it can possibly hurt as long as people follow normal C++ coding rules. Contrary to C there is no single general coding style for C++. In fact for a project of such a scale this may indeed be the most significant deployment problem for C++. Lots of threads communicating a lot would be bad. This simply isn't true. The compiler would be fine having many threads handling a lot of data between them in a pipelined way. In fact it already does just that, however without using the opportunity for parallel execution. Marcin Dalecki
Re: Threading the compiler
Le Fri, Nov 10, 2006 at 01:33:42PM -0800, Sohail Somani écrivait/wrote: > I don't think it can possibly hurt as long as people follow normal C++ > coding rules. > > The main issue is not really language choice though. The main issues > would likely be defining data to be isolated enough to be useful to do > work in parallel. I see the following issues: first, (once parsing is done) we could (at least in non-inter-procedural phases & passes, which might be common, in particular in -O1 or maybe -O2) handle in parallel different functions inside a C compilation unit. Another trick (particularly in LTO) could be to store persistently some internal representation for each function (within a compilation unit) and to recall it if the compiler notices that a given function didn't change. However, for multi-threading the compiler, a significant issue might be the internal GCC garbage collector (I'm not sure it is easily multi-threadable). I'm not familiar enough with actual gcc timing to be sure all this would really speed up compilation time. I'm not pretending to volunteer for the multi-threading effort, because I am not yet familiar enough with GCC internals. My 0.02 euros! Regards. -- Basile STARYNKEVITCH http://starynkevitch.net/Basile/ email: basilestarynkevitchnet mobile: +33 6 8501 2359 8, rue de la Faïencerie, 92340 Bourg La Reine, France
Re: expanding __attribute__((format,..))
On Nov 10, 2006, at 9:14 AM, Ian Lance Taylor wrote: "Nuno Lopes" <[EMAIL PROTECTED]> writes: I've been thinking that it would be a good idea to extend the current __attribute__((format,..)) to use an arbitrary user callback. I searched the mailing list archives and I found some references to similar ideas. So do you think this is feasible? I think it would be nice. We usually founder I think that a 20% solution would handle 95% of the cases. :-) __attribute((extra_formats, "AB")) for example. Easy to use, easy to describe, handles things well enough to keep people happy for 10 years. The next version after this would be comprehensive enough to handle describing the values and the types, the checking rules, and the warning/error messages to generate.
Re: Planned LTO driver work
On Nov 9, 2006, at 11:37 PM, Mark Mitchell wrote: It might be that we should move the invocation of the real linker back into gcc.c, so that collect2's job just becomes Or move all of collect2 back into gcc.c. There isn't a reason for it being separate any longer.
Re: Question on tree-nested.c:convert_nl_goto_reference
> But I do get a failure in verify_flow_info with the appended test case. Indeed that's where I get the ICE. > verify_flow_info is only used when checking is enabled, so > maybe that is why people aren't seeing it? But isn't that the default on the trunk?
Re: Threading the compiler
Mike Stump wrote: ... Thoughts? Raw thoughts: 1. Threading isn't going to help for I/O-bound portions. 2. The OS should already be doing some of the work of threading. Some 'parts' of the compiler should already be using CPUs: 'make', the front-end (gcc) command, the language compiler, the assembler, linker, etc. 3. The OS will likely be using some of the CPUs for its own purposes: I/O prefetch, display drivers, sound, etc. (and these processes will probably increase over time as the OS vendors get used to them being available). Different machines will also have differing numbers of CPUs. Old systems will still have one or two cores; some may have 160. What will the multi-core compiler design do to the old processors (extreme slowness?) 4. Will you "serialize" error messages so that two compiles of a file will always display the errors in the same order? Also, will the object files created be the same between compiles? 5. Will more "heavy" optimizations be available? I.e., will the multi-core speed things up enough that really hard optimizations (speed-wise) become reasonable?
Re: Abt long long support
On Fri, Nov 10, 2006 at 07:11:34PM -, Dave Korn wrote: > No, surely you don't want to do that! You really need a movdi pattern - > even more so if there are no natural DImode-sized registers, as gcse can get > terribly confused by bad reg_equal notes if you don't. See e.g.: > > http://gcc.gnu.org/ml/gcc/2003-04/msg01397.html > http://gcc.gnu.org/ml/gcc/2004-06/msg00993.html PR number? -- Rask Ingemann Lambertsen
Re: strict aliasing question
Mike Stump wrote: On Nov 10, 2006, at 9:48 AM, Howard Chu wrote: Richard Guenther wrote: If you compile with -O3 -combine *.c -o alias it will break. Thanks for pointing that out. But that's not a realistic danger for the actual application. The accessor function is always going to be in a library compiled at a separate time. The call will always be from a program built at a separate time, so -combine isn't a factor. We are building a compiler to outsmart you. We are presently working on technology (google "LTO") to break your code. :-) Don't cry when we turn it on by default and it does. I'd recommend understanding the rules and following them. Heh heh. Looking forward to using that. Google further back and you'll see that I did link-time optimization with gcc 1.4 for m68k/Atari, almost 20 years ago. More power to you. (Why in my day, we had to carry bitbuckets twenty miles uphill, BOTH DIRECTIONS!) As for following the rules, I didn't define the SASL API. It strikes me that (void **) is pretty unfriendly as an argument type. While it's easy to make the warning go away with a union, that doesn't actually guarantee that the memory being pointed to will be in a defined state. With the previous example, if alias1.c was instead:

#include <stdio.h>

extern void getit( void **arg );

main()
{
	union {
		int *foo;
		void *bar;
	} u;
	getit( &u.bar );
	printf("foo: %x\n", *u.foo);
}

gcc no longer complains, but according to the spec, the code is not any more reliable. On the other hand, I don't see any good reason for an optimizer to break this code. The compiler knows absolutely that the two pointers have identical values and therefore point to the same piece of memory. There is no way it can legitimately squash out any loads or stores here - it's not executing in a loop, where a prior load may have already fetched the data. -- Howard Chu Chief Architect, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc OpenLDAP Core Team http://www.openldap.org/project/
Re: Handling of extern inline in c99 mode
I'm not subscribed to this list, I just noticed this discussion while browsing around... Don't know if the list accepts non-subscriber messages either, but let's see: Ian Lance Taylor wrote: > codesearch.google.com finds about 6000 uses of "extern inline" in > code written in C, but the search > inline -static -extern -# lang:c file:\.c$ > finds only 100 occurrences (...) Because you don't search for "inline" declarations with no "static" nor "extern", but files with "inline" which contain no "static" nor "extern" _anywhere_ in the file, if I understand codesearch correctly. One wish for whatever happens with "inline": Please document what #if tests one should put in a portable (non-GNU:-) program in order to (a) get the intended operation of gcc 'inline' and (b) not drown the program's users in warning messages. In this regard, 'inline' which behaves differently with -std=c99 and gnu99 will make for a more complicated test. So will introducing the change - even just the default warning - in many branches at once. A new -Wno-inline-warning option would not help either, since older gcc versions will complain about the new option. Maybe you should #define __gcc_gnu_inline__ and __gcc_c99_inline__ as the proper attribute/keyword so that a program can #ifdef on them. I wonder what "-pedantic" should do about "inline"? I've seen many people use "-pedantic" without "-std"/"-ansi", because on many systems the latter break some header files. -- Regards, Hallvard
RE: How to create both -option-name-* and -option-name=* options?
On 10 November 2006 21:18, Brooks Moses wrote: > Dave Korn wrote: > But that's already not possible -- that's essentially how I got into > this problem in the first place. If one tries to define both of those, > the declaration of the enumeration-type holding the option flags breaks, > so you can't do that. That aside, it would have been possible before, and the mangling could easily have been fixed to support it had we wanted to. > (Well, you could hack that to make it work; define -fFLAG as the option > name, so that the '-' or '=' is the first character of the argument. > That will still work, but it's a pain if VALUE is otherwise a UInteger.) Yeh, but it's also the right thing to do with the machinery as it stands. > This does raise a point about how the options are compared, though -- to > be useful, this needs to also handle cases where a Joined option is > emulated by a "normal" option. For instance, Fortran's lang.opt > contains something like: > >-ffixed-line-length-none >Fortran > >-ffixed-line-length- >Fortran Joined > > We would also want "-ffixed-line-length=none" to be handled > appropriately, which makes this a bit trickier than just handling the > last character of Joined options. > > Are there any meaningful downsides to just having the option-matcher > treat all '-' and '=' values in the option name as equivalent? It would > mean that we'd also match "-ffixed=line=length-none", for instance, but > I don't think that causes any real harm. I think it's horribly ugly! (Yes, this would not be a show-stopper in practice; I have a more serious reason to object, read on...) > An alternative would be to specify that an '=' in the name in the .opt > file will match either '=' or '-' on the command line. 
This does > require that the canonical form be the one with '=' in it, and means > that things with '-' in them need to be changed in the .opt file to > accept both, but the benefit is that it can accept pseudo-Joined options > in either form without accepting all sorts of wierd things with random > '='s in them. I think that for this one case we should just say that you have to supply both forms -ffixed-line-length-none and -ffixed-line-length=none. What you have here is really a joined option that has an argument that can be either a text field or an integer, and to save the trouble of parsing the field properly you're playing a trick on the options parser by specifying something that looks to the options machinery like a longer option with a common prefix, but looks to the human viewer like the same option with a text rather than integer parameter joined. Treating a trailing '-' as also matching a '=' (and vice-versa) doesn't blur the boundary between what are separate concepts in the option parsing machinery. I think if you really want these pseudo-joined fields, add support to the machinery to understand that the joined field can be either a string or a numeric. The change I'm proposing is kind of orthogonal to that. It solves your problem with the enum; there becomes only one enum to represent both forms and both forms are accepted and parse to that same enumerated value. It does not solve nor attempt to address your other problem, with the limitations on parsing joined fields, and I don't think we should try and bend it into shape to do this second job as well. If you address the parsing limitation on joined fields, the flexibility that my suggestion offers /will/ automatically be available to your usage. cheers, DaveK -- Can't think of a witty .sigline today
gcc-4.1-20061110 is now available
Snapshot gcc-4.1-20061110 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/4.1-20061110/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 4.1 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_1-branch revision 118667

You'll find:

gcc-4.1-20061110.tar.bz2              Complete GCC (includes all of below)
gcc-core-4.1-20061110.tar.bz2         C front end and core compiler
gcc-ada-4.1-20061110.tar.bz2          Ada front end and runtime
gcc-fortran-4.1-20061110.tar.bz2      Fortran front end and runtime
gcc-g++-4.1-20061110.tar.bz2          C++ front end and runtime
gcc-java-4.1-20061110.tar.bz2         Java front end and runtime
gcc-objc-4.1-20061110.tar.bz2         Objective-C front end and runtime
gcc-testsuite-4.1-20061110.tar.bz2    The GCC testsuite

Diffs from 4.1-20061103 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-4.1 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
Re: C++: Implement code transformation in parser or tree
Sohail Somani wrote: > struct __some_random_name > { > void operator()(int & t){t++;} > }; > > for_each(b,e,__some_random_name()); > > Would this require a new tree node like LAMBDA_FUNCTION or should the > parser do the translation? In the latter case, no new nodes should be > necessary (I think). Do you need new class types, or just an anonymous FUNCTION_DECL? -- Mark Mitchell CodeSourcery [EMAIL PROTECTED] (650) 331-3385 x713
Re: expanding __attribute__((format,..))
On Fri, 10 Nov 2006, Nuno Lopes wrote: > > On Fri, 10 Nov 2006, Ian Lance Taylor wrote: > > > > > I kind of liked this idea: > > > http://gcc.gnu.org/ml/gcc-patches/2005-07/msg00797.html > > > but of course it was insane. > > > > I still think a higher level state machine as described in the followups > > is how things should be done. > > wouldn't that be killing a mosquito with a bomb? :) (unless of course we can > find a simple description language) Format checking is complicated. Over 5% of all the 4 or so test assertions in a gcc testsuite run are from format checking testcases. Format checking is one of the most difficult parts of the compiler to get correct from an i18n perspective (i.e. having all complete sentences available for translation); everything else in the C front end apart from parse errors should be correct in that regard. > Can I do anything to help? I mean, can you point me the files and what should > I do in order to make this move forward? (the most I've made was a few little > patches to make a customized cross-compiler to a mips robot, so I'm not very > familiarized with the code...) c-format.[ch]. Understand the logic in there as a whole. Consider what aspects of information about format strings are embedded in the code and how you might improve the datastructures, one aspect at a time, to describe that aspect in data rather than code. For verifying there are no unintended changes in the compiler's behavior, compare the exact diagnostic texts in gcc.log from test runs before and after each change. -- Joseph S. Myers [EMAIL PROTECTED]
Re: Threading the compiler
On Fri, 2006-11-10 at 22:49 +0100, Marcin Dalecki wrote: > > I don't think it can possibly hurt as long as people follow normal C++ > > coding rules. > > Contrary to C there is no single general coding style for C++. In > fact for a project > of such a scale this may be indeed the most significant deployment > problem for C++. There isn't a single coding style, this is true. But there are styles which are generally understood to be bad. > > Lots of threads communicating a lot would be bad. > > This simply itsn't true. The compiler would be fine having many > threads handling a > lot of data between them in a pipelined way. In fact it already does > just that, > however without using the opportunity for paralell execution. What I meant by that statement was that in general, when there is a lot of synchronization, race conditions happen if discipline is not applied. Correct multi threaded code is hard. I would submit that the last thing you need are race conditions as a matter of course in a compiler because someone forgot to lock resource A before B. Not saying anything about the gcc developers in particular of course. Aside: I think the RAII nature of C++ constructors/destructors is helpful in locking code. More 2c? Sohail
Re: Threading the compiler
On Nov 10, 2006, at 2:19 PM, Kevin Handy wrote: What will the multi-core compiler design do to the old processors (extreme slowness?) Roughly speaking, I want it to add around 1000 extra instructions per function compiled, in other words, nothing. The compile speed will be what the compile speed is. Now, I will caution, the world doesn't look kindly on people trying to bootstrap gcc on an 8 MHz m68k anymore, even though it might even be possible. In 5 years, I'm gonna be compiling on an 80 or 160 way box. :-) Yeah, Intel promised. If you're trying to compile on a single 1 GHz CPU, it's gonna be slow. I don't want to make the compiler any slower, I want to make it faster; others will make use of the faster compiler to make it slower, but that is orthogonal to my wanting to make it faster. 4. Will you "serialize" error messages so that two compiles of a file will always display the errors in the same order? Yes, I think that messages should feed back into the manager, so that the manager can `manage' things. A stable, rational ordering for messages makes sense. Also, will the object files created be the same between compiles. Around here, we predicate life on determinism; you can pry that away from my cold dead fingers. We might have to switch from L472 to L10.22 for internal labels for example. This way, each thread can create infinite amounts of labels that don't conflict with other threads (functions). 5. Will more "heavy" optimizations be available? i.e. Will the multi-core speed things up enough that really hard optimizations (speed wise) become reasonable? See my first paragraph.
Re: C++: Implement code transformation in parser or tree
On Fri, 2006-11-10 at 14:47 -0800, Mark Mitchell wrote: > Sohail Somani wrote: > > > struct __some_random_name > > { > > void operator()(int & t){t++;} > > }; > > > > for_each(b,e,__some_random_name()); > > > > Would this require a new tree node like LAMBDA_FUNCTION or should the > > parser do the translation? In the latter case, no new nodes should be > > necessary (I think). > > Do you need new class types, or just an anonymous FUNCTION_DECL? Hi Mark, thanks for your reply. In general it would be a new class. If the lambda function looks like: void myfunc() { int a; ...<>(int i1,int i2) extern (a) {a=i1+i2}... } That would be a new class with an int reference (initialized to a) and operator()(int,int). Does that clarify? Sohail
Re: Question on tree-nested.c:convert_nl_goto_reference
[EMAIL PROTECTED] (Richard Kenner) writes: > > But I do get a failure in verify_flow_info with the appended test case. > > Indeed that's where I get the ICE. > > > verify_flow_info is only used when checking is enabled, so > > maybe that is why people aren't seeing it? > > But isn't that the default on the trunk? Yes. But it's not on releases, so non-developers are going to see it. And I can't find any C test cases which detect the problem. As far as I can tell, in C the problem will only arise when a nested function itself contains a nested function, and the inner nested function does a non-local goto to the outer nested function. That is, the test case I posted earlier is about as simple as it gets. I don't know whether there are any functions nested inside nested functions which do non-local gotos in the Ada testsuite. Ian
Re: How to create both -option-name-* and -option-name=* options?
Dave Korn wrote: On 10 November 2006 21:18, Brooks Moses wrote: But that's already not possible -- that's essentially how I got into this problem in the first place. If one tries to define both of those, the declaration of the enumeration-type holding the option flags breaks, so you can't do that. That aside, it would have been possible before, and the mangling could easily have been fixed to support it had we wanted to. Right, yeah -- my point was just that nobody _had_ fixed the mangling to support it, and thus that this was only eliminating a theoretical possibility rather than something someone might actually be doing, which means in practice it's not changing very much. Are there any meaningful downsides to just having the option-matcher treat all '-' and '=' values in the option name as equivalent? It would mean that we'd also match "-ffixed=line=length-none", for instance, but I don't think that causes any real harm. I think it's horribly ugly! (Yes, this would not be a show-stopper in practice; I have a more serious reason to object, read on...) I think it's horribly ugly, too -- but I don't see that the ugliness shows up anywhere unless some user is _intentionally_ doing something ugly; it just means that their ugly usage is rewarded by the compiler doing essentially what they expect, rather than throwing an error. An alternative would be to specify that an '=' in the name in the .opt file will match either '=' or '-' on the command line. This does require that the canonical form be the one with '=' in it, and means that things with '-' in them need to be changed in the .opt file to accept both, but the benefit is that it can accept pseudo-Joined options in either form without accepting all sorts of wierd things with random '='s in them. I think that for this one case we should just say that you have to supply both forms -ffixed-line-length-none and -ffixed-line-length=none. 
Which I would be glad to do, except that as far as I can tell, it's not possible to actually do that. The same problem arises there as arises when it doesn't have "none" on the end and "Joined" in the specification. What you have here is really a joined option that has an argument that can be either a text field or an integer, and to save the trouble of parsing the field properly you're playing a trick on the options parser by specifying something that looks to the options machinery like a longer option with a common prefix, but looks to the human viewer like the same option with a text rather than integer parameter joined. Right, agreed. Though it's not so much "to save the trouble" as "to be able to leverage all the useful things the option parser does to verify numeric fields". Treating a trailing '-' as also matching a '=' (and vice-versa) doesn't blur the boundary between what are separate concepts in the option parsing machinery. I think if you really want these pseudo-joined fields, add support to the machinery to understand that the joined field can be either a string or a numeric. Well, I'm not sure that I "want" them, exactly. They're only in gfortran because we're supporting backwards compatibility going back to the very early days of g77. The change I'm proposing is kind of orthogonal to that. It solves your problem with the enum; there becomes only one enum to represent both forms and both forms are accepted and parse to that same enumerated value. It does not solve nor attempt to address your other problem, with the limitations on parsing joined fields, and I don't think we should try and bend it into shape to do this second job as well. If you address the parsing limitation on joined fields, the flexibility that my suggestion offers /will/ automatically be available to your usage. Hmm. Valid points.
And, given that adding support for both string and numeric values looks fairly easy (much more so than I would have guessed), that's probably the better way to go. A UIntegerOrString property would be incompatible with the Var property, since it would need two variables for storing the result, but I think this is not a notable loss since the combination of Var and UInteger is already rare -- the only flag that uses them both is -fabi-version. Or, given that the only thing that appears to use this at the moment is this old g77-style fixed-line-length Fortran option that we're only supporting for legacy purposes, I suppose we could just go for the cop-out of supporting the "-none" version and not the "=none" version, and only document it as accepting "=0". - Brooks
Re: strict aliasing question
Richard Guenther wrote: On 11/10/06, Howard Chu <[EMAIL PROTECTED]> wrote: I see a lot of APIs (e.g. Cyrus SASL) that have accessor functions returning values through a void ** argument. As far as I can tell, this doesn't actually cause any problems, but gcc 4.1 with -Wstrict-aliasing will complain. For example, take these two separate source files:

alias1.c

#include <stdio.h>

extern void getit( void **arg );

main()
{
	int *foo;
	getit( (void **)&foo);
	printf("foo: %x\n", *foo);
}

alias2.c

static short x[] = {16,16};

void getit( void **arg )
{
	*arg = x;
}

gcc -O3 -fstrict-aliasing -Wstrict-aliasing *.c -o alias

The program prints the expected result with both strict-aliasing and no-strict-aliasing on my x86_64 box. As such, when/why would I need to worry about this warning? If you compile with -O3 -combine *.c -o alias it will break. Hm, actually it still prints the correct result for me. What platform are you using where it actually makes a difference? Again, I don't see how it's possible for a correct code generator to get this wrong. The only way that can happen is if the compiler ignores the store of x into *arg. Any compiler that did that would quite plainly be broken. It seems to be academic, since gcc produces the right result, regardless of optimization level. -- Howard Chu Chief Architect, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc OpenLDAP Core Team http://www.openldap.org/project/
Re: strict aliasing question
On Fri, Nov 10, 2006 at 04:18:25PM -0800, Howard Chu wrote: > Richard Guenther wrote: > >If you compile with -O3 -combine *.c -o alias it will break. > > Hm, actually it still prints the correct result for me. What platform > are you using where it actually makes a difference? Rather, he is saying that, with those flags, it is possible that gcc will do optimizations that break the code, but these optimizations might show up only on some platforms, with some releases, under some conditions. You might luck out, but it is possible that a future gcc will do an optimization that changes the meaning of the code. The compiler is allowed to reason roughly as follows: "I have a copy of foo in register R1. foo is of type long. There have been no writes, since foo was loaded into R1, for any types compatible with type long. Therefore the copy of foo in R1 is still good, so I don't have to reload it from memory." The C standard has rules of this form because, without them, it can be hard to do decent loop optimization. The reason Richard mentioned the options he did was that they enable some optimization across function boundaries, meaning that the compiler is more likely to see that there can't have been any legal modifications to some objects.
Re: strict aliasing question
Joe Buck wrote: On Fri, Nov 10, 2006 at 04:18:25PM -0800, Howard Chu wrote: Richard Guenther wrote: If you compile with -O3 -combine *.c -o alias it will break. Hm, actually it still prints the correct result for me. What platform are you using where it actually makes a difference? Rather, he is saying that, with those flags, it is possible that gcc will do optimizations that break the code, but these optimizations might show up only on some platforms, with some releases, under some conditions. You might luck out, but it is possible that a future gcc will do an optimization that changes the meaning of the code. OK, that's fair. The compiler is allowed to reason roughly as follows: "I have a copy of foo in register R1. foo is of type long. There have been no writes, since foo was loaded into R1, for any types compatible with type long. Therefore the copy of foo in R1 is still good, so I don't have to reload it from memory." The C standard has rules of this form because, without them, it can be hard to do decent loop optimization. The reason Richard mentioned the options he did was that they enable some optimization across function boundaries, meaning that the compiler is more likely to see that there can't have been any legal modifications to some objects. I understand that logic, in the general case. In this specific example, none of those conditions apply. foo is an uninitialized local variable. Therefore the compiler cannot know that it has a valid copy of it in any register. In fact what it should know is that it has no valid copy of it. And of course, there are no loops to consider here, so that type of reload optimization isn't relevant. As such, the compiler has no choice but to do the right thing, and load the value from memory, thus getting the correct result. Which it does. -- -- Howard Chu Chief Architect, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc OpenLDAP Core Team http://www.openldap.org/project/
Re: Threading the compiler
Mike Stump writes: >We're going to have to think seriously about threading the compiler. Intel >predicts 80 cores in the near future (5 years). [...] To use this many >cores for a single compile, we have to find ways to split the work. The >best way, of course is to have make -j80 do that for us, this usually >results in excellent efficiencies and an ability to use as many cores >as there are jobs to run. Umm... those 80 processors that Intel is talking about are more like the 8 coprocessors in the Cell CPU. It's not going to give you an 80-way SMP machine that you can just "make -j80" on. If that's really your target architecture you're going to have to come up with some really innovative techniques to take advantage of it in GCC. I don't think working on parallelizing GCC for 4- and 8-way SMP systems is going to give you much of a head start. Which isn't to say it wouldn't be a worthy enough project in its own right. Ross Ridge
Re: strict aliasing question
Howard Chu <[EMAIL PROTECTED]> writes: > I understand that logic, in the general case. In this specific example, > none of those conditions apply. foo is an uninitialized local > variable. Therefore the compiler cannot know that it has a valid copy of > it in any register. In fact what it should know is that it has no valid > copy of it. And of course, there are no loops to consider here, so that > type of reload optimization isn't relevant. As such, the compiler has no > choice but to do the right thing, and load the value from memory, thus > getting the correct result. Which it does. It will load the value from memory, true, but who says that the store to memory will happen before that? The compiler is allowed to reorder the statements since it "knows" that foo and *arg cannot alias. Andreas. -- Andreas Schwab, SuSE Labs, [EMAIL PROTECTED] SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different."
Re: strict aliasing question
Andreas Schwab wrote: Howard Chu <[EMAIL PROTECTED]> writes: I understand that logic, in the general case. In this specific example, none of those conditions apply. foo is an uninitialized local variable. Therefore the compiler cannot know that it has a valid copy of it in any register. In fact what it should know is that it has no valid copy of it. And of course, there are no loops to consider here, so that type of reload optimization isn't relevant. As such, the compiler has no choice but to do the right thing, and load the value from memory, thus getting the correct result. Which it does. It will load the value from memory, true, but who says that the store to memory will happen before that? The compiler is allowed to reorder the statements since it "knows" that foo and *arg cannot alias. If the compiler is smart enough to know how to reorder the statements, then it should be smart enough to know that reordering will still leave foo uninitialized, which is obviously an error. Any time an optimization/reordering visibly changes the results, that reordering is broken. And we already know that gcc is smart enough to recognize attempts to use uninitialized variables, so there's no reason for it to go there. -- -- Howard Chu Chief Architect, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc OpenLDAP Core Team http://www.openldap.org/project/
Re: Getting "char" from INTEGER_TYPE node
> > I am having some trouble with getting type names as declared by the user > > in source. In particular if I have two functions: > > > > void Function(int i); > > void Function(char c); > > > > when processing the parameters I get an INTEGER_TYPE node in the > > parameter list for both functions as expected, however > > IDENTIFIER_POINTER(DECL_NAME(TYPE_NAME(node))) returns the string "int" > > for both nodes. I would have expected one to be "int" and the other to > > be "char". Looking at the TYPE_PRECISION for these nodes I get correct > > values though, i.e. one is 8 bit precision, the other is 32 bit. > > > > How can I get the "char" string when a user uses char types instead of > > "int" strings? After more debugging, the problem was with the type I was obtaining the name of. I was using DECL_ARG_TYPE() to obtain it and not TREE_TYPE() on the function parameter node. This was giving me a wider integer type parameter instead of the type that the user declared. Brendon.
Re: Threading the compiler
> The competition is already starting to make progress in this area. > > We don't want to spend time in locks or spinning and we don't want to > liter our code with such things, so, if we form areas that are fairly > well isolated and independent and then have a manager, manage the > compilation process we can have just it know about and have to deal > with such issues. The rules would be something like, while working > in a hunk, you'd only have access to data from your own hunk, and > global shared read only data. > > The hope is that we can farm compilation of different functions out > into different cores. All global state updates would be fed back to > the manager and then the manager could farm out the results into > hunks and so on until done. I think we can also split out lexing out > into a hunk. We can have the lexer give hunks of tokens to the > manager to feed onto the parser. We can have the parser feed hunks > of work to do onto the manager and so on. > > How many hunks do we need, well, today I want 8 for 4.2 and 16 for > mainline, each release, just 2x more. I'm assuming nice, equal sized > hunks. For larger variations in hunk size, I'd need even more hunks. > > Or, so that is just an off the cuff proposal to get the discussion > started. > > Thoughts? Can you make it run on my graphics card too? Seriously though, I don't really understand what sort of response you're expecting. You've described how an ideal compiler would work, in fact how pretty much any parallel system should be designed to work. Do you have any justification for aiming for 8x parallelism in this release and 2x increase in parallelism in the next release? Why not just aim for 16x in the first instance? 16-way SMP isn't that rare even today. You mention that "competition is already starting to make progress". Have they found it to be as easy as you imply? whole-program optimisation and SMP machines have been around for a fair while now, so I'm guessing not.
I realise this is a very negative reply, and please don't take it personally. However I don't think there's much to be gained by vague proposals saying "Lets make gcc threaded and not do it in a way that sucks". Like with LTO, until someone comes up with a concrete proposal and starts hacking on a branch, it's all just hot air. Paul
Re: strict aliasing question
> It will load the value from memory, true, but who says that the store to > memory will happen before that? The compiler is allowed to reorder the > statements since it "knows" that foo and *arg cannot alias. > If the compiler is smart enough to know how to reorder the statements, then it should be smart enough to know that reordering will still leave foo uninitialized, which is obviously an error. It's also undefined, so we can *and will* reorder things involving uninitialized variables. Any time an optimization/reordering visibly changes the results, that reordering is broken. Not in this case. Also note that gcc *guarantees* the union trick will work, even though the standard does not. And we already know that gcc is smart enough to recognize attempts to use uninitialized variables, so there's no reason for it to go there. We already do, particularly when it comes to constant propagation. Relying on the idea that "oh, well, this is uninitialized, so the compiler can't touch it" is going to get you hurt one of these days :)
Re: Threading the compiler
On 11/10/06, Mike Stump <[EMAIL PROTECTED]> wrote: On Nov 10, 2006, at 12:46 PM, H. J. Lu wrote: > Will using C++ help or hurt compiler parallelism? Does it really matter? I'm not an expert, but, in the simple world I want, I want it to not matter in the least. For the people writing most code in the compiler, I want clear simple rules for them to follow. For example, google uses mapreduce http://labs.google.com/papers/mapreduce.html as a primitive, and there are a few experts that manage that code, and everyone else just mindlessly uses it. The rules are explained to them, and they just follow the rules and it just works. No locking, no atomic, no volatile, no clever lock-free code, no algorithmic changes (other than decomposing into isolated composable parts). I'd like something similar for us. I think the part that makes me giggle the most is that we assume that the actual mapper code is not threadsafe by default, and won't run multiple threads of the mapper.
Re: strict aliasing question
Daniel Berlin wrote: > It will load the value from memory, true, but who says that the store to > memory will happen before that? The compiler is allowed to reorder the > statements since it "knows" that foo and *arg cannot alias. > If the compiler is smart enough to know how to reorder the statements, then it should be smart enough to know that reordering will still leave foo uninitialized, which is obviously an error. It's also undefined, so we can *and will* reorder things involving uninitialized variables. Any time an optimization/reordering visibly changes the results, that reordering is broken. Not in this case. Hm. If you're going to reorder these things, then I would expect either an error or a warning at that point, because you really do know that a reference to an uninitialized variable is happening. Also note that gcc *guarantees* the union trick will work, even though the standard does not. That's good to know, thanks. But frankly that's braindead to require someone to add all these new union declarations all over their code, when a simple cast used to suffice, and ultimately the generated code is the same. And since we have to write code for compilers other than just gcc, we can't even really rely on the union trick. In this respect, the standard is broken. This example is worse, it gives no warning and gives the wrong result with -O3 -Wstrict-aliasing: #include <stdio.h> main() { int i = 0x123456; int *p = &i; *(short *)p = 2; printf("%x\n", i); } In this case, it's not two different pointers pointing to the same memory, it's the same pointer. The compiler doesn't even have to guess whether two different pointers access the same memory - it knows it's the same pointer, and therefore must be accessing the same memory. I can understand strange results occurring when there's ambiguity, but there is no ambiguity here. -- -- Howard Chu Chief Architect, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc OpenLDAP Core Team http://www.openldap.org/project/
Re: strict aliasing question
Hm. If you're going to reorder these things, then I would expect either an error or a warning at that point, because you really do know that a reference to an uninitialized variable is happening. We do warn when we see an uninitialized value if -Wuninitialized is on. We don't warn at every point we make an optimization based on it, nor do I think we should :) > Also note that gcc *guarantees* the union trick will work, even though > the standard does not. That's good to know, thanks. But frankly that's braindead to require someone to add all these new union declarations all over their code, when a simple cast used to suffice, and ultimately the generated code is the same. And since we have to write code for compilers other than just gcc, we can't even really rely on the union trick. In this respect, the standard is broken. This example is worse, it gives no warning and gives the wrong result with -O3 -Wstrict-aliasing: #include <stdio.h> main() { int i = 0x123456; int *p = &i; *(short *)p = 2; printf("%x\n", i); } In this case, it's not two different pointers pointing to the same memory, it's the same pointer. The compiler doesn't even have to guess whether two different pointers access the same memory - it knows it's the same pointer, and therefore must be accessing the same memory. I can understand strange results occurring when there's ambiguity, but there is no ambiguity here. You are right, there isn't. We ask the TBAA analyzer "can a store to a short * touch i". In this case, it says "no", because it's not legal.
Re: C++: Implement code transformation in parser or tree
On Fri, 2006-11-10 at 15:23 -0800, Sohail Somani wrote: > > Do you need new class types, or just an anonymous FUNCTION_DECL? > > Hi Mark, thanks for your reply. > > In general it would be a new class. If the lambda function looks like: > > void myfunc() > { > > int a; > > ...<>(int i1,int i2) extern (a) {a=i1+i2}... > > } > > That would be a new class with an int reference (initialized to a) and > operator()(int,int). > > Does that clarify? Can lambda functions like this escape myfunc? If not then using the nested function mechanism that is already in GCC seems like a good thing. In fact I think of lambda functions as nested functions. Thanks, Andrew Pinski
Re: C++: Implement code transformation in parser or tree
On Fri, 2006-11-10 at 19:46 -0800, Andrew Pinski wrote: > On Fri, 2006-11-10 at 15:23 -0800, Sohail Somani wrote: > > > Do you need new class types, or just an anonymous FUNCTION_DECL? > > > > Hi Mark, thanks for your reply. > > > > In general it would be a new class. If the lambda function looks like: > > > > void myfunc() > > { > > > > int a; > > > > ...<>(int i1,int i2) extern (a) {a=i1+i2}... > > > > } > > > > That would be a new class with an int reference (initialized to a) and > > operator()(int,int). > > > > Does that clarify? > > Can lambda functions like this escape myfunc? If not then using the > nested function mechanism that is already in GCC seems like a good > thing. In fact I think of lambda functions as nested functions. Yes they can in fact. So the object can outlive the scope. A supposed use is for callbacks. Personally, I'd use it to make stl more usable in the cases where boost lambda doesn't help. Thanks, Sohail
Re: strict aliasing question
On Fri, 10 Nov 2006, Daniel Berlin wrote: > > > It will load the value from memory, true, but who says that the store to > > > memory will happen before that? The compiler is allowed to reorder the > > > statements since it "knows" that foo and *arg cannot alias. > > > > > > > If the compiler is smart enough to know how to reorder the statements, > > then it should be smart enough to know that reordering will still leave > > foo uninitialized, which is obviously an error. > > It's also undefined, so we can *and will* reorder things involving > uninitialized variables. > > Any time an > > optimization/reordering visibly changes the results, that reordering is > > broken. > Not in this case. > also Note that gcc *guarantees* the union trick will work, even though > the standard does not. > > > And we already know that gcc is smart enough to recognize > > attempts to use uninitialized variables, so there's no reason for it to > > go there. > We already do, particularly when it comes to constant propagation > > Relying on the idea that "oh, well, this is uninitialized, so the > compiler can't touch it" is going to get you hurt one of these days :) while speaking about uninitialized variables gcc developers probably want to look at their own sources first: gcc/testsuite/gcc.dg/vect/vect-27.c int ia[N]; int ib[N+1]; for (i=0; i < N; i++) { ib[i] = i; } for (i = 1; i <= N; i++) { ia[i-1] = ib[i]; } /* check results: */ for (i = 1; i <= N; i++) { if (ia[i-1] != ib[i]) abort (); } I hope that's not intentional, since higher optimizations in some compilers break this incorrect code already. Alex.
Re: Threading the compiler
Most people aren't waiting for compilation of single files. If they do, it is because a single compilation unit requires parsing/compilation of too many unchanging files, in which case the primary concern is avoiding redoing useless compilation. The common case is that people just don't use the -j feature of make because 1) they don't know about it 2) their IDE doesn't know about it 3) they got burned by bad Makefiles 4) it's just too much typing Making single compilations more complex through threading seems wrong. Right now, in each compilation, we invoke the compiler driver (gcc), which invokes the front end and then the assembler. All these processes need to be initialized, need to communicate, clean up etc. While one might argue to use "gcc -pipe" for more parallelism, I'd guess we win more by writing object files directly to disk like virtually every other compiler on the planet. Just compiling int main() { puts ("Hello, world!"); return 0; } takes 342 system calls on my Linux box, most of them related to creating processes, repeated dynamic linking, and other initialization stuff, and reading and writing temporary files for communication. For every instruction processed, we call printf to produce nicely formatted output with decimal operands which later gets parsed again into binary format. Ideally, we'd just do one read of the source and one write of the object. Then we'd have far below 100 system calls for the entire compilation. Most of my compilations (on Linux, at least) use close to 100% of CPU. Adding more overhead for threading and communication/synchronization can only hurt. -Geert
Re: Threading the compiler
On Nov 10, 2006, at 9:08 PM, Geert Bosch wrote: The common case is that people just don't use the -j feature of make because 1) they don't know about it 2) their IDE doesn't know about it 3) they got burned by bad Makefiles 4) it's just too much typing Don't forget: 5) running 4 GCC processes at once at -O3 runs out of memory and starts swapping, limiting me to -j2 or -j3 on a 2G 4-core box. This is helped with threading. -Chris
Re: strict aliasing question
On Fri, 2006-11-10 at 21:00 -0800, Alexey Starovoytov wrote: > while speaking about uninitialized variables gcc developers probably want > to look at their own sources first: > gcc/testsuite/gcc.dg/vect/vect-27.c If any code in the testsuite is broken, it should be changed. And this is not really part of the compiler so you will not get wrong code from the compiler, just the testcase will break. If you find some, report it instead of just complaining about it. Thanks, Andrew Pinski
Re: Threading the compiler
On 2006-11-11, at 06:08, Geert Bosch wrote: Just compiling int main() { puts ("Hello, world!"); return 0; } takes 342 system calls on my Linux box, most of them related to creating processes, repeated dynamic linking, and other initialization stuff, and reading and writing temporary files for communication. And 80% of it comes from the severe overuse of the notion of shared libraries on linux systems. Marcin Dalecki
Re: Threading the compiler
On Sat, 2006-11-11 at 00:08 -0500, Geert Bosch wrote: > Most of my compilations (on Linux, at least) use close > to 100% of CPU. Adding more overhead for threading and > communication/synchronization can only hurt. In my daily work, I take processes that run 100% and make them use 100% in less time. I think it sounds like (from what you say) that gcc needs to be optimized before parallelized? In some cases this might be easier. Sohail
Re: strict aliasing question
Daniel Berlin wrote: We ask the TBAA analyzer "can a store to a short * touch i". In this case, it says "no", because it's not legal. If you know the code is not legal, why don't you abort the compilation with an error code? The current silent behavior provides a mechanism for creating source-code Trojans - code that on casual inspection, looks like it does one thing but does something else. It can even mask its behavior from debugging - e.g., typically code compiled for debugging has the optimizer turned off, because otherwise it's too difficult to follow the sequence of operations, variables aren't always accessible, etc. When compiled in this manner it is completely benign. But when built for deployment, with optimization, it's another story... For example... #include <stdio.h> short buf[4]; char text[8]; main() { char *c; int *i; short *s; int words[] = { 0x726d202a, 0x70732078 }; c = (char *)words; if ( *c == 0x2a ) { /* little endian */ int j; j = words[0]; c[3] = j & 0xff; j >>= 8; c[2] = j & 0xff; j >>= 8; c[1] = j & 0xff; j >>= 8; c[0] = j & 0xff; j = words[1]; c += 4; c[3] = j & 0xff; j >>= 8; c[2] = j & 0xff; j >>= 8; c[1] = j & 0xff; j >>= 8; c[0] = j & 0xff; } s = (short *)(char *)words; buf[0] = s[0]; buf[1] = s[1]; i = (int *)(char *)buf; *i = words[1]; s = (short *)text; s[0] = buf[0]; s[1] = buf[1]; printf("%x %x %x %x\n", buf[0], buf[1], buf[2], buf[3] ); puts(text); /* system(text); */ } The above code compiles without warning with -O2 / -O3 -Wstrict-aliasing, but the result is quite different from compiling without optimization. -- -- Howard Chu Chief Architect, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc OpenLDAP Core Team http://www.openldap.org/project/
Re: strict aliasing question
On Fri, 2006-11-10 at 23:05 -0800, Howard Chu wrote: > Daniel Berlin wrote: > > > > We ask the TBAA analyzer "can a store to a short * touch i. > > In this case, it says "no", because it's not legal. > > > If you know the code is not legal, why don't you abort the compilation > with an error code? The code is legal but undefined at runtime. There was a defect report to the C standard about undefined code at runtime and rejecting that code and the C standard committee decided it was not a defect. http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/dr_109.html Here is the rational from that Defect report about not rejecting the undefined behavior: A conforming implementation must not fail to translate a strictly conforming program simply because some possible execution of that program would result in undefined behavior. Because foo might never be called, the example given must be successfully translated by a conforming implementation. Thanks, Andrew Pinski
Re: Threading the compiler
On Nov 10, 2006, at 5:43 PM, Paul Brook wrote: Can you make it run on my graphics card too? :-) You know all the power on a bleeding edge system is in the GPU now. People are already starting to migrate data processing for their applications to it. Don't bet against it. In fact, we hide such migration behind apis that people already know and love, and you might be doing it in your applications already, if you're not careful. And before you start laughing too hard, they are doubling every 12 months, we've only managed to double every 18 months. Let's just say, the CPU is doomed. Seriously though, I don't really understand what sort of response you're expecting. Just consensus building. Do you have any justification for aiming for 8x parallelism in this release and 2x increase in parallelism in the next release? Our standard box we ship today that people do compiles on tends to be a 4 way box. If a released compiler made use of the hardware we ship today, it would need to be 4 way. For us to have had the feature in the compiler we ship with those systems, the feature would have had to be in gcc-4.0. Intel has already announced 4 core chips that are pin compatible with the 2 core chips. Their ship date is in 3 days. People have already dropped them in our boxes and they have 8 way machines, today. For them to make use of those cores, today, gcc-4.0 would have had to be 8-way capable. The rate of increase in cores is 2x every 18 months. gcc releases are about one every 12-18 months. By the time I deploy gcc-4.2, I could use 8 way, by the time I stop using gcc-4.2, I could make use of 16-32 cores I suspect. :-( Why not just aim for 16x in the first instance? If 16x is more work than 8x, then I can't yet pony up the work required for 16x myself. If cheap enough, I'll design a system where it is just N-way. Won't know til I start doing code. You mention that "competition is already starting to make progress". Have they found it to be as easy as you imply?
I didn't ask if they found it easy or not. whole-program optimisation and SMP machines have been around for a fair while now, so I'm guessing not. I don't know of anything that is particularly hard about it, but, if you know of bits that are hard, or have pointers to such, I'd be interested in them.
Re: strict aliasing question
Andrew Pinski wrote: On Fri, 2006-11-10 at 23:05 -0800, Howard Chu wrote: Daniel Berlin wrote: We ask the TBAA analyzer "can a store to a short * touch i. In this case, it says "no", because it's not legal. If you know the code is not legal, why don't you abort the compilation with an error code? The code is legal but undefined at runtime. Ah... Now we see why people are so easily confused by the overall issue - ask the question and get conflicting answers on what's legal, implementation-defined, or undefined. Back in the gcc 1.x days "#pragma" was implementation-defined, so the preprocessor would try to execute hack, rogue, and a few other toys whenever it was encountered in source, whichever got located first. Eventually someone made the pragmatic decision that gcc should do its best to actually do something useful when encountering undefined situations. There was a defect report to the C standard about undefined code at runtime and rejecting that code and the C standard committee decided it was not a defect. http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/dr_109.html Here is the rational from that Defect report about not rejecting the undefined behavior: A conforming implementation must not fail to translate a strictly conforming program simply because some possible execution of that program would result in undefined behavior. Because foo might never be called, the example given must be successfully translated by a conforming implementation. What does "successfully translated" mean? Shouldn't "translation" mean the source code is translated into object code? Shouldn't that mean it should actually generate code that actually executes, and not get ignored? Otherwise, "successfully translated" may just as well mean "invokes nethack, rogue, larn..." at that instant. -- -- Howard Chu Chief Architect, Symas Corp. http://www.symas.com Director, Highland Sunhttp://highlandsun.com/hyc OpenLDAP Core Teamhttp://www.openldap.org/project/
Re: strict aliasing question
On Fri, Nov 10, 2006 at 02:32:10PM -0800, Howard Chu wrote: > > With the previous example, if alias1.c was instead: > > > #include <stdio.h> > > extern void getit( void **arg ); > > main() { >union { >int *foo; >void *bar; >} u; > >getit( &u.bar ); >printf("foo: %x\n", *u.foo); > } > > > gcc no longer complains, but according to the spec, the code is not any > more reliable. As far as I know, memcpy() is the answer: #include <stdio.h> #include <string.h> extern void getit (void **arg); int main () { int *foo; void *bar; getit (&bar); memcpy (&foo, &bar, sizeof (foo)); printf ("foo: %x\n", *foo); return (0); } -- Rask Ingemann Lambertsen