Hi Mikhail, Thanks for the comments. I haven't updated my GDB yet and I will test it again once I have a newer version of GDB. Yuhang On 06/06/2015 09:31 PM, Mikhail Maltsev wrote: On 07.06.2015 0:15, steven...@gmail.com wrote: Dear GCC developers, I have successfully compiled & installed GCC 4.9.2. Could you comment on the results of 'make check' (see below)? Here is the relevant information: You can verify it against published test results: https://www.gnu.org/software/gcc/gcc-4.9/buildstat.html === gfortran tests === Running target unix FAIL: gfortran.dg/guality/pr41558.f90 -O2 line 7 s == 'foo' FAIL: gfortran.dg/guality/pr41558.f90 -O3 -fomit-frame-pointer line 7 s == 'foo' FAIL: gfortran.dg/guality/pr41558.f90 -O3 -fomit-frame-pointer -funroll-loops line 7 s == 'foo' FAIL: gfortran.dg/guality/pr41558.f90 -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions line 7 s == 'foo' FAIL: gfortran.dg/guality/pr41558.f90 -O3 -g line 7 s == 'foo' FAIL: gfortran.dg/guality/pr41558.f90 -Os line 7 s == 'foo' The guality testsuite checks generated debug information. It's a functional test, i.e. it performs a real GDB invocation, so the result might also depend on your version of GDB, its settings, etc. There are similar issues on CentOS 6 in this test report https://gcc.gnu.org/ml/gcc-testresults/2015-03/msg03335.html (though it's i686). BTW, this failure also reproduces for me on current trunk.
Dead include file: dwarf.h ?
Hi, As far as I can tell, dwarf.h is not included anywhere in gcc/ or any of its subdirectories. Is there any reason not to remove this file? Thanks, Gr. Steven
Re: TARGET_SCHED_PROLOG defined twice
On 10/18/06, Marcin Dalecki <[EMAIL PROTECTED]> wrote: Looking at rs6000.opt I have found that the above command line switch variable is defined TWICE: msched-prolog Target Report Var(TARGET_SCHED_PROLOG) Init(1) Schedule the start and end of the procedure msched-epilog Target Undocumented Var(TARGET_SCHED_PROLOG) VarExists This appears of course to be wrong. The latter probably ought to be TARGET_SCHED_EPILOG, if that exists, eh? Apparently we also don't have test cases to actually verify that the proper forms of these options are accepted and have the desired effect... Gr. Steven
Question about LTO dwarf reader vs. artificial variables and formal arguments
Hello, I want to make gfortran produce better debug information, but I want to do it in a way that doesn't make it hard/impossible to read back in sufficient information for LTO to work for gfortran. I haven't really been following the whole LTO thing much, but if I understand correctly, the goal is to reconstruct information about declarations from DWARF information that we write out for those declarations. If that's the case, I wonder how LTO will handle artificial "variables" and formal argument lists. For example, gfortran adds additional formal arguments for functions that take a CHARACTER string as a formal argument, e.g. program test implicit none call sub("Hi World!") contains subroutine sub(c) character*10 c end subroutine end produces as a GIMPLE dump: MAIN__ () { static void sub (char[1:10] &, int4); _gfortran_set_std (70, 127, 0); sub ("Hi World!", 9); } sub (c, _c) { (void) 0; } where _c is strlen("Hi World!"). From a user perspective, it would be better to hide _c from the debugger because it is not something that the user had in the original program. I have a patch to hide that parameter, that is, it stops GCC from writing out DW_TAG_formal_parameter for _c. But I am worried about how this will work out later if/when someone tries to make LTO work for gfortran too. Can you still reconstruct the correct function prototype for LTO from the debug info if you don't write debug info for _c? Similarly, LTO has to somehow deal with DECL_VALUE_EXPR and the debug information that is produced from it. Gfortran (and iiuc other front ends and SRA) use this DECL_VALUE_EXPR to produce fake variables that point to some location to improve the debug experience of the user. For Fortran we use it to create fake variables to point at members of a COMMON block, for example, so that the user can do "p A" for a variable A in a common block, instead of "p name_of_the_common_block.A". Is there already some provision to handle this kind of trickery in LTO? Finally, consider another Fortran example: program debug_array_dimensions implicit none integer i(10,10) i(2,9) = 1 end Gfortran currently produces the following wrong debug information for this example: <2><94>: Abbrev Number: 3 (DW_TAG_variable) DW_AT_name: i DW_AT_decl_file : 1 DW_AT_decl_line : 1 DW_AT_type: DW_AT_location: 3 byte block: 91 e0 7c (DW_OP_fbreg: -416) <1>: Abbrev Number: 4 (DW_TAG_array_type) DW_AT_type: DW_AT_sibling : <2>: Abbrev Number: 5 (DW_TAG_subrange_type) DW_AT_type: DW_AT_lower_bound : 0 DW_AT_upper_bound : 99 <1>: Abbrev Number: 6 (DW_TAG_base_type) DW_AT_byte_size : 8 DW_AT_encoding: 5 (signed) DW_AT_name: int8 <1>: Abbrev Number: 6 (DW_TAG_base_type) DW_AT_byte_size : 4 DW_AT_encoding: 5 (signed) DW_AT_name: int4 Note the single DW_TAG_subrange_type <0, 99> for the type of "i", instead of two DW_TAG_subrange_type <1, 10> entries. This happens because in gfortran all arrays are flattened (iirc to make code generation easier). I would like to make gfortran write out the correct debug information, e.g. something with <2>: Abbrev Number: 5 (DW_TAG_subrange_type) DW_AT_type: DW_AT_upper_bound : 10 <2>: Abbrev Number: 5 (DW_TAG_subrange_type) DW_AT_type: DW_AT_upper_bound : 10 but what would happen if LTO reads this in and re-constructs the type of "i" from this information? I imagine it would lead to mis-matches of the GIMPLE code that you read in, where "i" is a 1x100 array, and the re-constructed variable "i" which would be a 10x10 2D array. Has anyone working on LTO already thought of these challenges? 
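For concreteness, a C analogue of the flattening (my own sketch, not actual gfortran output):

  /* What the user wrote vs. what the debug info currently describes.
     Fortran is column-major, so i(2,9) = 1 touches flat element
     (2 - 1) + (9 - 1) * 10 == 81.  */
  int i_2d[10][10];   /* the user's view: two subranges, each 1..10 */
  int i_flat[100];    /* gfortran's flattened view: one subrange 0..99 */

  void assign (void)
  {
    i_flat[(2 - 1) + (9 - 1) * 10] = 1;   /* i(2,9) = 1 */
  }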
I'm all new to both DWARF and LTO, so forgive me if my rant doesn't make sense ;-) Gr. Steven
Re: Re: LOOP_HEADER tree code?
On 10/25/06, Devang Patel <[EMAIL PROTECTED]> wrote: > > However, various optimizers need to know about this special tree node. > > not really (not any more than they know about other tree codes that are > not interesting for them). If we take the example of the Jump Threading pass, then it needs to know about this tree node and update it properly. Yes, when it modifies the CFG in ways that affect the loops info. And one nice thing about this LOOP_HEADER idea is that, in your example, Jump Threading: - can see that node so it knows there is something to update - knows what it is changing so it also knows how that affects the loops info - can change it on-the-fly This means no need for a cleanup pass after all changes are done. So, the passes that manipulate loop structure need to know about LOOP_HEADER and others do not need to worry about LOOP_HEADER. More accurately, the passes that manipulate the cfg. Right now most of these passes don't even know they modify the loop structure. Now, focusing on the passes that manipulate loop structure. Are these passes responsible for fixing loop info, or is it the responsibility of a cleanup pass? It seems to me that a cleanup pass would defeat the purpose of keeping loop info up to date. Your cleanup pass would probably end up just recomputing everything. That said, I don't really see what a LOOP_HEADER node would give you that you can't get by making the cfg-modifying passes actually loop-aware, or perhaps by using cfghooks to update the loop information on the fly when a pass changes the CFG. It would be helpful if Zdenek could give an example where a LOOP_HEADER node is really the only way to help keep loop info accurate. Gr. Steven
Re: Re: Re: Re: LOOP_HEADER tree code?
On 10/25/06, Devang Patel <[EMAIL PROTECTED]> wrote: > > One way to achieve this is to mark n_1 (in your example) as > > "do not dead strip because I know it is used", kind of like attribute((used)). > > This is, as I understand it, what LOOP_HEADER is used for. Big difference. New tree vs TREE_USED or DECL_PRESERVE_P bit. DECL_PRESERVE_P wouldn't work, because afaiu the number of iterations is stored in an SSA_NAME tree node, not a *DECL node. You could use TREE_USED, but your suggestion implies that dead code should be retained in the program, just for the sake of knowing how many iterations a loop has. I wouldn't be surprised if some passes are not prepared to handle that, and it sounds like just a really bad idea. Gr. Steven
Re: Re: LOOP_HEADER tree code?
On 10/25/06, Zdenek Dvorak <[EMAIL PROTECTED]> wrote: it definitely is not the only way, and seeing the reaction of people, I probably won't use it. The main reason for considering to use the tree node for me was the possibility to make the number of iterations of the loop as its operand, so that I would not need to worry about keeping it alive through dce, copy/constant propagation, etc. (without a statement carrying it in IL, I do not see a solution that would not be just asking for introducing bugs and getting broken accidentally). I wouldn't give up so fast. If there are convincing technical reasons for this kind of tree node, then your idea should be seriously considered. Many people thought ASSERT_EXPRs were a really bad idea too, when they were invented... Gr. Steven
Re: Re: Re: Re: Re: LOOP_HEADER tree code?
On 10/26/06, Devang Patel <[EMAIL PROTECTED]> wrote: On 10/25/06, Steven Bosscher <[EMAIL PROTECTED]> wrote: > You could use TREE_USED, but your suggestion implies that dead code > should be retained in the program, Maybe I misunderstood, but it is not dead code. Here is what Zdenek said: "... To keep the information valid, we need to prevent optimizations from destroying it (e.g., if the number is n_1 = n_2 - 1, and this is the last use of n_1, we do not want DCE to remove it); ..." So you would mark n_1 with TREE_USED, and never let it be removed? What would happen if e.g. the entire loop turns out to be dead code? Or if the loop is rewritten (e.g. vectorized) in a way that changes the number of iterations of the loop? Then the assignment to n_1 would be _really_ dead, but there wouldn't be any way to tell. The nice thing about the LOOP_HEADER node is that it makes these uses of SSA names explicit. Gr. Steven
Re: Re: LOOP_HEADER tree code?
On 10/26/06, Jeffrey Law <[EMAIL PROTECTED]> wrote: > So, the passes that manipulate loop structure need to know about > LOOP_HEADER and others do not need to worry about LOOP_HEADER. Passes which do code motion may need to know about it -- they don't need to update its contents, but they may need to be careful about how statements are moved around in the presence of a LOOP_HEADER note. It is not a note, it's a statement. The problem with RTL loop notes was that they were not statements, but rather markers, e.g. "a loop starts/ends here". The LOOP_HEADER node, on the other hand, is more like a placeholder for the result of the number of iterations computation. Basically it is a statement that does not produce a result, but does have uses. I don't see why a code motion pass would have to worry about the LOOP_HEADER node. The LOOP_HEADER node is before the loop, IIUC, so any code moved out of the loop would not affect the value of the use operand for the LOOP_HEADER (by definition, because we're in SSA form so DEFs inside the loop can't reach the LOOP_HEADER node). Gr. Steven
Re: build failure, GMP not available
On 30 Oct 2006 22:56:59 -0800, Ian Lance Taylor <[EMAIL PROTECTED]> wrote: I'm certainly not saying that we should pull out GMP and MPFR. But I am saying that we need to do much much better about making it easy for people to build gcc. Can't we just make it so that, if gmp/ and mpfr/ directories exist in the toplevel, they are built along with GCC? I don't mean actually including gmp and mpfr in the gcc SVN repo, but just making it possible to build them when someone unpacks gmp/mpfr tarballs in the toplevel dir. Gr. Steven
Re: build failure, GMP not available
On 10/31/06, Marcin Dalecki <[EMAIL PROTECTED]> wrote: This question is not related to the apparent instability and thus low quality of GMP/MPFR at all. This is the second time I see someone complain about GMP/MPFR instability. What is this complaint based on? We've used GMP in g95 and later gfortran since the project's inception 7 years ago, and as far as I know we've never had to change anything for reasons of instability. In fact, AFAIK we still had source compatibility when we moved from GMP3 to GMP4. Is there some bug report / web page somewhere that describes the instability problems you folks apparently have on Macs? Gr. Steven
Re: defunct fortran built by default for cross-compiler
On 11/1/06, Joern RENNECKE <[EMAIL PROTECTED]> wrote: With literally more than ten thousand lines of error messages per multilib for fortran, that makes the test results unreportable. So you don't report any error messages at all and leave us guessing? Gr. Steven
Re: [PING] fwprop in 4.3 stage 1?
On 10/31/06, Roger Sayle <[EMAIL PROTECTED]> wrote: I foresee no problems in getting the fwprop pass merged into mainline this week. One detail I would like resolved however, is if you and Steven Bosscher could confirm you're both co-ordinating your efforts. Presumably, adding fwprop is part of the agreed upon game-plan, and not something that will complicate Steven's CSE efforts. We're not co-ordinating the effort right now, but we've obviously been working very hard together in GCC 4.2 Stage 1, and fwprop was "part of the plan" back then to eliminate CSE path following completely (a goal that I've since abandoned). What fwprop should achieve is: - catch the optimizations we miss with CSE skip-blocks disabled - make the first gcse.c local const/copy prop pass redundant It used to do both these things quite well late last year, and I have no reason to believe that it would be any different right now. The only downside is that the compile time benefit is not as big as it would have been if CSE path following could have been eliminated, but fwprop is really fast anyway. Also, fwprop is a nice example pass for how to use df.c and how to use the CFG instead of working around it like CSE does ;-) So, having fwprop in the trunk will only be a good thing IMHO. Gr. Steven
Re: Handling of extern inline in c99 mode
On 11/1/06, Paolo Bonzini <[EMAIL PROTECTED]> wrote: > According to the proposal, we will restore the GNU handling for > "extern inline" even when using -std=c99, which will fix the problem > when using glibc. I am probably overlooking something, but if the only problematic system is glibc, maybe this can be fixed with a fixincludes hack? That would be a massive hack. Gr. Steven
Re: GCSE again: bypass_conditional_jumps -vs- commit_edge_insertions - problem with ccsetters?
On 11/2/06, Roger Sayle <[EMAIL PROTECTED]> wrote: Steven Bosscher might even have plans for reorganizing jump bypassing already as part of his CSE/GCSE overhaul? Yes, and one part of that plan is to pre-split all critical edges so that you never have to insert on edges. That would make your problem go away, iiuc. Gr. Steven
Re: compiling very large functions.
On 11/5/06, Richard Guenther <[EMAIL PROTECTED]> wrote: > I lean to leave the numbers static even if they do increase as time goes > by. Otherwise you get two effects, the first optimizations get to be > run more, and you get the weird non-linear step functions where small > changes in some upstream function affect the downstream. Ok, I guess we can easily flag each function as having - many BBs - big BBs - complex CFG (many edges) and set these flags at CFG construction time during the lowering phase (which is after the early inlining pass I believe). IMHO any CFG-based criteria should be using dynamic numbers, simply because they are available at all times. Large BBs is a more interesting one, because in general they don't get smaller during optimizations. What Kenny suggests here is not new, BTW. I know that gcse already disables itself on very large functions (see gcse.c:is_too_expensive()), and probably some other passes do this as well. A grep for OPT_Wdisabled_optimization *should* show all the places where we throttle or disable passes, but it appears that warnings have not been added consistently when someone throttled a pass. AFAIK not one of the tree optimizers disables itself, but perhaps we should. The obvious candidates would be the ones that require recomputation of alias analysis, and the ones that don't update SSA info on the fly (i.e. require update_ssa, which is a horrible compile time hog). Gr. Steven
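For reference, the kind of throttle I mean looks roughly like this (loosely modeled on gcse.c:is_too_expensive; the threshold and wording here are approximations, not the actual code):

  /* Return true if PASS_NAME should be skipped because the CFG of the
     current function is too dense, and warn so users can tell that an
     optimization was disabled.  */
  static bool
  pass_is_too_expensive (const char *pass_name)
  {
    if (n_edges > 20000 + n_basic_blocks * 4)
      {
        warning (OPT_Wdisabled_optimization,
                 "%s: %d basic blocks and %d edges/basic block",
                 pass_name, n_basic_blocks, n_edges / n_basic_blocks);
        return true;
      }
    return false;
  }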
Re: compiling very large functions.
On 11/5/06, Eric Botcazou <[EMAIL PROTECTED]> wrote: > AFAIK not one of the tree optimizers disables itself, but perhaps we > should. The obvious candidates would be the ones that require > recomputation of alias analysis, and the ones that don't update SSA > info on the fly (i.e. require update_ssa, which is a horrible compile > time hog). Tree alias analysis can partially disable itself though: /* If the program has too many call-clobbered variables and/or function calls, create .GLOBAL_VAR and use it to model call-clobbering semantics at call sites. This reduces the number of virtual operands considerably, improving compile times at the expense of lost aliasing precision. */ maybe_create_global_var (ai); We have found this to be quite helpful on gigantic elaboration procedures generated for Ada packages instantiating gazillions of generics. We have actually lowered the threshold locally. Heh, I believe you! :-) IMHO we should add a OPT_Wdisabled_optimization warning there, though. Gr. Steven
Re: compiling very large functions.
On 11/5/06, Kenneth Zadeck <[EMAIL PROTECTED]> wrote: I would like to point out that the central point of my proposal was to have the compilation manager be the process that manages if an optimization is skipped or not rather than having each pass make a decision on its own. If we have a central mechanism, then it is relatively easy to find some sweet spots. If every pass rolls its own, it is more difficult to balance. Hmm, I don't understand this. Why is it harder to find a sweet spot if every pass decides for itself whether to run or not? I would think that this decision should be made by each pass individually, because the pass manager is one abstraction level higher, where it shouldn't have to know the behavior of each pass. Gr. Steven
Re: Polyhedron performance regression
On 11/11/06, Paul Thomas <[EMAIL PROTECTED]> wrote: Richard, > > If I had to guess I would say it was the forwprop merge... The what? :-) fwprop, see http://gcc.gnu.org/ml/gcc-patches/2006-11/msg00141.html If someone can confirm that this patch causes the drop, I can help trying to find a fix. Gr. Steven
Re: vectorizer data dependency graph
On 11/15/06, Sebastian Pop <[EMAIL PROTECTED]> wrote: There is a ddg in this patch if somebody wants the classic Allen&Kennedy way to look at the dependences: http://gcc.gnu.org/wiki/OptimizationCourse?action=AttachFile&do=get&target=loop-distribution-patch-against-gcc-4.1.0-release.patch Any plans to merge this into the FSF trunk? Gr. Steven
Re: EXPR_HAS_LOCATION seems to always return false
On 11/17/06, Brendon Costa <[EMAIL PROTECTED]> wrote: Is there something I should be doing before using EXPR_HAS_LOCATION()? Compile with -g, perhaps? Gr. Steven
Why does flow_loops_find modify the CFG, again?
Hi Zdenek, all, I'm running into some trouble with an if-conversion pass that runs after reload, where we have to avoid lifting insns across a loop exit edge into a loop. ifcvt.c uses flow_loops_find to find loops and mark all loop exit edges: if ((! targetm.cannot_modify_jumps_p ()) && (!flag_reorder_blocks_and_partition || !no_new_pseudos || !targetm.have_named_sections)) { struct loops loops; flow_loops_find (&loops); mark_loop_exit_edges (&loops); flow_loops_free (&loops); free_dominance_info (CDI_DOMINATORS); } I was wondering why we would sometimes *not* mark exit edges, but then I remembered that for some reason flow_loops_find modifies the CFG, which may lead to problems that we have to work around here. But if we do not mark loop exit edges, we can sometimes end up doing unprofitable if-conversions! It seems to me that a function called "flow_loops_find" is supposed to do *just* analysis, and not transformations. Apparently it now first transforms all loops into some canonical form, but that is completely inappropriate and unnecessary for some users of this loops analysis. Is this something that could be easily fixed? E.g. can we make flow_loops_find only perform transformations if asked to (by adding a function argument for that, as sketched below)? Gr. Steven
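Something as simple as this would do, I think (a sketch only; the flags parameter and the flag name are hypothetical, not the current interface):

  /* Hypothetical: let callers ask for pure analysis.  */
  #define LOOP_FIND_ANALYSIS_ONLY 1

  extern int flow_loops_find (struct loops *loops, int flags);

  /* ifcvt.c could then keep its exit-edge marking without risking
     CFG changes:  */
  flow_loops_find (&loops, LOOP_FIND_ANALYSIS_ONLY);
  mark_loop_exit_edges (&loops);
  flow_loops_free (&loops);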
Re: [avr-gcc-list] Re: AVR byte swap optimization
On 11/19/06, Eric Weddington <[EMAIL PROTECTED]> wrote: > Use gcc head, __builtin_bswap and make sure the AVR backend > implements the > bswap rtl patterns. There's the problem. You can't just glibly say "make sure the AVR backend implements the bswap rtl patterns". There are precious few volunteers who are familiar enough with gcc internals and the avr port in particular to go do just that. AFAIK, there is no bswap rtl pattern in the avr port, at least there doesn't seem to be in 4.1.1. Why is that a problem? Do you have a different solution in mind? > Future versions of gcc may also be able to recognise these > idioms without > using the builtin, but AFAIK that's not been implemented yet. Plus there is a long lead time between when it is implemented on HEAD, then branched, released from a branch, and then when it shows up in binary distributions. That happens with all improvements that are implemented between releases, so I don't see your point. Gr. Steven
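For the record, the idiom in question looks like this (a sketch; __builtin_bswap32 is the 32-bit spelling of the builtin mentioned above and only exists in newer GCCs):

  /* The open-coded byte-swap idiom that the compiler would have to
     recognize...  */
  unsigned int
  swap32 (unsigned int x)
  {
    return (x >> 24)
           | ((x >> 8) & 0x0000ff00)
           | ((x << 8) & 0x00ff0000)
           | (x << 24);
  }

  /* ...and the same operation via the builtin, which expands to a
     bswap insn if the backend provides the pattern.  */
  unsigned int
  swap32_builtin (unsigned int x)
  {
    return __builtin_bswap32 (x);
  }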
Re: [PATCH] Canonical types (1/3)
On 11/28/06, Doug Gregor <[EMAIL PROTECTED]> wrote: * tree.h (TYPE_CANONICAL): New. (TYPE_STRUCTURAL_EQUALITY): New. (struct tree_type): Added structural_equality, unused_bits, canonical fields. If I understand your patches correctly, this stuff is only needed for the C-family languages. So why steal two pointers on the generic struct tree_type? Are you planning to make all front ends use these fields, or is it just additional bloat for e.g. Ada, Fortran, Java? ;-) Gr. Steven
Re: rtl dumps
On 12/1/06, Andrija Radicevic <[EMAIL PROTECTED]> wrote: Hi, I have noticed that the INSN_CODE for all patterns in the rtl dumps .00.expand are -1 ... does this mean that the .md file was not used for the initial RTL generation? It was used, but it is assumed that the initial RTL produced by 'expand' is valid, i.e. you should be able to call recog() on all insns and not fail. Gr. Steven
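A quick way to convince yourself (a sketch against the internal API; run something like this right after expand):

  /* Every insn coming out of expand should be recognizable; the -1
     INSN_CODE in the .00.expand dump only means recog hasn't run yet.  */
  rtx insn;
  for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
    if (INSN_P (insn) && recog_memoized (insn) < 0)
      fatal_insn ("expand emitted an unrecognizable insn", insn);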
Re: expand_builtin_memcpy bug exposed by TER and gfortran
On 12/5/06, Andrew MacLeod <[EMAIL PROTECTED]> wrote: My preference is to check in the TER code which exposes this bug, and open a PR against the failure with this info. That way we don't lose track of the problem, and someone can fix it at their leisure. Until then there will be a testsuite failure in gfortran for the testcase which triggers this. Does that seem reasonable? or would everyone prefer I get it fixed before checking in the TER code? No, IMHO. It's unfortunate enough if a patch introduces a bug that we only find later. It's Very Bad And Very Wrong to allow in patches that cause test suite failures. Frankly, I don't understand why you even ask. We have rules for testing for a reason. Gr. Steven
Re: void* vector
On 12/9/06, Alexey Smirnov <[EMAIL PROTECTED]> wrote: typedef void* handle_t; DEF_VEC_I(handle_t); DEF_VEC_ALLOC_I(handle_t,heap); Why DEF_VEC_I instead of DEF_VEC_P? See vec.h. Gr. Steven
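I.e. for a pointer payload the P variants are the right ones (a sketch following the vec.h conventions; some_handle stands for any void* value you want to store):

  typedef void *handle_t;
  DEF_VEC_P(handle_t);
  DEF_VEC_ALLOC_P(handle_t,heap);

  /* Usage: */
  VEC(handle_t,heap) *v = VEC_alloc (handle_t, heap, 8);
  VEC_safe_push (handle_t, heap, v, some_handle);
  handle_t h = VEC_index (handle_t, v, 0);
  VEC_free (handle_t, heap, v);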
Re: Bootstrap broken on mipsel-linux...
On 12/11/06, David Daney <[EMAIL PROTECTED]> wrote: From svn r119726 (Sun, 10 Dec 2006) I am getting an ICE during bootstrap on mipsel-linux. This is a new failure since Wed Dec 6 06:34:07 UTC 2006 (revision 119575) which bootstrapped and tested just fine. I don't really want to do a regression hunt as bootstraps take 3 or 4 days for me. I will update and try it again. No need. It's my CSE patch, no doubt: http://gcc.gnu.org/ml/gcc-patches/2006-12/msg00698.html I'll try to figure out what's wrong. /home/build/gcc-build/./prev-gcc/xgcc -B/home/build/gcc-build/./prev-gcc/ -B/usr/local/mipsel-unknown-linux-gnu/bin/ -c -g -O2 -DIN_GCC -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Wold-style-definition -Wmissing-format-attribute -Werror -fno-common -DHAVE_CONFIG_H -I. -I. -I../../gcc/gcc -I../../gcc/gcc/. -I../../gcc/gcc/../include -I../../gcc/gcc/../libcpp/include -I../../gcc/gcc/../libdecnumber -I../libdecnumber ../../gcc/gcc/c-decl.c -o c-decl.o ../../gcc/gcc/c-decl.c: In function 'set_type_context': ../../gcc/gcc/c-decl.c:691: internal compiler error: in cse_find_path, at cse.c:5930 Please submit a full bug report, with preprocessed source if appropriate. Sic :-) A test case would be helpful. Gr. Steven
Re: Bootstrap broken on mipsel-linux...
On 12/11/06, David Daney <[EMAIL PROTECTED]> wrote: Let's assume that it doesn't affect i686 or x86_64. Because if it did, someone else would have been hit by it by now. I'm sure it doesn't, I bootstrapped & tested on those targets (and on ia64). So you would need a mips[el]-linux system in order to reproduce it. But if you had that, you could compile c-decl.c yourself to reproduce it. But if you really want it, I can get you a preprocessed version of c-decl.c. I suppose one could try it on a cross-compiler, but I have no idea if that would fail in the same manner. If you have a test case, I should be able to reproduce it with a cross. Getting a test case with a cross-compiler is the more difficult part. I could try to use a preprocessed c-decl.c from the cross-compiler configuration. But it wouldn't be the same input file as the one from your ICE, so whether that would allow me to reproduce the problem remains to be seen. If you have a preprocessed c-decl.c that ICEs for you, that would be helpful. If not, I'll just have to figure out a way to reproduce the ICE in some different way. Gr. Steven
Re: Bootstrap broken on mipsel-linux...
On 12/11/06, Kaz Kojima <[EMAIL PROTECTED]> wrote: It seems that the first tree dump which differs before and after r119711 is .099t.optimized. In that case, this is a different problem, probably caused by the new out-of-SSA pass. But to be sure, I suggest you revert my CSE patch and see if that makes the problem go away for you. Gr. Steven
Re: Bootstrap broken on mipsel-linux...
On 12/12/06, Kaz Kojima <[EMAIL PROTECTED]> wrote: "Steven Bosscher" <[EMAIL PROTECTED]> wrote: > In that case, this is a different problem, probably caused by the new > out-of-SSA pass. But to be sure, I suggest you revert my CSE patch > and see if that makes the problem go away for you. I've confirmed that the problem remains after reverting the r119706 changes to cse.c. So it may be another problem, though it might produce a wrong stage 1 compiler for mipsel-linux and end up with the ICE in stage 2. In the mipsel-linux case, we ended up with a diamond region where the jump in the IF-block was folded, so that we could extend the path along one of the diamond's arms with the JOIN-block. This could happen because cse_main traversed the basic blocks in DFS order instead of in topological order. I have just posted a hopeful fix for this. Gr. Steven
Re: 32 bit jump instruction.
On 12/13/06, Joern Rennecke <[EMAIL PROTECTED]> wrote: In http://gcc.gnu.org/ml/gcc/2006-12/msg00328.html, you wrote: However, because the SH has delayed branches, there is always a guaranteed way to find a register - one can be saved, and then be restored in the delay slot. Heh, that's an interesting feature :-) How does that work? I always thought that the semantics of delayed insns is that the insn in the delay slot is executed *before* the branch. But that is apparently not the case, or the branch register would have been over-written before the branch. How does that work on SH? Gr. Steven
Re: g++ doesn't unroll a loop it should unroll
On 12/13/06, Benoît Jacob <[EMAIL PROTECTED]> wrote: g++ -DUNROLL -O3 toto.cpp -o toto ---> toto runs in 0.3 seconds g++ -O3 toto.cpp -o toto---> toto runs in 1.9 seconds So what can I do? Is that a bug in g++? If yes, any hope to see it fixed soon? You could try adding -funroll-loops. Gr. Steven
Re: Memory allocation for local variables.
On 12/13/06, Sandeep Kumar <[EMAIL PROTECTED]> wrote: Hi all, I tried compiling the above two programs : on x86, 32 bit machines. [EMAIL PROTECTED] ~]# gcc test.c Try with optimization enabled (try -O1 and/or -O2). Gr. Steven
Re: Back End Responsibilities + RTL Generation
On 12/13/06, Frank Riese <[EMAIL PROTECTED]> wrote: One of my professors stated that a GCC Back End uses the Control Flow Graph as its input and that generation of RTL expressions occurs later on. That is not true. What roles do Back and Middle End play in generation of RTL? Would you consider the CFG or RTL expressions as the input for a GCC Back End? Let me first say that the definitions of front end, back end, and middle end are a bit hairy. You have to carefully define what you classify as belonging to the middle end or the back end. I actually try to avoid the terms nowadays. Also, you have to be specific about the version of GCC that you're talking about. GCC2, GCC3 and GCC4 are completely different internally, and even the differences between various GCC4 releases are quite significant. Anyway... The steps through the compiler are as follows: 1. front end runs, produces GENERIC 2. GENERIC is lowered to GIMPLE 3. a CFG is constructed for GIMPLE 4. GIMPLE (tree-ssa) optimizers run 5. GIMPLE is expanded to RTL, while preserving the CFG 6. RTL optimizers run 7. assembly is written out The RTL generation in step 5 is done one statement at a time. The part of the compiler that generates the RTL is a mix of shared code and of back end code: A single GIMPLE statement at a time is passed to the middle-end expand routines, which try to produce RTL for this statement using instructions available on the target machine. The available instructions are defined by the target machine description (i.e. the back end). Try to understand cfgexpand.c and the section on named RTL patterns in the GCC internals manual. I also remembered having read the following line from the gcc internals documentation. However, I'm still not sure how to interpret this: "A control flow graph (CFG) is a data structure built on top of the intermediate code representation (the RTL or tree instruction stream) abstracting the control flow behavior of a function that is being compiled" Does that mean that a control flow graph is built after rtl has been generated, or that information about the control flow is incorporated into the RTL data structures? Neither. I'm assuming you're interested in how this works in recent GCC releases, i.e. GCC4 based. In GCC4, the control flow graph is built on GIMPLE; the tree-ssa optimizers need a CFG too. This CFG is kept up-to-date through the optimizers and through expansion to RTL. This means that GCC builds the CFG only once for each function. The data structures for the CFG are in basic-block.h. These data structures are most definitely *not* incorporated into the RTL structures. The CFG is independent of the intermediate representations for the function instructions. It has to be, or you couldn't have the same CFG data structures for both GIMPLE and RTL. Hope this helps, Gr. Steven
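To make step 5 a bit more concrete, this is roughly how the shared expand code asks the back end for an instruction (a simplified sketch; the optab machinery in optabs.c does the actual pattern lookup):

  /* Expand "op0 + op1" in SImode.  expand_binop consults add_optab,
     which maps to the target's named "addsi3" pattern if the machine
     description provides one, and otherwise falls back to a wider
     mode or a library call.  */
  rtx target = gen_reg_rtx (SImode);
  target = expand_binop (SImode, add_optab, op0, op1,
                         target, 0, OPTAB_LIB_WIDEN);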
Re: g++ doesn't unroll a loop it should unroll
On 12/14/06, Benoît Jacob <[EMAIL PROTECTED]> wrote: I don't understand why you say that. At the language specification level, templates come with no inherent speed overhead. All of the template stuff is unfolded at compile time, none of it remains visible in the binary, so it shouldn't make the binary slower. You're confusing theory and practice... Gr. Steven
Re: Do we want non-bootstrapping "make" back?
On 12/30/06, Daniel Jacobowitz <[EMAIL PROTECTED]> wrote: Once upon a time, the --disable-bootstrap configure option wasn't necessary. "make" built gcc, and "make bootstrap" bootstrapped it. Is this behavior useful? Should we have it back again? For me the current behavior works Just Fine. Gr. Steven
Nested libcalls (was: Re: RFC: SMS problem with emit_copy_of_insn_after copying REG_NOTEs)
On Sunday 31 December 2006 00:59, Jan Hubicka wrote: > > Also I should mention, this also fixes a possible bug with libcalls that > > are embedded in one another. Before we were just assuming if we have a > > REG_RETVAL, then the previous REG_LIBCALL would be the start of the > > libcall but that would be incorrect with embedded libcalls. > > We should not have nested libcalls at all. One level of libcalls is > painful enough and we take care to not do this. It's unclear whether we can have nested libcalls or not. We expect them in some places (especially, see libcall_stack in gcse.c:local_cprop_pass) but are bound to fail miserably in others. This is something I've been wondering for a while. Maybe someone can give a definitive answer: Can libcalls be nested, or not? Gr. Steven
Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."
On 12/31/06, Paul Eggert <[EMAIL PROTECTED]> wrote: Also, as I understand it this change shouldn't affect gcc's SPEC benchmark scores, since they're typically done with -O3 or better. It's not all about benchmark scores. I think most users compile at -O2 and they also won't understand why they get a performance drop on their code. You say you doubt it affects performance. Based on what? Facts please, not guesses and hand-waving... Gr. Steven
Re: gcc 3.4 > mainline performance regression
On 05 Jan 2007 07:18:47 -0800, Ian Lance Taylor <[EMAIL PROTECTED]> wrote: At the tree level, the problem is that the assignment to a[0] is seen as aliasing a[1]. This causes the use of a[1] to look like a USE of an SMT, and the assignment to a[0] to look like a DEF of the same SMT. So in tree-ssa-loop-im.c the statements look like they are not loop invariant. I don't know we can do better with our current aliasing representation. Unless we decide to do some sort of array SRA. Or perhaps we could make the loop invariant motion pass more complicated: when it sees a use or assignment of a memory tag, it could explicitly check all the other uses/assignments in the loop and see if they conflict. I don't really know how often this would pay off, though. How about using dependence analysis instead? At the RTL level we no longer try to hoist MEM references out of loops. We now assume that is handled at the tree level. We do hoist MEMs out of loops, in gcse.c. Gr. Steven
Re: gcc 3.4 > mainline performance regression
On 1/5/07, Andrew Haley <[EMAIL PROTECTED]> wrote: This is from the gcc-help mailing list. It's mentioned there for ARM, but it's just as bad for x86-64. It appears that memory references to arrays aren't being hoisted out of loops: in this test case, gcc 3.4 doesn't touch memory at all in the loop, but 4.3pre (and 4.2, etc) does. Here's the test case: void foo(int *a) { int i; for (i = 0; i < 100; i++) a[0] += a[1]; } gcc 3.4.5 -O2: .L5: leal(%rcx,%rsi), %edx decl%eax movl%edx, %ecx jns .L5 gcc 4.3pre -O2: .L2: addl4(%rdi), %eax addl$1, %edx cmpl$100, %edx movl%eax, (%rdi) jne .L2 Thoughts? What does the code look like if you compile with -O2 -fgcse-sm? Gr. Steven
Re: gcc 3.4 > mainline performance regression
On 1/5/07, David Edelsohn <[EMAIL PROTECTED]> wrote: >>>>> Steven Bosscher writes: Steven> What does the code look like if you compile with -O2 -fgcse-sm? Yep. Mark and I recently discussed whether gcse-sm should be enabled by default at some optimization level. We're hiding performance from GCC users. The problem with it used to be that it was just very broken. When I fixed PR24257, it was still not possible to bootstrap with gcse store motion enabled. Putting someone on fixing tree load&store motion is probably more useful anyway, if you're going to do load&store motion for performance. In RTL, we can't move loads and stores that are not simple loads or stores (i.e. reg <- mem, or mem <- reg). There are two very popular targets where this is the common case ;-) Gr. Steven
We have no active maintainer for the i386 port
Hi, We currently do not have an active maintainer for the i386 port. The only listed maintainer for the port is rth, and he hasn't been around to approve patches in a while. This situation is a bit strange for a port that IMHO is one of the most important ports GCC has... In the meantime, patches don't get approved (see e.g. [1]), or they get approved by middle-end maintainers who, strictly speaking, should not be approving backend patches, as I understand it. So, can the SC please appoint a new/extra i386 port maintainer? Thanks, Gr. Steven [1] http://gcc.gnu.org/ml/gcc-patches/2007-01/msg00379.html
Re: dump after RTL expand
On 1/11/07, Andrija Radicevic <[EMAIL PROTECTED]> wrote: Hi, how could I find out from which patterns, in the md file, the 00.expand file was generated (i.e. to map the patterns in the expand file with the ones in the .md file)? Is there a compiler option/switch which would tell the compiler to mark the patterns in the expand file with the insn names from the md file? There isn't. You would have to walk over the insns and make recog assign them an insn code. Gr. Steven
Re: dump after RTL expand
On 1/12/07, Andrija Radicevic <[EMAIL PROTECTED]> wrote: > On Thursday 11 January 2007 19:27, Steven Bosscher wrote: > > On 1/11/07, Andrija Radicevic <[EMAIL PROTECTED]> wrote: > > > Hi, how could I find out from which patterns, in the md file, the 00.expand file was generated (i.e. to map the patterns in the expand file with the ones in the .md file)? Is there a compiler option/switch which would tell the compiler to mark the patterns in the expand file with the insn names from the md file? > > There isn't. > > You would have to walk over the insns and make recog assign them an insn code. > That still wouldn't tell you what names were used to generate them. It's common to have a named expander that generates other (possibly anonymous) insns. Does that mean that the expand file isn't the dump after the initial rtl generation phase? According to the internals manual, only the named define_insn and define_expand are used during the rtl generation phase. The manual is correct, but the define_expands can produce anonymous insns. If you recog an insn that isn't a named pattern, you still get the "name" of the define_insn (with the "*" in front of it) or just "" if the insn doesn't have a name. You always get at least the insn code. Gr. Steven
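If you want to experiment anyway, something along these lines should work (a hacked-up sketch; dump_file and the insn walk follow the usual pass conventions):

  /* Let recog assign insn codes, then print the matched pattern names.
     Unnamed patterns show up with a "*" prefix or as "".  */
  rtx insn;
  for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
    if (INSN_P (insn))
      {
        int icode = recog_memoized (insn);
        if (icode >= 0)
          fprintf (dump_file, ";; insn %d matched \"%s\"\n",
                   INSN_UID (insn), insn_data[icode].name);
      }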
Ada and the TREE_COMPLEXITY field on struct tree_exp
Hello, Ada is the last user of the tree_exp->complexity field. Removing this field should reduce GCC's memory usage by about 5% on a 64 bit host. Could an Ada maintainer see if it possible to remove the use of this field? I would think it shouldn't be too hard -- TREE_COMPLEXITY is used only inside ada/decl.c. But I haven't been able to figure out myself yet how to avoid using TREE_COMPLEXITY there... Thanks, Gr. Steven
Re: CSE not combining equivalent expressions.
On Thursday 18 January 2007 09:31, Jeffrey Law wrote: > I haven't followed this thread that closely, but it seems to me this > could be done in the propagation engine. > > Basically we keep track of the known zero, sign bit copies and known > nonzero bits for SSA names, then propagate them in the obvious ways. > Basically replicating a lot of what combine & cse do in this area, > but at the tree level. It's something I've always wanted to see > implemented, but never bothered to do... I had this implemented at one point (2 years ago??) and I could not show any real benefit. There were almost no opportunities for this kind of optimization in GCC itself or in some benchmarks I looked at. There appear to be more bit operations in RTL, so perhaps it is a better idea to implement a known-bits propagation pass for RTL, with the new dataflow engine. Gr. Steven
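To illustrate what such a pass would propagate, the lattice value and one transfer function might look like this (my sketch, not existing code):

  /* Known-bits lattice value: a bit may be known zero, known one,
     or unknown (neither mask bit set).  */
  struct bit_lattice
  {
    unsigned HOST_WIDE_INT known_zero;
    unsigned HOST_WIDE_INT known_one;
  };

  /* Transfer function for AND: a result bit is known zero if it is
     known zero in either operand, and known one only if it is known
     one in both operands.  */
  static struct bit_lattice
  bits_for_and (struct bit_lattice a, struct bit_lattice b)
  {
    struct bit_lattice r;
    r.known_zero = a.known_zero | b.known_zero;
    r.known_one = a.known_one & b.known_one;
    return r;
  }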
Re: Ada and the TREE_COMPLEXITY field on struct tree_exp
On 1/18/07, Richard Kenner <[EMAIL PROTECTED]> wrote: > Ada is the last user of the tree_exp->complexity field. Removing > this field should reduce GCC's memory usage by about 5% on a 64 bit > host. Could an Ada maintainer see if it possible to remove the use > of this field? I would think it shouldn't be too hard -- > TREE_COMPLEXITY is used only inside ada/decl.c. But I haven't been > able to figure out myself yet how to avoid using TREE_COMPLEXITY there... It's just being used as a cache to avoid recomputing a value. My suggestion would be to replace it with a hash table. It'll tend to keep nodes around a little more than usual, but that should be a tiny cost. I had thought of a hash table, too, but I couldn't figure out where to initialize and free it (i.e. where it is a "live" table, so to speak). For example, I don't know if this table would be required after gimplification, and I also don't even know how GNAT translates its own representation to GIMPLE (whole translation unit at once? function at a time?). Gr. Steven
Re: raising minimum version of Flex
On 21 Jan 2007 22:13:06 -0800, Ian Lance Taylor <[EMAIL PROTECTED]> wrote: Ben Elliston <[EMAIL PROTECTED]> writes: > I think it's worth raising the minimum required version from 2.5.4 to > 2.5.31. I want to point out that Fedora Core 5 appears to still ship flex 2.5.4. At least, that is what flex --version reports. (I didn't bother to check this before.) I think we need a very strong reason to upgrade our requirements ahead of common distributions. We've already run into that problem with MPFR. For MPFR, everyone needs to have the latest installed to be able to build gcc. That is not the case with flex. No-one needs flex at all to build gcc, except gcc hackers who modify one of the (two or three?) remaining flex files and regenerate the lexers. So you can't really compare flex and MPFR this way. If flex 2.5.31 is already four years old, it doesn't seem unreasonable to me to expect people to upgrade if their distribution ships with an even older flex. Gr. Steven
Re: About building conditional expressions
On 1/23/07, Ferad Zyulkyarov <[EMAIL PROTECTED]> wrote: But, as I noticed this function "build" is not maintained (used) by gcc any more. Instead build, what else may I use to create a conditional expression node? Look for buildN where N is a small integer ;-) I think you want build2 for EQ_EXPR. Gr. Steven
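Concretely, something like this (a sketch, assuming lhs, rhs, then_stmt and else_stmt are trees you have already built):

  /* Build the test "lhs == rhs", then wrap it in a conditional.  */
  tree cond = build2 (EQ_EXPR, boolean_type_node, lhs, rhs);
  tree stmt = build3 (COND_EXPR, void_type_node, cond,
                      then_stmt, else_stmt);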
Re: [RFC] Our release cycles are getting longer
On 1/23/07, Diego Novillo <[EMAIL PROTECTED]> wrote: So, I was doing some archeology on past releases and we seem to be getting into longer release cycles. With 4.2 we have already crossed the 1 year barrier. Heh. Maybe part of the problem here is that the release manager isn't very actively pursuing a release. The latest GCC 4.2 status report is from October 17, 2006, according to the web site. That is already more than 100 days ago. For 4.3 we have already added quite a bit of infrastructure that is all good on paper but still needs some amount of TLC. And the entire backend dataflow engine is about to be replaced, too. GCC 4.3 is probably going to be the most experimental release since GCC 4.0... There was some discussion on IRC that I would like to move to the mailing list so that we get a wider discussion. There have been thoughts about skipping 4.2 completely, or going to an extended Stage 3, etc. Has there ever been a discussion about releasing "on demand"? Almost all recent Linux and BSD distributions appear to converge on GCC 4.1 as the system compiler, so maybe there just isn't a "market" for GCC 4.2. I don't see any point in an extended Stage 3. People work on what they care about, and we see time and again that developers just work on branches instead of on bug fixes for the trunk when it is in Stage 3. IMHO the real issue with the GCC release plan is that there is no way for the RM to make people fix bugs. I know the volunteer blah-blah, but at the end of the day many bugs are caused by the people who work on new projects on a branch when the trunk is in Stage 3. Maybe there should just be some rules about accepting projects for the next release cycle. Like, folks with many bugs assigned to them, or in their area of expertise, are not allowed to merge a branch or big patches into the trunk during Stage 1. Not that I *really* believe that would work... But skipping releases is IMHO not really a better idea. Gr. Steven
Re: Signed int overflow behaviour in the security context
On 1/25/07, Andreas Bogk <[EMAIL PROTECTED]> wrote: "It's not my fault if people write buggy software" is a lame excuse for sloppy engineering on the part of gcc. So basically you're saying gcc developers should compensate for other people's sloppy engineering? ;-) Gr. Steven
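The classic example behind this whole discussion, for reference:

  /* With signed overflow undefined, GCC may assume that a + 100 can
     never be smaller than a, and delete the check below entirely;
     with -fwrapv the check must be kept.  */
  int
  add_checked (int a)
  {
    if (a + 100 < a)   /* may be folded to "if (0)" at -O2 */
      return -1;
    return a + 100;
  }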
Re: [RFC] Our release cycles are getting longer
On 1/25/07, H. J. Lu <[EMAIL PROTECTED]> wrote: > >Gcc 4.2 has a serious FP performace issue: > > > >http://gcc.gnu.org/ml/gcc/2007-01/msg00408.html > > > >on both ia32 and x86-64. If there will be a 4.2.0 release, I hope it > >will be addressed. > > As always, the best way to ensure that it is addressed if it is > important to you is to address it yourself, or pay someone to do so :-) The fix is in mainline. The question is if it should be backported to 4.2. ISTR Dan already made it clear more than once that the answer to that question is a loud NO. Gr. Steven
G++ OpenMP implementation uses TREE_COMPLEXITY?!?!
Hello rth, Can you explain what went through your mind when you picked the tree_exp.complexity field for implementing something new... :-( You know (or so I assume) this was a very Very VERY BAD thing to do, if we are ever going to get rid of TREE_COMPLEXITY, which is a major memory hog. We are all better off if we can remove TREE_COMPLEXITY. I thought we were there with an Ada patch I've just crafted, and with the effort of Tom Tromey to remove TREE_COMPLEXITY usage from the java front end. But my all-languages bootstrap failed, and guess what: /* Used to store the operation code when OMP_ATOMIC_DEPENDENT_P is set. */ #define OMP_ATOMIC_CODE(NODE) \ (OMP_ATOMIC_CHECK (NODE)->exp.complexity) We should _not_ be doing this. *Especially* not through anything other than the TREE_COMPLEXITY accessor macro to hide the issue... or maybe that was done on purpose? ;-) I don't know if there is another place where we can store this value, but we definitely should. It is hugely disappointing to see that, just when we're there with all other front ends, you've introduced another user of the tree_exp.complexity field. Can you please help me fix this ASAP? Gr. Steven
Re: Ada and the TREE_COMPLEXITY field on struct tree_exp
On 1/18/07, Richard Kenner <[EMAIL PROTECTED]> wrote: > I had thought of a hash table, too, but I couldn't figure out where to > initialize and free it (i.e. where it is a "live" table, so to speak). For > example, I don't know if this table would be required after gimplification, > and I also don't even know how GNAT translates its own representation to > GIMPLE (whole translation unit at once? function at a time?). It's fairly conventional in that part. But that's not relevant here. This is used for transmitting location information on FIELD_DECLs back to the front end. Most records in Ada are defined at GCC's global level, so there's little point in doing anything other than a hash table that's initialized early on (e.g., in the routine "gigi") and never freed. Also, the current code just saves the result for EXPR_P nodes since only those have TREE_COMPLEXITY, but if you're switching to a hash table, it's probably best just to record *all* results in it. OK, attached is the preliminary hack I created some time ago. After some changes, it now bootstraps, but I haven't tested it yet. I'm posting it as an RFC. I did not go as far as what you suggested, because I don't want to change code I don't really understand. This is the minimum patch I would need to remove the complexity field from struct tree_exp. If one of you can do better than this, for the purpose of GNAT, please go ahead and change it any way you see fit ;-) No point in getting too sophisticated here: this is just a small hack to avoid pathological compile-time behavior when compiling certain very complex record types. Are these test cases in the FSF test suite? Thanks, Gr. Steven 2007-xx-xx Steven Bosscher <[EMAIL PROTECTED]> gcc/ * tree.c (iterative_hash_expr): Handle types generically. Also handle PLACEHOLDER_EXPR nodes. ada/ * decl.c: Include hashtab.h and gt-ada-decl.h (struct cached_annotate_value_t, cached_annotate_value_tab, cached_annotate_value_hash, cached_annotate_value_eq, cached_annotate_value_marked_p, cached_annotate_value_lookup, cached_annotate_value_insert): New data structures and support functions to implement a cache for annotate_value results. (annotate_value): Use the hash table as a cache, instead of using TREE_COMPLEXITY. Index: tree.c === --- tree.c (revision 121230) +++ tree.c (working copy) @@ -5158,12 +5158,21 @@ iterative_hash_expr (tree t, hashval_t v /* DECL's have a unique ID */ val = iterative_hash_host_wide_int (DECL_UID (t), val); } + else if (class == tcc_type) + { + /* TYPEs also have a unique ID. */ + val = iterative_hash_host_wide_int (TYPE_UID (t), val); + } else { - gcc_assert (IS_EXPR_CODE_CLASS (class)); - val = iterative_hash_object (code, val); + /* The tree must be a placeholder now, or an expression. + For anything else, die. */ + if (code == PLACEHOLDER_EXPR) + return val; + gcc_assert (IS_EXPR_CODE_CLASS (class)); + /* Don't hash the type, that can lead to having nodes which compare equal according to operand_equal_p, but which have different hash codes. */ Index: ada/decl.c === --- ada/decl.c (revision 121230) +++ ada/decl.c (working copy) @@ -34,6 +34,7 @@ #include "convert.h" #include "ggc.h" #include "obstack.h" +#include "hashtab.h" #include "target.h" #include "expr.h" @@ -5864,6 +5865,104 @@ compare_field_bitpos (const PTR rt1, con return 1; } + +/* In annotate_value, we compute an Uint to be placed into an Esize, + Component_Bit_Offset, or Component_Size value in the GNAT tree. 
+ Because re-computing the value is expensive, we cache the unique + result for each tree in a hash table. The hash table key is the + hashed GNU tree, and the hash table value is the useful data in + the buckets. + + The hash table entries are pointers to cached_annotate_value_t. + We hash GNU_SIZE in the insert and lookup functions for this + hash table, using iterative_hash_expr. Caching the hash + value on the bucket entries speeds up the hash table quite a + bit during resizing. */ + +struct cached_annotate_value_t GTY(()) +{ + /* Cached hash value for this table entry. */ + hashval_t hashval; + + /* The cached value. + ??? This should be an Uint but gengtype chokes on that. */ + int value; + + /* The tree that the value was computed for. */ + tree gnu_size; +}; + +/* The hash table used as the annotate_value cache. */ +static GTY ((if_marked ("cached_annotate_value_marked_p"), + param_is (struct cached_annotate_value_t))) + htab_t cached_annotate_value_tab; + +/* Hash an annotate_value result for the annotate_valu
Re: G++ OpenMP implementation uses TREE_COMPLEXITY?!?!
On 1/28/07, Mark Mitchell <[EMAIL PROTECTED]> wrote: It's entirely reasonable to look for a way to get rid of this use of TREE_COMPLEXITY, but things like: > You know (or so I assume) this was a very Very VERY BAD thing to do are not helpful. Of course, if RTH had thought it was a bad thing, he wouldn't have done it. Fine. Then consider all my efforts to remove it finished. Gr. Steven
Re: Ada and the TREE_COMPLEXITY field on struct tree_exp
On 1/28/07, Steven Bosscher <[EMAIL PROTECTED]> wrote: OK, attached is the preliminary hack I created some time ago. After some changes, it now bootstraps, but I haven't tested it yet. I'm passing it as an RFC. This patch is hereby withdrawn. Gr. Steven
Re: G++ OpenMP implementation uses TREE_COMPLEXITY?!?!
On 1/29/07, Paolo Bonzini <[EMAIL PROTECTED]> wrote: I hope Steven accepts a little deal: he exits angry-stevenb-mode, and I donate him this untested patch to remove TREE_COMPLEXITY from C++. No, thank you. I decided long ago that I'm not going to work on anything unless there is nobody working in the other direction. In the case of TREE_COMPLEXITY, one of the best and most prominent gcc hackers decided to use something of which, I believe, everyone thinks it should go. And he did so in a way almost as if to cover it up, by accessing the field directly instead of through the accessor macros. So I freaked out, which is not good, I know. I apologize to those who feel offended, because I did not mean to. My "what was on your mind" remark was *very* tongue-in-cheek, because clearly rth wouldn't have done this if he'd had more time/patience/whatever. See his own remark in the commit mail about his state of mind when he committed this bit. But then to have Mark *support* rth's change, that really shows the total lack of leadership and a common plan in the design of gcc. Why should I spend hours on this kind of cleanup, only to feel frustrated, to make others dislike me, and to have zero result in the end? I'll just work on something else instead. Gr. Steven
Re: G++ OpenMP implementation uses TREE_COMPLEXITY?!?!
On 1/29/07, Joe Buck <[EMAIL PROTECTED]> wrote: On Mon, Jan 29, 2007 at 03:24:56PM +0100, Steven Bosscher wrote: > But then to have Mark *support* rth's change, that really shows the > total lack of leadership and a common plan in the design of gcc. There you go again. Actually, there *you* go again :-) Do you know I can't find even one mail from you that did not in one way or another criticize the way I said something? Mark did not support or oppose rth's change, he just said that rth probably thought he had a good reason. Well, forgive me for missing the subtle difference between supporting a change and suggesting there was a good reason for the change. Also, to say that there is no common plan in GCC, or that there is no good leadership of the project, is just the expression of my opinion. I really am of the opinion that gcc is a strange project which claims to be open but where the maintainers are appointed by a group of people that hasn't changed in ten years' time. If you see that as attacking someone personally, that is your problem, not mine. If you think that there's a problem with a patch, there are ways to say so without questioning the competence or good intentions of the person who made it. Where did I question rth's competence? If you're going to be so picky about everything I say, can you at least be specific? Gr. Steven
Re: G++ OpenMP implementation uses TREE_COMPLEXITY?!?!
On 1/29/07, Mark Mitchell <[EMAIL PROTECTED]> wrote: Email is a tricky thing. I've learned -- the hard way -- that it's best to put a smiley on jokes, because otherwise people can't always tell that they're jokes. I did use a smiley. Maybe I should use a smiling smiley then, instead of a sad-looking one. Gr. Steven
Re: Use of INSN_CODE
On 2/1/07, Pranav Bhandarkar <[EMAIL PROTECTED]> wrote: However, the internals manual only warns against using INSN_CODE on use, clobber, asm_input, addr_vec, addr_diff_vec. There is no mention of the other members of RTX_EXTRA. Or shouldn't recog_memoized have an INSN_P check in it? Am I missing something here? recog* should ICE if what it gets passed is not an insn (i.e. !INSN_P). Gr. Steven
Re: "error: unable to generate reloads for...", any hints?
On 2/8/07, 吴曦 <[EMAIL PROTECTED]> wrote: Thanks. But what does it mean by saying: "Sometimes an insn can match more than one instruction pattern. Then the pattern that appears first in the machine description is the one used." Basically it means, "Don't do that" ;-) Make your insns match only one pattern. Gr. Steven
Re: Division by zero
On 2/10/07, Jie Zhang <[EMAIL PROTECTED]> wrote: The code I posted in my first email is from libgloss/libnosys/_exit.c. It's used to cause an exception deliberately. From your replies, it seems it should find another way to do that. Maybe you can use __builtin_trap() ? Gr. Steven
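I.e. something like this (a sketch of what a libnosys-style _exit could do instead; the loop is only there because _exit must not return):

  void
  _exit (int status)
  {
    (void) status;       /* unused */
    __builtin_trap ();   /* expands to the target's trap insn if there
                            is one, and calls abort () otherwise */
    while (1)
      ;                  /* not reached */
  }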
Re: Some thoughts and questions about the data flow infrastructure
On 2/12/07, Vladimir Makarov <[EMAIL PROTECTED]> wrote: I like the df infrastructure code from day one for its clearness. Unfortunately users don't see it and probably don't care about it. From my point of view the df infrastructure has a design flaw. It extracts a lot of information about RTL and keeps it on the side. It does not make the code fast. It also does not make the code slow. And the data it extracts and keeps on the side could be used to simplify many algorithms in gcc (most notably cprop, mode switching, and regmove). There is a tremendous potential for speedups in RTL passes if they start using the df register caches instead of traversing the PATTERN of every insn. Gr. Steven
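As a concrete example of such a use, here is the kind of access I mean (a sketch against the df API as I remember it, so the names are approximate):

  /* Instead of walking PATTERN (insn) with for_each_rtx to find the
     register uses, read the null-terminated use array that df caches
     for each insn.  */
  struct df_ref **use_rec;
  for (use_rec = DF_INSN_USES (insn); *use_rec; use_rec++)
    {
      unsigned int regno = DF_REF_REGNO (*use_rec);
      /* ... regno is used by INSN, no rtx walking needed ... */
    }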
Re: Some thoughts and questions about the data flow infrastructure
On 2/13/07, Vladimir N. Makarov <[EMAIL PROTECTED]> wrote:
> There are certainly performance issues here. There are limits on how much I, and the others who have worked on this have been able to change before we do our merge. So far, only those passes that were directly hacked into flow, such as dce, and auto-inc-dec detection have been rewritten from the ground up to fully utilize the new framework. However, it had gotten to the point where the two frameworks really should not coexist. Both implementations expect to work in an environment where the information is maintained from pass to pass and doing it with two systems was not workable. So the plan accepted by the steering committee accommodates the wholesale replacement of the dataflow analysis but even after the merge, there will still be many passes that will be changed.

Does it mean that the compiler will be even slower?

No, it will mean the compiler will be faster. Sooner if you help. You seem to believe that the DF infrastructure is fundamentally slower than flow is. I believe that there are other reasons for the current differences in compile time. AFAICT the current compile time slowdowns on the dataflow branch are due to:

* Bitmaps, bitmaps, bitmaps. We badly need a faster bitmap implementation (see the sketch after this mail).

* Duplicate work on insn scanning:
  1. DF scans all insns and makes accurate information available.
  2. Many (most) passes see it and think, "Hey, I can do that myself!", and they rescan all insns for no good reason.
  The new passes, which use the new infrastructure, are among the fastest in the RTL path right now. The slow passes are the passes doing their own thing (CSE, GCSE, regmove, etc.).

* Duplicate work between passes (minor):
  - on the trunk, regmove can make auto increment insns
  - on the df branch, the auto-inc-dec pass makes those transformations redundant

* Earlier availability of liveness information:
  - On the trunk we compute liveness for the first time just before combine.
  - On the dataflow branch, we have liveness already after the first CSE pass. Updating it between CSE and combine over ~20 passes is probably costly compared to doing nothing on the trunk. (I believe having cfglayout mode early in the compiler will help reduce this cost, thanks to no iterations in cleanup_cfg.)

Maybe I overestimate the cost of some of these items, and maybe I'm missing a few items. But the message is the same: there is still considerable potential for speeding up GCC using the new dataflow infrastructure.

Gr. Steven
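The bitmap point above concerns GCC's two bitmap flavors. A minimal illustration, assuming 2007-era bitmap.h/sbitmap.h names (later releases renamed several of the sbitmap calls):

--
#include "bitmap.h"
#include "sbitmap.h"

/* The linked "bitmap" type: sparse and cheap to allocate, but every
   set/test chases a list of elements, and the cache misses add up
   when dataflow problems do millions of such probes.  */
static int
probe_sparse (int regno)
{
  bitmap live = BITMAP_ALLOC (NULL);
  int found;

  bitmap_set_bit (live, regno);
  found = bitmap_bit_p (live, regno);
  BITMAP_FREE (live);
  return found;
}

/* The flat "sbitmap" type: the size must be known up front, but a
   set/test is a single word operation.  */
static int
probe_flat (int regno)
{
  sbitmap live = sbitmap_alloc (100000);
  int found;

  sbitmap_zero (live);
  SET_BIT (live, regno);
  found = TEST_BIT (live, regno);
  sbitmap_free (live);
  return found;
}
--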
Re: Some thoughts and questions about the data flow infrastructure
On 2/13/07, Vladimir Makarov <[EMAIL PROTECTED]> wrote: Wow, I got so many emails. I'll try to answer them in one email. Let us look at major RTL optimizations: combiner, scheduler, RA.

...PRE, CPROP, SEE, RTL loop optimizers, if-conversion, ... It is easy to make your arguments look valid if you take it as a proposition that only register allocation and scheduling ought to be done on RTL. The reality is that GIMPLE is too high level (by design) to catch many useful transformations performed on RTL. Think CSE of lowered addresses, expanded builtins, code sequences generated for bitfield operations, and expensive instructions (e.g. mul, div). So we are going to have more RTL optimizers than just regalloc and sched. Many RTL optimizations still matter very much (disable some of them and test SPEC again, if you're unconvinced). Having a uniform dataflow framework for those optimizations is IMHO a good thing.

Do we need a global analysis for building def-use and use-def chains? We don't need it for the combiner (only in bb scope).

It seems to me that this limitation is only there because when combine was written, the idea of "global dataflow information" was in the "future work" section for most practical compilers. So, perhaps combine, as it is now, does not need DU/UD chains. But maybe we can improve passes like this if we re-implement them in, or migrate them to, a better dataflow framework.

Gr. Steven
Re: Some thoughts and questions about the data flow infrastructure
On 2/13/07, Vladimir Makarov <[EMAIL PROTECTED]> wrote: I am just trying to convince people that the proposed df infrastructure is not ready and might create serious problems for this release and future development because it is slow. Danny is saying that the beauty of the infrastructure is that you can improve it in one place. I partially agree with this. I am only afraid that a solution for a faster infrastructure (e.g. another, slimmer data representation) might change the interface considerably. I am not sure that I can convince you of this. But I am more worried about the 4.3 release, and I really believe that inclusion of the data flow infrastructure should be the first step of stage 1, to give people more time to solve at least some problems.

I recall this wonderful quote of just a few days ago, which perfectly expresses my feelings about the proposed merge of the dataflow branch for GCC 4.3:

"I would hope that the community would accept the major structural improvement, even if it is not a 100% complete transition, and that we can then work on any remaining conversions in the fullness of time." -- Mark Mitchell, 11 Feb 2007 [1]

:-D

Gr. Steven

[1] http://gcc.gnu.org/ml/gcc-patches/2007-02/msg01012.html
Re: Some thoughts and questions about the data flow infrastructure
On 2/13/07, Vladimir Makarov <[EMAIL PROTECTED]> wrote:
> Why is it unacceptable for it to mature further on mainline like Tree-SSA?

Two releases one after another to avoid. Not one real experiment to try to rewrite an RTL optimization to figure out how def-use chains will work.

Vlad, this FUD-spreading is beginning to annoy me. Please get your view of the facts in order. There *are* passes rewritten in the new framework to figure out how this will work. In fact, some of those passes existed even before the rest of the backend was converted to the new dataflow scheme. Existing on trunk even now: fwprop, see, web, loop-iv. New on the branch: at least auto-inc-dec.

Gr. Steven
Re: Some thoughts and questions about the data flow infrastructure
On 2/12/07, Vladimir Makarov <[EMAIL PROTECTED]> wrote: Getting 0.5% and 11.5% slowdowns (308sec vs 275sec for compiling SPECINT2000) does not seem reasonable.

Just to be sure: did you build with --disable-checking for both compilers? I often find myself comparing compilers with checking enabled, so, you know, just checking... ;-)

Thanks, Gr. Steven
Call for help: when can compare_and_jump_seq produce sequences with control flow insns?
Hello,

As some folks perhaps have noticed, my effort to make gcc use cfglayout mode between expand and, roughly, sched1 has stagnated a bit. I am completely stuck on a problem that I basically can't trigger. In other words, I *know* I should expect problems if I make a certain change, but I haven't been able to actually trigger that problem. Let me explain that...

Consider the following code, from loop-unroll.c:

basic_block
split_edge_and_insert (edge e, rtx insns)
{
  basic_block bb;

  if (!insns)
    return NULL;
  bb = split_edge (e);
  emit_insn_after (insns, BB_END (bb));
  bb->flags |= BB_SUPERBLOCK;
  return bb;
}

We call this function to insert insn sequences produced by either compare_and_jump_seq or expand_simple_binop. compare_and_jump_seq can produce insn sequences with control flow insns in them (i.e. jumps). (I am not sure about expand_simple_binop, but I think it never needs control flow insns.)

We have to split a block with multiple control flow insns into multiple blocks at some point. We could split it in place, but loop-unroll.c decides to defer it until going out of cfglayout mode, where we now have to call break_superblocks to split the basic blocks with BB_SUPERBLOCK set on them.

Here comes the problem: break_superblocks() doesn't work in cfglayout mode. There is no serialized insn stream, so you can't know what the fallthrough edge should be. I could fix this, but it is both unclean and hard. The alternative is to go out of cfglayout mode, fix up the CFG, and go back into cfglayout mode. Wasteful, IMHO, so I'd like to avoid that solution, too.

I don't want to go out of cfglayout mode (I want to stay in it, that's the whole point ;-) and since break_superblocks() doesn't work for me, I have re-introduced find_sub_basic_blocks (which was removed by Kazu long ago), made it work in cfglayout mode, and use it in split_edge_and_insert(). That way, I update the CFG in place, and simply avoid break_superblocks. This was not hard to do. I think.

But now that I've implemented it, I need to test the new code somehow. And I can't find a test case. I've tried to craft some test case based on how I understand loop-unroll should work, but I did not succeed. So I moved on to brute force methods. I have tested a small patch on i686, x86_64, ia64, mips, and sh:

--
Index: loop-unroll.c
===
--- loop-unroll.c (revision 122011)
+++ loop-unroll.c (working copy)
@@ -879,7 +879,6 @@ split_edge_and_insert (edge e, rtx insns
     return NULL;
   bb = split_edge (e);
   emit_insn_after (insns, BB_END (bb));
-  bb->flags |= BB_SUPERBLOCK;
   return bb;
 }
--

My thoughts here were that if I can make the compiler crash with this patch, my new find_sub_basic_blocks should fix that crash. So I'm trying to make gcc crash with the above patch. I know for sure that the compiler would crash if it finds a basic block with more than one control flow insn, but without the BB_SUPERBLOCK flag. verify_flow_info has good checks for this. But the patch doesn't trigger failures on the targets I tested. I can't get gcc to ICE with this patch, hence I can't find a test case for my patch.

So I'm looking for help here: who can help me find a test case to trigger a verify_flow_info ICE in GCC with the above patch applied? Can people try this patch on their favorite target, and see if they can trigger a test suite failure?

Hope you can help,
Thanks,

Gr. Steven
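For readers unfamiliar with the invariant being hunted here, a paraphrased sketch of the kind of check verify_flow_info performs (not the exact trunk code; control_flow_insn_p and FOR_BB_INSNS are real helpers of that era, while the function around them is illustrative):

--
/* ICE if BB contains more than one control flow insn without being
   marked as a superblock -- the situation the patch above tries to
   provoke.  */
static void
check_superblock_invariant (basic_block bb)
{
  rtx insn;
  int ncf = 0;

  FOR_BB_INSNS (bb, insn)
    if (INSN_P (insn) && control_flow_insn_p (insn))
      ncf++;

  if (ncf > 1 && !(bb->flags & BB_SUPERBLOCK))
    internal_error ("bb %d has multiple control flow insns "
                    "but no BB_SUPERBLOCK flag", bb->index);
}
--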
Re: GCC 4.2.0 Status Report (2007-02-19)
On 2/20/07, Vladimir Makarov <[EMAIL PROTECTED]> wrote:
> [Option 1] Instead of 4.2, we should backport some functionality from 4.2 to the 4.1 branch, and call that 4.2.
> [Option 2] Instead of 4.2, we should skip 4.2, stabilize 4.3, and call that 4.2.
> [Option 3] Like (2), but create (the new) 4.2 branch before merging the dataflow code.
> ...
> Considering the options above:
> * I think [Option 3] is unfair to Kenny, Seongbae, and others who have worked on the dataflow code. The SC set criteria for that merge and a timeline to do the merge, and I believe that the dataflow code has met, or has nearly met, those criteria.

In terms of ports, yes, I agree. As for performance, even with Paolo's latest patches (some changes could be applied to the mainline too, so it is not only about df), the branch compiler is still 8.7% slower for SPECint2000 compilation on a 2.66GHz Core2 with --enable-checking=release.

I mostly agree with Vlad. IMHO the dataflow branch is in a state where merging it early in stage 1 of a release cycle makes sense, but for gcc 4.3 it is getting a bit late. A lot depends on the current state of the trunk, of course. Do we also have some quality indicators (bug numbers, compile time performance, SPEC numbers, etc.) to compare it with the current gcc 4.2 and gcc 4.1 branches?

I don't think it would be very useful to stabilize the trunk if that can't be done in a matter of, say, two months. If it takes longer than that, releasing gcc 4.2 as-is would be my choice. Yes, there is a SPEC performance gap, but SPEC is not the one-benchmark-to-rule-them-all, and there are things in the current gcc 4.2 release branch (such as OpenMP, and a hugely improved GFortran) that I would like to see released.

Not releasing GCC 4.2 is IMHO not a really good option. If we do that, GCC 4.3 will contain so much new code that the number of not yet uncovered bugs that our users may run into may be larger than we can handle.

Gr. Steven
Re: Question about source-to-source compilation
On 2/21/07, Thomas Bernard <[EMAIL PROTECTED]> wrote: Hello all, As far as I know, GCC 4.x is easily retargetable to a new architecture. I would be interested in source-to-source compilation with the GCC framework. For instance, let's say the input language is C and the output language is C annotated with pragmas which are the results of some code analysis (done at the middle-end level). I do not think that the GCC back-end could support a programming language such as C, C++ or Java. Is GCC 4.x designed for source-to-source compilation? Is that possible, or do I miss something here?

It is not always possible. GCC is certainly not designed for it. You will have problems mostly with types and decls, which are hard to reproduce from the intermediate representation once it has been lowered to GIMPLE.

Gr. Steven
Re: Inconsistent next_bb info when EXIT is a successor
On 3/2/07, Andrey Belevantsev <[EMAIL PROTECTED]> wrote: I have tried to reorganize the check so that the "e->src->next_bb == e->dest" condition is checked for all edges (see the patch below). Of course, GCC does not bootstrap with this patch, triggering an assert about an incorrect fallthru block in cfg_layout_finalize, after the RTL loop optimizations. In my case, combine has broken that condition.

No. The condition you're checking is simply not true in cfglayout mode. The whole point of cfglayout mode is to get rid of the requirement that basic blocks are serial. That means a fallthru edge in cfglayout mode doesn't have to go to next_bb. It can go to *any* bb.

Gr. Steven
Re: Inconsistent next_bb info when EXIT is a successor
On 3/2/07, Andrey Belevantsev <[EMAIL PROTECTED]> wrote: Steven Bosscher wrote:
> No. The condition you're checking is simply not true in cfglayout mode. The whole point of cfglayout mode is to get rid of the requirement that basic blocks are serial. That means a fallthru edge in cfglayout mode doesn't have to go to next_bb. It can go to *any* bb.

Yes, but I'm not in cfglayout mode, because I'm either in sched1 or sched2. In that case, should this condition be preserved or not?

The condition should always be preserved when you are not in cfglayout mode, but... You wrote:
> > During my work on the selective scheduler I have triggered an assert in our code saying that a fall-through edge should have e->src->next_bb == e->dest. This was for a bb with EXIT_BLOCK as its fall-through successor, but its next_bb pointing to another block.

I don't understand this. You're saying there is a fallthrough edge from your e->src to EXIT_BLOCK. This case is explicitly allowed by the checking code. It is an exception to the rule: for a fallthrough edge to EXIT, e->src->next_bb != e->dest is OK.

It is hard to tell without more context what your problem is. That assert, is it an assert in your own code? Maybe it is too strict?

Gr. Steven
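Put as code, the rule Steven describes could look like the sketch below (a paraphrase of the rules stated above, not the actual cfgrtl.c check; EDGE_FALLTHRU and EXIT_BLOCK_PTR are the real names of that era):

--
/* Return true if edge E satisfies the fallthru layout invariant.  */
static bool
fallthru_edge_ok_p (edge e, bool in_cfglayout_mode)
{
  if (!(e->flags & EDGE_FALLTHRU))
    return true;               /* only fallthru edges are constrained */
  if (in_cfglayout_mode)
    return true;               /* any destination is fine in cfglayout */
  if (e->dest == EXIT_BLOCK_PTR)
    return true;               /* the explicit exception to the rule */
  return e->src->next_bb == e->dest;
}
--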
Re: CFG question
On 3/4/07, Sunzir Deepur <[EMAIL PROTECTED]> wrote: hello ppl, when I use -fdump-rtl-all with -dv I get CFG files. where can I learn the syntax of those CFG files? it seems to be some kind of LISP language...

As the fine manual says:

`-dv' For each of the other indicated dump files (either with `-d' or `-fdump-rtl-PASS'), dump a representation of the control flow graph suitable for viewing with VCG to `FILE.PASS.vcg'.

So my guess is that the syntax is VCG's.

Gr. Steven
Re: CFG question
On 3/4/07, Sunzir Deepur <[EMAIL PROTECTED]> wrote: Forgive me, I made a mistake in the question - I meant the debug dump files that we get just by using -fdump-rtl-all, not the vcg files. How can I understand their syntax?

http://gcc.gnu.org/onlinedocs/gccint/RTL.html#RTL

Gr. Steven
Re: Improvements of the haifa scheduler
On 3/4/07, Andrew Pinski <[EMAIL PROTECTED]> wrote: On 3/4/07, Vladimir N. Makarov <[EMAIL PROTECTED]> wrote:
> Another important thing to do is to make the 1st scheduler register pressure sensitive.

I don't know how many times this has to be said: no, this is not the correct approach to fix that issue. The correct fix is to enable the register allocator to work correctly and fix up the IR.

Andrew, your truth isn't necessarily _the_ truth in this matter ;-)

Gr. Steven
Re: BUG: wrong function call
On 3/6/07, W. Ivanov <[EMAIL PROTECTED]> wrote: Paulo J. Matos wrote:
> On 3/6/07, W. Ivanov <[EMAIL PROTECTED]> wrote:
>> Hi, I use multiple inheritance in my project. In the child class I have functions GetParam() and SetParam(). In the cpp-file I call the GetParam() function, but I fall into the SetParam() function. Can you help me?
>
> Don't take me wrong but it is most likely a bug in your code. Still, you might want to inform the developers (not me) through this mailing list which code you're compiling and which version of gcc you're using.
>
> Cheers,

Please, give me the mail address of the developers.

You're already reaching pretty much all of them through this mailing list.

Gr. Steven
Looking for specific pages from Muchnick's book
Hi,

I found this old patch (http://gcc.gnu.org/ml/gcc-patches/2003-06/msg01669.html) that refers to pages 202-214 of Muchnick's "Advanced Compiler Design and Implementation" book. That book still is not in my own compiler books collection because of its price. I used to have access to a copy in a university library, but that copy has been removed from the collection and, apparently, it's been disposed of :-(

Could someone scan those pages and send them to me, please?

Thanks, Gr. Steven
Re: Looking for specific pages from Muchnick's book
On 3/8/07, Steven Bosscher <[EMAIL PROTECTED]> wrote: Could someone scan those pages and send them to me, please?

I received some private mails from people that are concerned about copyright issues and all that. I should have said that I've actually ordered the book from Amazon (the price used to be a problem, back when I was a student), but shipping to Europe takes at least 9 days, and in my experience usually more than a month. In order to move ahead with a plan I'm pursuing, I just want to read those pages asap, not many weeks from now ;-)

Gr. Steven
Re: Looking for specific pages from Muchnick's book
On 3/8/07, Robert Dewar <[EMAIL PROTECTED]> wrote: Dave Korn wrote:
> A few pages for personal study? That's fair use by any meaningful definition, no matter how much the RIAA/MPAA/similar-copyright-nazis would like to redefine the meanings of perfectly clear words and phrases in the english language.

It is of course way off topic, but just so no one is confused, "fair use" does not mean "use that anyone would consider fair", it refers specifically to the fair use section of the copyright act, which lays out very specific criteria. So the question of "perfectly clear words and phrases" is not the issue. I suggest anyone interested actually read the statute!

In the meantime, I've received those pages. I'll make sure to ritually burn them when I finally receive the book. To stray a bit further off topic, I'd actually enjoy the book much more if I did not have to print out almost as many pages as the book has to get all the errata. IMVHO very few books have a poorer quality/price ratio than Muchnick, which is why I have never found it worth it to buy it before. There should be a law that says you can freely copy books with too many errata ;-)

Gr. Steven
Re: Looking for specific pages from Muchnick's book
On 3/9/07, Vladimir N. Makarov <[EMAIL PROTECTED]> wrote: o Muchnick's book is a fat one. It is rather an encyclopedia of optimizations and can be considered a collection of articles with many details (sometimes too many). But some themes (like RA and scheduling) are not described in much depth.

Muchnick is also famous for its >150 A4 pages of errata, especially for the 1st and 2nd printings. I really wouldn't recommend it to you unless you're looking for a compiler algorithms cook book.

o Robert Morgan. Building an Optimizing Compiler. This is my favorite book.

If you've read the Dragon book and this one, you're well under way to being a compiler expert. I agree with Vlad about the contents of the book, but it is the only fairly comprehensive introductory text I know of that deals with LCM and SSA at a level that even I can understand ;-)

o Appel. Modern Compiler Implementation in C/Java/ML. Another good book to start studying compilers from, covering everything from parsing to code generation and basic optimizations. I especially like the version in ML (Modern Compiler Implementation in ML).

The version in ML is the best of the three. The other two look too much like "had to do this" books where the algorithms are translated from ML, which makes them look very unnatural in C/Java.

o Aho/Lam/Sethi/Ullman. Compilers: Principles, Techniques, and Tools. 2nd edition. Personally I don't like it because it is based on an outdated (although classical) book. I attached a review of this book which I wrote more than a year ago (when the book was not ready).

This one is old, but it is a classic. The 1st edition should be on every compiler engineer's book shelf, just because. I have never seen the 2nd edition myself. Grune et al., "Modern Compiler Design", is another good introductory text, especially if you're interested in various parsing techniques.

Gr. Steven
Re: Problem with reg_equiv_alt_mem
On 3/12/07, Unruh, Erwin <[EMAIL PROTECTED]> wrote: In a private port I had the problem that reg_equiv_alt_mem_list contained the same RTL as reg_equiv_memory_loc. This caused an assert in delete_output_reload, where these are compared with rtx_equal_p. The list is built with push_reg_equiv_alt_mem, but only when "tem != orig". The value tem is built with find_reloads_address. Within that function we have some code which simply unshares the RTL; the return value will be the new RTL, even in some cases where the RTL did not change. I think the test in front of push_reg_equiv_alt_mem should be done via rtx_equal_p. This would match the assert in delete_output_reload. My private port is based on GCC 4.1.0, but the code looks the same in 4.3. I do not have papers on file, so someone else should prepare a patch.

For sufficiently small patches (usually less than 10 lines changed is used as the norm) you don't need to have a copyright assignment on file. Such small changes are apparently not covered by copyright. So if you could send a patch, that'd be quite helpful ;-)

Gr. Steven
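The suggested change, as a sketch (the guard is the point; push_reg_equiv_alt_mem's signature is assumed from the description above, and the surrounding reload code is paraphrased, not copied):

--
/* Record TEM as an alternate memory location for pseudo REGNO, but
   only if it differs structurally from ORIG.  A plain pointer compare
   (tem != orig) is fooled by find_reloads_address unsharing the RTL
   without changing it; rtx_equal_p matches the assert in
   delete_output_reload.  */
static void
maybe_push_reg_equiv_alt_mem (int regno, rtx tem, rtx orig)
{
  if (!rtx_equal_p (tem, orig))
    push_reg_equiv_alt_mem (regno, tem);
}
--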
Re: S/390 Bootstrap failure: ICE in cse_find_path, at cse.c:5930
On 3/12/07, Andreas Krebbel <[EMAIL PROTECTED]> wrote: Hi, gcc currently doesn't bootstrap on s390 and s390x:

See http://gcc.gnu.org/ml/gcc-bugs/2007-03/msg00930.html

Gr. Steven
Re: We're out of tree codes; now what?
On 3/12/07, David Edelsohn <[EMAIL PROTECTED]> wrote: I thought that the Tuples conversion was supposed to address this in the long term.

The tuples conversion is only going to make things worse in the short term. Doug, isn't there a lang_tree bit you can make available, and use it to make the tree code field 9 bits wide? I know this is also not quite optimal, but adding 24 bits like this is an invitation to everyone to start using those bits, and before you know it we're stuck with a larger-than-necessary tree structure... :-( (Plus, it's not 32 bits but 64 bits extra on 64-bit hosts...)

Gr. Steven
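The tradeoff being discussed, sketched against a heavily simplified tree_common (the real struct in tree.h carries many more flag bits; this illustrates the idea, not the actual layout): widen the code field to 9 bits and pay for it with one of the language-specific flag bits, so the structure stays the same size:

--
union tree_node;   /* forward declaration, enough for the sketch */

struct tree_common_sketch
{
  union tree_node *chain;
  union tree_node *type;
  unsigned code : 9;             /* was 8: room for more than 256 codes */
  unsigned side_effects_flag : 1;
  unsigned constant_flag : 1;
  /* ... many more shared flag bits ... */
  unsigned lang_flag_0 : 1;
  unsigned lang_flag_1 : 1;
  /* lang_flag_5 given up to pay for the 9th code bit; no change in
     the size of the structure, hence no memory hit.  */
};
--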
Re: We're out of tree codes; now what?
On 3/12/07, Paolo Carlini <[EMAIL PROTECTED]> wrote: we are unavoidably adding tree codes and we must solve the issue, one way or another. Another real solution would perhaps be to not use 'tree' for front end specific data structures in C++, and instead just define g++ specific data structures to represent all the language details ;-) G++ needs 64 (!) language specific tree codes, almost 7 times more than any other front end, and in total more than twice as many as all other front ends (java, ada, and objc) together. IMHO, now all languages are going to suffer from a larger 'tree' and a slower compiler, because g++ basically abuses a shared data structure. Gr. Steven
Re: We're out of tree codes; now what?
On 3/12/07, Andrew Pinski <[EMAIL PROTECTED]> wrote: Can I recommend something just crazy, rewrite the C and C++ front-ends so they don't use the tree structure at all except when lowering until gimple like the rest of the GCC front-ends? The C front end already emits generic, so there's almost no win in rewriting it (one lame tree code in c-common.def -- not worth the effort ;-). Gr. Steven
Re: We're out of tree codes; now what?
On 3/12/07, Paolo Carlini <[EMAIL PROTECTED]> wrote: In my opinion, "visions" for a better future do not help here. No, I fully agree. I mean, imagine we'd have a long term plan for GCC. That would be so out of line! ;-) I'm not arguing against a practical solution. But to me at least it is just *so* frustrating to know that this issue was known literally years ago, and yet nothing has happened to avoid the situation before it could occur. Now we're looking at another set of hacks to "fix" the issue. And you know just as well as I do, that we're going to be stuck with those hacks forever, because nobody will have any motivation to fix the real problem for once. But oh well. SEP. Gr. Steven
Re: No ifcvt during ce1 pass (fails i386/ssefp-2.c)
On 3/15/07, Uros Bizjak <[EMAIL PROTECTED]> wrote: compile this with -O2 -msse2 -mfpmath=sse, and this testcase should compile to maxsd. I'll look into it this weekend. Gr. Steven
Re: No ifcvt during ce1 pass (fails i386/ssefp-2.c)
On 3/15/07, Uros Bizjak <[EMAIL PROTECTED]> wrote: BTW: Your patch also causes FAIL: gcc.dg/torture/pr25183.c -O0 (internal compiler error) FAIL: gcc.dg/torture/pr25183.c -O0 (test for excess errors) Yes. Known. I bootstrapped a fix and had a box test it yesterday. I'll look at the test results tonight and commit the fix if there are no new failures (and this one is fixed). This failure is caused by problems with dead jump tables. There's another bug (with a PR filed for it) that is also related to dead jump tables. The fix I have should fix both these cases. Gr. Steven
Re: No ifcvt during ce1 pass (fails i386/ssefp-2.c)
On 3/15/07, Uros Bizjak <[EMAIL PROTECTED]> wrote: The testcase is:

double x;
q ()
{
  x = x < 5 ? 5 : x;
}

Compile this with -O2 -msse2 -mfpmath=sse, and this testcase should compile to maxsd.

This happens because a "fallthrough edge" is meaningless in cfglayout mode, but ifcvt.c still gives special meaning to the fallthrough edge. This should not matter, but it does for some reason, and I'm investigating this right now. I'll try to come up with a fix asap.

Gr. Steven
Re: RFC: obsolete __builtin_apply?
On 3/16/07, Andrew Pinski <[EMAIL PROTECTED]> wrote: On 3/16/07, Steve Ellcey <[EMAIL PROTECTED]> wrote: > My thinking is that if libobjc was changed then we could put in a > depreciated message on these builtins for 4.3 and maybe remove them for > 4.4. libobjc has not changed yet. There was a patch a while back to change libobjc to use libffi but I need to go back to it and review it (as it was before I became a libobjc maintainer). Do you mean this patch: http://gcc.gnu.org/ml/gcc-patches/2004-12/msg00841.html ? Gr. Steven
Re: Building mainline and 4.2 on Debian/amd64
On 3/18/07, Florian Weimer <[EMAIL PROTECTED]> wrote: I don't need the 32-bit libraries, so disabling their compilation would be fine. --enable-targets at configure time might do the trick, but I don't know what arguments are accepted. Would --disable-multilib work? Gr. Steven
Re: We're out of tree codes; now what?
On 3/19/07, Doug Gregor <[EMAIL PROTECTED]> wrote: I went ahead and implemented this, to see what the real impact would be. The following patch frees up TREE_LANG_FLAG_5, and uses that extra bit for the tree code. On tramp3d, memory usage remains the same (obviously), and the performance results are not as bad as I had imagined:

8-bit tree code, --enable-checking:
  real 1m56.776s, user 1m54.995s, sys 0m0.541s
9-bit tree code, --enable-checking:
  real 2m16.095s, user 2m12.132s, sys 0m0.562s
8-bit tree code, --disable-checking:
  real 0m55.693s, user 0m43.734s, sys 0m0.414s
9-bit tree code, --disable-checking:
  real 0m58.821s, user 0m46.122s, sys 0m0.443s

So, about 16% slower with --enable-checking, 5% slower with --disable-checking.

Just because I'm curious and you have a built tree ready... Does the patch that Alex sent to gcc-patches the other day help reduce this 5% penalty? See the patch here: http://gcc.gnu.org/ml/gcc-patches/2007-03/msg01234.html

There are other bitfield optimization related bugs (Roger Sayle should know more about those) that we can give a higher priority if we decide to go with the 9-bit tree code field. IMHO this is still a better solution than the subcodes idea.

Gr. Steven
Re: We're out of tree codes; now what?
On 3/19/07, Doug Gregor <[EMAIL PROTECTED]> wrote: GCC has also been getting improved functionality, better optimizations, and better language support. Some of these improvements are going to cost us at compile time, because better optimizations can require more time, and today's languages require more work to compile and optimize than yesterday's. No, I don't want my compiler to be 5% slower, but I'll give up 5% for better standards conformance and improved code generation.

Of course, the problem is not this 5%, but yet another 5% on top of, I don't know, 200% since GCC 2.95.3?? Also, it is "better optimizations" for some purposes, but not for others. For example, many of the >140 passes are redundant for typical C code.

It's not all bad news, either. Canonical types got us a 3-5% speedup in the C++ front end (more on template-heavy code), so I figure I have at least a 3% speedup credit I can apply against the 9-bit code patch. That brings this patch under 2% net slow-down, so we should just put it in now :)

But only for C++. I'm still in favor of the 9-bit code patch. But I think the slowdown should not be taken as lightly as you appear to take it ;-)

Gr. Steven
Re: We're out of tree codes; now what?
On 3/20/07, Mark Mitchell <[EMAIL PROTECTED]> wrote: As for the current problem, I think a 3% hit is pretty big. I didn't find subcodes that ugly, so I guess I'm inclined to go with subcodes, and avoid the hit.

We know that one still mostly unaddressed problem that tree-ssa left us with is poorer code for bitfield operations. That means the 3% can probably be reduced further. Another thing I like about the 9-bit tree code approach is that we keep the size of the 'tree' data structure the same, so there is no effect on memory. I think that 3% is unfortunate but worth it, because the impact on the structure of the compiler is negligible, while subcodes require significant rewrites of some parts of gcc.

Let's be fair here: a 3% hit is small compared to the cumulative slowdown we already have in GCC 4.3 since the start of stage 1, and negligible compared to the total slowdown we've accumulated over the years. I know this is not really an argument, but let's face it: much larger patches and branch merges have unintentionally increased compile time by more than 3%, and we didn't have a large discussion about it. Those were the power plants, and Doug's patch is the (you've guessed it!) bikeshed! ;-)

Back to the technical arguments... Subcodes require a bigger 'tree' data structure, so there will be a memory usage hit; I don't think there's disagreement about that. We don't know if subcodes will have no compiler speed hit. At least, I don't recall seeing any numbers yet. But if 'tree' is bigger, the chances are that we'll see poorer cache behavior, and therefore a slower compiler. So the subcodes approach may end up no better than the 9-bit tree code approach wrt. compiler speed. (Of course, for a good technical decision, you'd have to try both approaches and do a fair comparison.)

I also think subcodes are bug prone, because you have more cases to handle and people are unfamiliar with this new structure. The impact of subcodes on the existing code base is just too large for my taste.

I think it's fair for front ends to pay for their largesse. There are also relatively cheap changes in the C++ front end to salvage a few codes and postpone the day of reckoning. I think that day of reckoning will come very soon again, with more C++0x work, more autovect work, OpenMP 3.0, and the tuples and LTO projects, etc., all requiring more tree codes. And if there comes a point someday where we can go back to a smaller tree code field, it is much easier to do so with the 9-bit tree code approach than with subcodes.

Gr. Steven
Re: We're out of tree codes; now what?
On 3/20/07, Doug Gregor <[EMAIL PROTECTED]> wrote:
> So the memory hit shouldn't be as big as e.g. going to 16-bit tree codes if that means increasing the size of most of the trees the compiler uses.

Yes, this is true. But this could be solved if all LANG_TREE_x bits were moved to language-specific trees, couldn't it?

Gr. Steven
Re: We're out of tree codes; now what?
On 3/22/07, Joe Buck <[EMAIL PROTECTED]> wrote: But these numbers show that subcodes don't cost *ANY* time, or the cost is in the noise, unless enable-checking is on. The difference in real-time seems to be an artifact, since the user and sys times are basically the same. The subcodes cost complexity. And the cost with checking enabled is IMHO unacceptable. Gr. Steven
Re: We're out of tree codes; now what?
On 3/22/07, Doug Gregor <[EMAIL PROTECTED]> wrote: The results, compile time:

For what test case?

For a bootstrapped, --disable-checking compiler:

8-bit tree code (baseline):
  real 0m51.987s, user 0m41.283s, sys 0m0.420s
subcodes (this patch):
  real 0m53.168s, user 0m41.297s, sys 0m0.432s
9-bit tree code (alternative):
  real 0m56.409s, user 0m43.942s, sys 0m0.429s

Did the 9-bit tree code include Alexandre Oliva's latest bitfield optimization improvements patch (http://gcc.gnu.org/ml/gcc-patches/2007-03/msg01397.html)? What about the 16-bit tree code?

Gr. Steven
Re: We're out of tree codes; now what?
On 3/22/07, Mike Stump <[EMAIL PROTECTED]> wrote: is more obvious than the correctness of the subcoding. Thoughts? I fully agree. Gr. Steven