Re: Ada front-end depends on signed overflow
> > -ftrapv is not practically usable because (1) it generates awful code
> > and (2) it badly interacts with the RTL optimizers.
>
> please before you say this compare it with the truly awful front end
> code we generate, which for sure interferes badly with the optimizers.

Right, the code generated by the front end is not pretty either, but at
least the front end knows what it is doing Ada-wise.  -ftrapv is so dumb
at the moment that it emits checks for virtually anything.

As for the interaction with the optimizers:

  int foo(int a, int b)
  {
    return a + b;
  }

gcc -S -O -ftrapv on SPARC:

  foo:
          jmp     %o7+8
           add    %o0, %o1, %o0
          .size   foo, .-foo
          .ident  "GCC: (GNU) 4.1.0 20050604 (experimental)"

And I have a slightly more contrived example with the same problem at -O0!

I think we cannot use -ftrapv alone for Ada because it is too low-level.
Its general mechanism certainly can help (once it is fixed) but it must be
driven by something more Ada-aware.

-- 
Eric Botcazou
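As an aside on what -ftrapv is meant to guarantee, here is a minimal
hand-written sketch of a trapping signed addition in plain C.  This is
not GCC's actual -ftrapv expansion; the name checked_add and the call to
abort () are purely illustrative.  The point of the SPARC output above is
that no such check survives at -O:

  #include <limits.h>
  #include <stdlib.h>

  /* Trap (here: abort) instead of wrapping when a + b overflows int.  */
  int
  checked_add (int a, int b)
  {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
      abort ();
    return a + b;
  }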
Re: Can't bootstrap current gcc cvs trunk on sparc-linux: SIGSEGV: build/genattrtab /usr/local/src/trunk/gcc/gcc/config/sparc/sparc.md > tmp-attrtab.c
This particular problem is gone now.

-- 
Cheers,
/ChJ
Re: Edges, predictions, and GC crashes ...
Jan Hubicka wrote:
> I've committed the attached patch.  I didn't succeed in reproducing your
> failures, but Danny reported it fixes his and it bootstraps/regtests on
> i686-pc-gnu-linux.

Thanks; this does fix one crash on s390x, but doesn't fix the
pass57-frag crashes on s390.

What happens is that after the predictions are created, but before
remove_edge is called, the edge is modified in rtl_split_block
(called from tree_expand_cfg):

  /* Redirect the outgoing edges.  */
  new_bb->succs = bb->succs;
  bb->succs = NULL;
  FOR_EACH_EDGE (e, ei, new_bb->succs)
    e->src = new_bb;

Now the 'src' link points to a different basic block, but the old
basic block still has the prediction pointing to the edge.

When remove_edge is finally called, your new code tries to find
and remove the prediction from the *new* basic block's prediction
list -- but it still remains on the old one's list ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]
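A self-contained miniature of the dangling-reference pattern described
above (hypothetical structure and function names, not GCC's actual edge
or prediction types): removal only searches the list hanging off
edge->src, so an entry recorded before the edge was redirected is never
found.

  /* Predictions live on a per-block list but record an edge pointer.  */
  struct edge_s     { struct block_s *src; };
  struct prediction { struct edge_s *edge; struct prediction *next; };
  struct block_s    { struct prediction *predictions; };

  /* Mirrors the buggy lookup: only edge->src's list is searched.  */
  static void
  remove_prediction (struct edge_s *e)
  {
    struct prediction **p = &e->src->predictions;
    while (*p && (*p)->edge != e)
      p = &(*p)->next;
    if (*p)
      *p = (*p)->next;
    /* If e->src was redirected after the prediction was recorded, the
       stale entry still sits on the *old* block's list and is missed.  */
  }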
Re: Can't bootstrap current gcc cvs trunk on sparc-linux: SIGSEGV: build/genattrtab /usr/local/src/trunk/gcc/gcc/config/sparc/sparc.md > tmp-attrtab.c
> This particular problem is gone now

Right, works fine on Solaris too.

-- 
Eric Botcazou
Re: Edges, predictions, and GC crashes ...
> Jan Hubicka wrote:
> > I've committed the attached patch.  I didn't succeed in reproducing
> > your failures, but Danny reported it fixes his and it
> > bootstraps/regtests on i686-pc-gnu-linux.
>
> Thanks; this does fix one crash on s390x, but doesn't fix the
> pass57-frag crashes on s390.
>
> What happens is that after the predictions are created, but before
> remove_edge is called, the edge is modified in rtl_split_block
> (called from tree_expand_cfg):
>
>   /* Redirect the outgoing edges.  */
>   new_bb->succs = bb->succs;
>   bb->succs = NULL;
>   FOR_EACH_EDGE (e, ei, new_bb->succs)
>     e->src = new_bb;
>
> Now the 'src' link points to a different basic block, but the old
> basic block still has the prediction pointing to the edge.
>
> When remove_edge is finally called, your new code tries to find
> and remove the prediction from the *new* basic block's prediction
> list -- but it still remains on the old one's list ...

Uhm, I will test a fix for this too.  Thanks!

Honza
RFC: Strategy for cc0 -> CCmode conversion for the AVR target.
Hi,

During the last weeks I have experimented a bit with the AVR back end.
IMO there are presently two areas where it is worth concentrating
efforts on:

1.) cc0 -> CCmode transition

2.) splitting of HI- and SI-mode operations so that the RTL finally gets
    some similarity with the actually existing AVR instructions.

I'd like to discuss with you how to address these issues best, because I
think it's better to have a good plan of what to do before starting to
program.

Concerning 1.), Ian Lance Taylor has made a couple of suggestions on how
to make the transition easier for the back-end maintainers, so it seems
that there is already some activity around.  For issue 2.), Richard
Henderson has recently posted a patch that implements a "subreg-lowering"
pass run directly after expand, which gives the register allocator the
freedom to handle and allocate the individual subreg expressions
individually (this could give some tremendous improvements for AVR, e.g.
when dealing with expressions that combine operands of different modes).

IMO the two issues 1.) and 2.) are somewhat linked, since it would be a
good idea to implement a cc0 -> CCmode transition method that does not
make it difficult to use Richard Henderson's patch later on.

1.) Concerning the cc0 -> CCmode transition, IIUC, there are two main
problems:

A) We must make sure that reload does not insert condition-code (CC)
   clobbering instructions in between a CC setter and a CC user (i.e.
   conditional branches).

B) We must find a way to teach GCC to re-use as frequently as possible
   the condition codes that are set by arithmetic and logic operations.

IMO, the easiest and most straightforward solution of A) for the avr
target would be to follow Ian's suggestion:

> Ian Lance Taylor wrote:
>
>> Is there a way that makes it possible that only reload uses the
>> patterns that save and restore the condition code while everywhere
>> else the usual patterns are generated?
>> In case of the AVR target, most of the time reload will be able to
>> use processor instructions that do not require the save/restore
>> operations (it could freely access addresses up to a constant offset
>> of 64 from the stack pointer) and the costly operations would not
>> show up very frequently (only for large offsets).
>
> You could do it by writing your insn patterns as expanders which check
> reload_in_progress || reload_completed and under those circumstances
> emit insns which do not clobber the condition codes.
>
> Ian

I would expect that in the rare case that reload needs to access data
beyond the 64-byte boundary, it could be tolerated to expand the memory
access to a sequence of the type

  (set (temp_register) (SR))
  (memory_move_instruction_inserted_by_reload)
  (set (SR) (temp_register))

One would have to confirm that, but I assume that the remaining passes
after reload would be sufficiently smart to optimize away the two copy
operations for saving and restoring the status register in case they are
not necessary.  E.g. I think it is justified to assume that if the above
reload-generated memory access is followed by an instruction that
clobbers the condition code, both of the status register operations
embracing the memory move would be deleted.  Possibly these two
instructions would also be deleted if the memory move instruction does
not clobber the condition code.
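To make that "rare case" concrete, here is a hypothetical C fragment in
which reload has to reach stack slots beyond the 64-byte displacement
that AVR's ldd/std Y+q addressing can cover, so one of the costly
sequences above would be needed (the function and the buffer are invented
purely for illustration):

  int
  sum_far_slots (void)
  {
    /* Some of these slots end up more than 63 bytes away from the frame
       pointer, so the ldd/std Y+q forms cannot reach them directly and
       reload must build the address in a pointer register.  */
    volatile char buf[80];

    buf[70] = 1;
    buf[75] = 2;
    return buf[70] + buf[75];
  }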
Doing this, I think one could resolve the most difficult issue A) of the
cc0 -> CCmode conversion without having to face what has been called "the
combinatorial pattern explosion" in the mailing list archives.

Concerning issue B), Ian Lance Taylor suggests, IIUC, implementing a new
pass after reload: after reload it is known which alternative of the
different instructions that possibly set CC has been chosen, and one
could find out whether some of the compare instructions could be deleted.
IMO, if this new pass is available by the time the cc0 -> CCmode
transition is implemented for AVR, one should probably try to use it.

Meanwhile I'd like to suggest an approach that tries to remove the
unnecessary compare instructions already before reload.  I.e. what I am
thinking about is to use the (G)CSE passes to identify situations where
arithmetic instructions calculate the same condition code that is
calculated again later on by compare or test instructions.  The
disadvantage would be that at this point one cannot know which
alternative of an instruction will be chosen by reload; in my present
opinion, however, this is not a very big problem for AVR.  Also, IMO, one
should probably try hard to identify the "single-bit-test-and-branch"
patterns before reload.

The condition-code re-use issue is the point where, IMO, the link to the
subreg lowering of 2.) shows up.  After, e.g., breaking down a HImode
"sub" operation into two QImode "sub" and "sub-with-carry"s at expand, I
consider it extremely difficult to make the mid-end smart enough to
identify that at the end of the QImode "sub-with-carry" the condition
code is set according to the corresponding HImode subtract operation.
For DImode operations the mid-end would already need to take 8 (!)
instructions into account for finding out what the calculated condition
code actually represents.  This, also, will be a major difficulty when
considering Ian's suggested optimizer pass after reload.
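For reference, the HImode subtraction being discussed, written out by
hand as the byte-wide subtract / subtract-with-borrow pair it is split
into (a plain-C sketch; on AVR this corresponds to one SUB followed by
one SBC):

  #include <stdint.h>

  uint16_t
  sub16 (uint16_t x, uint16_t y)
  {
    uint8_t lo     = (uint8_t) x - (uint8_t) y;               /* SUB */
    uint8_t borrow = (uint8_t) x < (uint8_t) y;               /* carry flag */
    uint8_t hi     = (uint8_t) (x >> 8) - (uint8_t) (y >> 8)
                     - borrow;                                /* SBC */
    return (uint16_t) lo | ((uint16_t) hi << 8);
  }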
Re: RFC: Strategy for cc0 -> CCmode conversion for the AVR target.
> The condition-code re-use issue is the point where, IMO, the link to the
> subreg lowering of 2.) shows up.  After, e.g., breaking down a HImode
> "sub" operation into two QImode "sub" and "sub-with-carry"s at expand, I
> consider it extremely difficult to make the mid-end smart enough to
> identify that at the end of the QImode "sub-with-carry" the condition
> code is set according to the corresponding HImode subtract operation.

As a last resort, you can always use peephole2's to remove unnecessary
subtracts on the HImode values.

> (parallel [
>   (use (operands[0]))
>   (set (operands[0]) (minus:HI (operands[1]) (operands[2])))
>   (note "please delete the entire embracing parallel instruction before
>   register life-time analysis by a new pass: It pretends to use operands
>   1 and 2 while in fact this instruction does nothing except giving
>   hints to GCSE.")
> ])

This seems like a define_insn_and_split, but it is a lot more complex
than what you probably can do...

IIRC, s390 does use add-with-carry and subtract-with-borrow instructions
effectively (alc and slb in IBM360^W s390-ese).  Search the archives on
google or gmane.

> - combine re-run

No way, combine is too expensive...  Its simplification engine is fine,
but a lot of things ought to be redone from scratch so that it becomes a
serious instruction selection pass.

Paolo
CVS locked?
I'm getting the following error messages when trying to run a diff from
mainline:

$ cvs diff Makefile.am libgfortran.h runtime/in_pack_generic.c \
      runtime/in_unpack_generic.c m4/in_pack.m4 m4/in_unpack.m4 \
      > ~/Krempel/In-pack/in_pack.patch
cvs diff: [13:47:28] waiting for hubicka's lock in /cvs/gcc/gcc/libgfortran
cvs diff: [13:47:58] waiting for hubicka's lock in /cvs/gcc/gcc/libgfortran
cvs diff: [13:48:28] waiting for hubicka's lock in /cvs/gcc/gcc/libgfortran

Is this supposed to happen?
Re: CVS locked?
On Saturday 04 June 2005 15:50, Thomas Koenig wrote:
> I'm getting the following error messages when trying to run
> a diff from mainline:
>
> $ cvs diff Makefile.am libgfortran.h runtime/in_pack_generic.c \
>       runtime/in_unpack_generic.c m4/in_pack.m4 m4/in_unpack.m4 \
>       > ~/Krempel/In-pack/in_pack.patch
> cvs diff: [13:47:28] waiting for hubicka's lock in /cvs/gcc/gcc/libgfortran
> cvs diff: [13:47:58] waiting for hubicka's lock in /cvs/gcc/gcc/libgfortran
> cvs diff: [13:48:28] waiting for hubicka's lock in /cvs/gcc/gcc/libgfortran
>
> Is this supposed to happen?

It happens, IIUC, when people merge branches as a whole instead of
piece-wise.

Gr.
Steven
Re: RFC: Strategy for cc0 -> CCmode conversion for the AVR target.
On Saturday, 4 June 2005 15:04, Paolo Bonzini wrote:
> > (parallel [
> >   (use (operands[0]))
> >   (set (operands[0]) (minus:HI (operands[1]) (operands[2])))
> >   (note "please delete the entire embracing parallel instruction
> >   before register life-time analysis by a new pass: It pretends to use
> >   operands 1 and 2 while in fact this instruction does nothing except
> >   giving hints to GCSE.")
> > ])
>
> This seems like a define_insn_and_split, but it is a lot more complex
> than what you probably can do...

I have already confirmed that GCSE is smart enough to deal with this kind
of expression.  It effectively ignores the (use) operand when searching
for common expressions, and I know that it *will* optimize away a later
instruction that has the same "set" statement: I have seen that it works
when experimenting with a sub-optimal divmodsi4 expand pattern that was
supplemented by such a [(use) (set)] parallel instruction.

Concerning the seeming similarity with define_insn_and_split: the *huge*
benefit of subreg lowering at expand, in comparison to
define_insn_and_split, is that all of the power of the optimizers before
reload can work on the resulting instruction sequences.  E.g. IMO there
is no way to implement

  uint8_t a;   // automatic variable in registers
  int8_t  b;   // automatic variable in registers
  int16_t c;   // variable in static memory

  c = a | (b << 8);

efficiently for AVR when splitting after expand.  Before reload, GCC will
first allocate four additional 8-bit registers: it will store the
sign-extended b into two of them and the zero-extended a into the other
two.  For doing this it will insert instruction sequences for calculating
the sign extension of b and the zero extension of a.  It will execute a
shift instruction on the resulting 16-bit value of the sign-extended b.
After that comes a 16-bit "ior" of the two new 16-bit values, and
finally, after all this unnecessary work, it will emit two QImode memory
moves for the two bytes of the 16-bit result.

With appropriate use of Richard Henderson's patch, all that comes out of
the optimizers before reload is "two QImode moves to memory" (see the
sketch at the end of this message).  The optimizer passes after reload
are not smart enough to identify these optimization opportunities, and it
would also be too late to profit from the four registers that were
allocated without actually being needed.

> IIRC, s390 does use add-with-carry and subtract-with-borrow instructions
> effectively (alc and slb in IBM360^W s390-ese).  Search the archives on
> google or gmane.

Thanks for the hint.  I'll have a look at the s390 port.

> > - combine re-run
>
> No way, combine is too expensive...  Its simplification engine is fine,
> but a lot of things ought to be redone from scratch so that it becomes
> a serious instruction selection pass.

I agree that this might not be a good choice for targets like x86, and I
am not suggesting to include this option in the lists of passes run with
any of the default options like -O0 and -O3.  Maybe it could be included
with "expensive optimizations".  Concerning compile times, one should
keep in mind that a typical AVR target has only 8k - 64k of program
memory.  IMO, compile-time degradation on the host machine would be
readily accepted by all of the AVR users if it improves the code.  Build
times for my entire projects amount to roughly 30 seconds, most of which
is spent checking dependencies, linking and generating report files.
In my personal opinion, everyone using the AVR port would be happy even
if the degradation amounted to a factor of 10!

Yours,

Björn
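As a concrete illustration of the "two QImode moves" claim above, here is
a hand-written C equivalent of what the lowered code should reduce to (a
sketch only; it assumes AVR's little-endian layout of int16_t, and the
function name is invented):

  #include <stdint.h>

  uint8_t a;
  int8_t  b;
  int16_t c;

  void
  pack_c (void)
  {
    /* Same effect as c = a | (b << 8) on a little-endian target:
       two plain byte stores, with no extensions, shift or 16-bit IOR.  */
    ((uint8_t *) &c)[0] = a;
    ((uint8_t *) &c)[1] = (uint8_t) b;
  }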
A renewed look at GCC's performance from rebuild_jump_labels's view
Hi,

Kazu reported on the performance of tree-ssa from rebuild_jump_labels's
view in March last year:
http://gcc.gnu.org/ml/gcc/2004-03/msg00659.html.  I decided to redo this
little experiment on my collection of cc1 .i files.  My results for
today's CVS head at -O2 on amd64:

  mainline today
  --------------
             calls   total
  cse1         820   11686
  cse2          24   11686
  gcse        8705   11686
  bypass       643   11686
  combine       15   11686

"calls" is the number of calls to rebuild_jump_labels() right after CSE1
(or CSE2, GCSE, BYPASS, or COMBINE).  "total" is the number of calls to
the pass itself, i.e. the number of functions that the RTL optimizers got
to see.

GCSE is this high because gcse_main unconditionally returns 1 if it runs
anything (i.e. when !is_too_expensive), which is probably a bug.  This
number does _not_ include the extra rebuild_jump_labels() call after the
re-runs of cse_main in rest_of_handle_gcse (I just used Kazu's patch and
he didn't take those into account either).

Apparently I have a larger set of .i files than Kazu had.  But still,
CSE2 almost never has to rebuild_jump_labels, and neither does combine.
I am not sure why the number for CSE1 is this high, but 820 out of 11686
still means that CSE1 needs to rebuild labels only 7% of the time.

Gr.
Steven
gcc-4.1-20050604 is now available
Snapshot gcc-4.1-20050604 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.1-20050604/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.1 CVS branch with the
following options: -D2005-06-04 17:43 UTC

You'll find:

gcc-4.1-20050604.tar.bz2             Complete GCC (includes all of below)
gcc-core-4.1-20050604.tar.bz2        C front end and core compiler
gcc-ada-4.1-20050604.tar.bz2         Ada front end and runtime
gcc-fortran-4.1-20050604.tar.bz2     Fortran front end and runtime
gcc-g++-4.1-20050604.tar.bz2         C++ front end and runtime
gcc-java-4.1-20050604.tar.bz2        Java front end and runtime
gcc-objc-4.1-20050604.tar.bz2        Objective-C front end and runtime
gcc-testsuite-4.1-20050604.tar.bz2   The GCC testsuite

Diffs from 4.1-20050528 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.1
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.
Successfully built gcc 3.4.4 on PII 450 running GNU/Linux Slackware 10.1
$ ./config.guess
i686-pc-linux-gnu

$ gcc -v
Reading specs from /usr/lib/gcc/i686-pc-linux-gnu/3.4.4/specs
Configured with: ../configure --with-cpu=pentium2 --with-march=pentium2
  --with-mtune=pentium2 --prefix=/usr --enable-shared --enable-threads
Thread model: posix
gcc version 3.4.4

Slackware 10.1
Linux quasar 2.6.7 #5 Sat Apr 23 20:53:05 CEST 2005 i686 unknown unknown GNU/Linux
glibc 2.3.4

Thank you for your excellent work!!!

Bye.

Gabriele Inghirami
Re: sizeof(int) in testsuite
On Friday, 3 June 2005 10:48, Mark Mitchell wrote:
> DJ Delorie wrote:
> > Do we have a standard way of telling the testsuite how big target
> > types are, or some standard "this test assumes 32 bit int" dejagnu
> > flag?
>
> I don't think we have any way of doing this at present.  I could be
> wrong, though.  We could certainly add logic to compute this, using
> techniques similar to those in target-supports.exp.

I had posted a patch implementing an effective-target keyword exactly
for this purpose.  My patch has already been approved by Janis Johnson
(http://gcc.gnu.org/ml/gcc-patches/2005-05/msg01922.html) but it's not
in the CVS tree now.  Possibly I should have asked somebody with cvs
access to commit the patch.  Or is there a special procedure to follow
for having approved patches committed?

Yours,

Björn
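For context, such an effective-target keyword would typically be used
from a test file roughly like this (a hypothetical example; the keyword
name "int32" and the test body are made up for illustration and need not
match the approved patch):

  /* { dg-do compile { target int32 } } */

  /* Fails to compile unless int really is 32 bits wide, so the target
     selector above must have filtered out the other targets.  */
  int int_is_32_bits[sizeof (int) == 4 ? 1 : -1];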
Re: RFC: Strategy for cc0 -> CCmode conversion for the AVR target.
Björn Haase <[EMAIL PROTECTED]> writes:

> Concerning 1.), Ian Lance Taylor has made a couple of suggestions on how
> to make the transition easier for the back-end maintainers, so it seems
> that there is already some activity around.

In fact Hans-Peter Nilsson is implementing code to support this
transition.  I think he started by following my suggestions, but has of
course modified them in the course of actual implementation.

> I would expect that in the rare case that reload needs to access data
> beyond the 64-byte boundary, it could be tolerated to expand the memory
> access to a sequence of the type
>
>   (set (temp_register) (SR))
>   (memory_move_instruction_inserted_by_reload)
>   (set (SR) (temp_register))
>
> One would have to confirm that, but I assume that the remaining passes
> after reload would be sufficiently smart to optimize away the two copy
> operations for saving and restoring the status register in case they
> are not necessary.  E.g. I think it is justified to assume that if the
> above reload-generated memory access is followed by an instruction that
> clobbers the condition code, both of the status register operations
> embracing the memory move would be deleted.  Possibly these two
> instructions would also be deleted if the memory move instruction does
> not clobber the condition code.

Assuming the memory move instruction does not clobber the SR, then I
would expect the reload CSE pass to remove the (set (SR) ...).
Presumably the temporary register would then be dead, in which case the
flow2 pass should remove the (set (temp_register) ...).

> The condition-code re-use issue is the point where, IMO, the link to the
> subreg lowering of 2.) shows up.  After, e.g., breaking down a HImode
> "sub" operation into two QImode "sub" and "sub-with-carry"s at expand, I
> consider it extremely difficult to make the mid-end smart enough to
> identify that at the end of the QImode "sub-with-carry" the condition
> code is set according to the corresponding HImode subtract operation.
> For DImode operations the mid-end would already need to take 8 (!)
> instructions into account for finding out what the calculated condition
> code actually represents.  This, also, will be a major difficulty when
> considering Ian's suggested optimizer pass after reload.

I agree that there is a problem here, but it's not clear to me how you
can address it under the current cc0 scheme either.
> ; Additional "Marker" instructions to be used by GCSE
>
> (parallel [
>   (use (reg:CC_cc SR))
>   (set (reg:CC_cc SR) (compare:HI (operands[1]) (operands[2])))
>   (note "please delete the entire embracing parallel instruction before
>   register life-time analysis by a new pass: It pretends to use operands
>   1 and 2 while in fact this instruction does nothing except giving
>   hints to GCSE.")
> ])
>
> (parallel [
>   (use (reg:CC_cc SR))
>   (set (reg:CC_cc SR) (compare:HI (operands[0]) (const_0)))
>   (note "please delete the entire embracing parallel instruction before
>   register life-time analysis by a new pass: It pretends to use
>   operand[0] while in fact this instruction does nothing except giving
>   hints to GCSE.")
> ])
>
> (parallel [
>   (use (operands[0]))
>   (set (operands[0]) (minus:HI (operands[1]) (operands[2])))
>   (note "please delete the entire embracing parallel instruction before
>   register life-time analysis by a new pass: It pretends to use operands
>   1 and 2 while in fact this instruction does nothing except giving
>   hints to GCSE.")
> ])

I'm not crazy about these marker instructions personally.  They are
describing something which I think could be handled via parallel sets or
register notes.  The more serious problem I see is that if part of the
subtraction disappears for some reason, the information, however stored,
will be incorrect.  How can that problem be avoided?

Ian