Memory usage of 4.2 versus 4.3 (at branchpoints)
Hi, to give some perspective to the discussion on memory usage, I generated a comparison of the 4.2 branchpoint to the 4.3 branchpoint from the logs of our memory tester. I would say it is quite pleasing to see that 4.3 is not really a regression relative to 4.2 in most tests, unlike what had become the custom in previous releases, but we still ought to do a lot better ;) There is a possibly interesting 35% regression at -O1 on combine.c...

Honza

comparing combine.c compilation at -O0 level:
  Peak amount of GGC memory allocated before garbage collecting run decreased from 9595k to 8929k, overall -7.46%
  Peak amount of GGC memory still allocated after garbage collecting decreased from 8942k to 8558k, overall -4.49%
  Amount of produced GGC garbage decreased from 40099k to 34878k, overall -14.97%
  Amount of memory still referenced at the end of compilation decreased from 6705k to 6073k, overall -10.41%

  Overall memory needed: 24905k -> 24797k
  Peak memory use before GGC: 9595k -> 8929k
  Peak memory use after GGC: 8942k -> 8558k
  Maximum of released memory in single GGC run: 2737k -> 2576k
  Garbage: 40099k -> 34878k
  Leak: 6705k -> 6073k
  Overhead: 5788k -> 4715k
  GGC runs: 317 -> 294

comparing combine.c compilation at -O1 level:
  Overall memory allocated via mmap and sbrk increased from 26820k to 36237k, overall 35.11%
  Amount of produced GGC garbage decreased from 60618k to 55748k, overall -8.74%
  Amount of memory still referenced at the end of compilation decreased from 6888k to 6151k, overall -11.98%

  Overall memory needed: 26820k -> 36237k
  Peak memory use before GGC: 17364k -> 16999k
  Peak memory use after GGC: 17180k -> 16830k
  Maximum of released memory in single GGC run: 2372k -> 2342k
  Garbage: 60618k -> 55748k
  Leak: 6888k -> 6151k
  Overhead: 7578k -> 6045k
  GGC runs: 387 -> 369

comparing combine.c compilation at -O2 level:
  Amount of memory still referenced at the end of compilation decreased from 6973k to 6252k, overall -11.53%

  Overall memory needed: 26820k -> 26496k
  Peak memory use before GGC: 17367k -> 16999k
  Peak memory use after GGC: 17180k -> 16830k
  Maximum of released memory in single GGC run: 2452k -> 2884k
  Garbage: 77388k -> 76521k
  Leak: 6973k -> 6252k
  Overhead: 10022k -> 8785k
  GGC runs: 456 -> 443

comparing combine.c compilation at -O3 level:
  Overall memory allocated via mmap and sbrk decreased from 26820k to 25596k, overall -4.78%
  Amount of memory still referenced at the end of compilation decreased from 7030k to 6317k, overall -11.28%

  Overall memory needed: 26820k -> 25596k
  Peak memory use before GGC: 18365k -> 17988k
  Peak memory use after GGC: 17995k -> 17536k
  Maximum of released memory in single GGC run: 3510k -> 4130k
  Garbage: 107793k -> 107354k
  Leak: 7030k -> 6317k
  Overhead: 13563k -> 12408k
  GGC runs: 509 -> 490

comparing insn-attrtab.c compilation at -O0 level:
  Overall memory allocated via mmap and sbrk increased from 80924k to 83700k, overall 3.43%
  Amount of produced GGC garbage decreased from 146623k to 125964k, overall -16.40%
  Amount of memory still referenced at the end of compilation decreased from 9856k to 9117k, overall -8.11%

  Overall memory needed: 80924k -> 83700k
  Peak memory use before GGC: 69469k -> 68247k
  Peak memory use after GGC: 45007k -> 43913k
  Maximum of released memory in single GGC run: 36247k -> 35708k
  Garbage: 146623k -> 125964k
  Leak: 9856k -> 9117k
  Overhead: 19791k -> 16830k
  GGC runs: 252 -> 231

comparing insn-attrtab.c compilation at -O1 level:
  Overall memory allocated via mmap and sbrk increased from 111696k to 118444k, overall 6.04%
  Peak amount of GGC memory allocated before garbage collecting run increased from 94037k to 94551k, overall 0.55%
  Peak amount of GGC memory still allocated after garbage collecting increased from 83553k to 90403k, overall 8.20%
  Amount of memory still referenced at the end of compilation decreased from 10072k to 8977k, overall -12.20%

  Overall memory needed: 111696k -> 118444k
  Peak memory use before GGC: 94037k -> 94551k
  Peak memory use after GGC: 83553k -> 90403k
  Maximum of released memory in single GGC run: 32589k -> 31807k
  Garbage: 289765k -> 289427k
  Leak: 10072k -> 8977k
  Overhead: 36663k -> 29408k
  GGC runs: 245 -> 240

comparing insn-attrtab.c compilation at -O2 level:
  Overall memory allocated via mmap and sbrk decreased from 127120k to 114404k, overall -11.11%
  Peak amount of GGC memory allocated before garbage collecting run decreased from 113347k to 95237k, overall -19.02%
  Peak amount of GGC memory still allocated after garbage collecting increased from 83466k to 90625k, overall 8.58%
  Amount of produced GGC garbage decreased from 372181k to 328157k, overall -13.42%
  Amount of memory still referenced at the end of compilation decreased from 10176k to 8982k, overall -13.30%

  Overall memory needed: 127120k -> 114404k
  Peak memory
Question about LTO dwarf reader vs. artificial variables and formal arguments
Hello,

I want to make gfortran produce better debug information, but I want to do it in a way that doesn't make it hard/impossible to read back in sufficient information for LTO to work for gfortran.

I haven't really been following the whole LTO thing much, but if I understand correctly, the goal is to reconstruct information about declarations from the DWARF information that we write out for those declarations. If that's the case, I wonder how LTO will handle artificial "variables" and formal argument lists.

For example, gfortran adds additional formal arguments for functions that take a CHARACTER string as a formal argument, e.g.

  program test
    implicit none
    call sub("Hi World!")
  contains
    subroutine sub(c)
      character*10 c
    end subroutine
  end

produces as a GIMPLE dump:

  MAIN__ ()
  {
    static void sub (char[1:10] &, int4);

    _gfortran_set_std (70, 127, 0);
    sub ("Hi World!", 9);
  }

  sub (c, _c)
  {
    (void) 0;
  }

where _c is strlen("Hi World!"). From a user perspective, it would be better to hide _c from the debugger because it is not something that the user had in the original program. I have a patch to hide that parameter, that is, it stops GCC from writing out DW_TAG_formal_parameter for _c. But I am worried about how this will work out later if/when someone tries to make LTO work for gfortran too. Can you still reconstruct the correct function prototype for LTO from the debug info if you don't write debug info for _c?

Similarly, LTO has to somehow deal with DECL_VALUE_EXPR and the debug information that is produced from it. Gfortran (and iiuc other front ends and SRA) use DECL_VALUE_EXPR to produce fake variables that point to some location, to improve the debug experience of the user. For Fortran we use it to create fake variables that point at members of a COMMON block, for example, so that the user can do "p A" for a variable A in a common block, instead of "p name_of_the_common_block.A". Is there already some provision to handle this kind of trickery in LTO?

Finally, consider another Fortran example:

  program debug_array_dimensions
    implicit none
    integer i(10,10)
    i(2,9) = 1
  end

Gfortran currently produces the following wrong debug information for this example:

  <2><94>: Abbrev Number: 3 (DW_TAG_variable)
      DW_AT_name        : i
      DW_AT_decl_file   : 1
      DW_AT_decl_line   : 1
      DW_AT_type        :
      DW_AT_location    : 3 byte block: 91 e0 7c (DW_OP_fbreg: -416)
  <1>: Abbrev Number: 4 (DW_TAG_array_type)
      DW_AT_type        :
      DW_AT_sibling     :
  <2>: Abbrev Number: 5 (DW_TAG_subrange_type)
      DW_AT_type        :
      DW_AT_lower_bound : 0
      DW_AT_upper_bound : 99
  <1>: Abbrev Number: 6 (DW_TAG_base_type)
      DW_AT_byte_size   : 8
      DW_AT_encoding    : 5 (signed)
      DW_AT_name        : int8
  <1>: Abbrev Number: 6 (DW_TAG_base_type)
      DW_AT_byte_size   : 4
      DW_AT_encoding    : 5 (signed)
      DW_AT_name        : int4

Note the single DW_TAG_subrange_type <0, 99> for the type of "i", instead of two DW_TAG_subrange_type entries <1, 10>. This happens because in gfortran all arrays are flattened (iirc to make code generation easier). I would like to make gfortran write out the correct debug information, e.g. something with

  <2>: Abbrev Number: 5 (DW_TAG_subrange_type)
      DW_AT_type        :
      DW_AT_upper_bound : 10
  <2>: Abbrev Number: 5 (DW_TAG_subrange_type)
      DW_AT_type        :
      DW_AT_upper_bound : 10

but what would happen if LTO reads this in and re-constructs the type of "i" from this information? I imagine it would lead to mismatches between the GIMPLE code that you read in, where "i" is a 1x100 array, and the re-constructed variable "i", which would be a 10x10 2D array.

Has anyone working on LTO already thought of these challenges?
I'm all new to both DWARF and LTO, so forgive me if my rant doesn't make sense ;-) Gr. Steven
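A rough C sketch of the lowering described above; the names, the int4 typedef and the main() wrapper are illustrative stand-ins, not gfortran's actual identifiers:

  #include <string.h>

  typedef int int4;   /* stand-in for gfortran's int4 kind */

  /* The CHARACTER dummy argument is lowered to a pointer plus a hidden
     length parameter; _c never appears in the Fortran source.  */
  static void sub (char *c, int4 _c)
  {
    (void) c;
    (void) _c;
  }

  int main (void)
  {
    /* The front end passes strlen("Hi World!") == 9 implicitly.  */
    sub ("Hi World!", (int4) strlen ("Hi World!"));
    return 0;
  }

The question in the message is whether debug info that omits the second parameter still lets an LTO reader recover the full two-argument prototype shown here.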
Re: Question about LTO dwarf reader vs. artificial variables and formal arguments
On Sat, Oct 21, 2006 at 06:35:40PM +0200, Steven Bosscher wrote:
> where _c is strlen("Hi World!"). From a user perspective, it would be better
> to hide _c for the debugger because it is not something that the user had in
> the original program. I have a patch to hide that parameter, that is, it
> stops GCC from writing out DW_TAG_formal_parameter for _c. But I am worried
> about how this will work out later if/when someone tries to make LTO work for
> gfortran too.
> Can you still reconstruct the correct function prototype for LTO from the
> debug info if you don't write debug info for _c?

Wouldn't this be upsetting to debuggers, too - for instance, if they wanted to call the function? It might be wiser to tag it DW_AT_artificial, and let the debugger sort out what to do with it.

> Similarly, LTO has to somehow deal with DECL_VALUE_EXPR and the debug
> information that is produced from it. Gfortran (and iiuc other front ends
> and SRA) use this DECL_VALUE_EXPR to produce fake variables that point to
> some location to improve the debug experience of the user. For Fortran we
> use it to create fake variables to point at members of a COMMON block, for
> example, so that the user can do "p A" for a variable A in a common block,
> instead of "p name_of_the_common_block.A". Is there already some provision
> to handle this kind of trickery in LTO?

I don't think we're far enough on yet to know the answer to this or your other question, but I may be wrong. There's a reason we're focusing on C right now :-) I don't think the design precludes this sort of thing, but we won't know how it all fits together until more's been done.

--
Daniel Jacobowitz
CodeSourcery
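To illustrate the suggestion, the DIE for _c would simply carry the artificial flag. The dump below is a hypothetical sketch in the style of the readelf output quoted earlier in the thread; the abbrev number and the type reference are placeholders:

  <2>: Abbrev Number: 7 (DW_TAG_formal_parameter)
      DW_AT_name        : _c
      DW_AT_type        : <reference to int4>
      DW_AT_artificial  : 1

With the flag set, a debugger can keep the parameter out of user-visible argument listings while still knowing the full signature when it needs to call the function, and a consumer such as an LTO reader can still reconstruct the complete prototype.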
Re: Question about LTO dwarf reader vs. artificial variables and formal arguments
Steven Bosscher <[EMAIL PROTECTED]> writes:
> I haven't really been following the whole LTO thing much, but if I understand
> correctly, the goal is to reconstruct information about declarations from
> DWARF information that we write out for those declarations. If that's the
> case, I wonder how LTO will handle artificial "variables" and formal argument
> lists.

I think it is a mistake to focus on DWARF details too much. We simply need some mechanism to write trees into an object file and to read them back in. That mechanism can be anything. We are using DWARF on the theory that it will be simpler because DWARF readers and writers already exist (I don't buy that argument myself, but, whatever). But it is clearly impossible to represent everything we need to represent in DWARF. So we need to extend DWARF as necessary to represent all the tree details.

That is, we are not going to write out DWARF. We can't, because DWARF is not designed to represent all the details which the compiler needs to represent. What we are going to write out is a superset of DWARF. And in fact, if it helps, I think that we shouldn't hesitate to write out something which is similar to but incompatible with DWARF.

In general reading and writing trees is far from the hardest part of the LTO effort. I think it is a mistake for us to get too tied up in the details of how to represent things in DWARF. (I also think that we could probably do better by defining our own bytecode language, one optimized for our purposes, but it's not an issue worth fighting over.)

Ian
Re: Question about LTO dwarf reader vs. artificial variables and formal arguments
Ian Lance Taylor wrote on 10/21/06 14:59:
> That is, we are not going to write out DWARF. We can't, because DWARF is not
> designed to represent all the details which the compiler needs to represent.
> What we are going to write out is a superset of DWARF. And in fact, if it
> helps, I think that we shouldn't hesitate to write out something which is
> similar to but incompatible with DWARF.
> In general reading and writing trees is far from the hardest part of the LTO
> effort. I think it is a mistake for us to get too tied up in the details of
> how to represent things in DWARF. (I also think that we could probably do
> better by defining our own bytecode language, one optimized for our purposes,
> but it's not an issue worth fighting over.)

Agreed. I don't think we'll get far if we focus too much on DWARF, as it clearly cannot be used as a bytecode language for our purposes. We will need to evolve our own bytecode language, either as an extension to DWARF (much like we did with SIMPLE) or do something from scratch. Implementing type support starting from DWARF is a start, but we should not constrain ourselves to it.
fwhole-program, -combine, several C sources ?
Dear All,

For C source programs only, is there a scenario where several (e.g. two or more) C source files (i.e. passed as *.c arguments) can be passed with the -fwhole-program flag but without the -combine flag?

In other words, some medium-sized programs (e.g. < 100 KLOC of C source in several *.c files) can be compiled with

  gcc -O3 -fwhole-program -combine *.c -o prog ...libraries and includes

but the compilation fails if the -combine flag is not given.

Maybe the driver might issue some warning if -fwhole-program is given without -combine for several C source inputs? Or perhaps I am not understanding the -fwhole-program flag very well.

By the way, I am surprised that so few programs are compiled with it... http://www.google.com/codesearch?q=fwhole-program gives few results.

Regards.

--
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet
aliases: basiletunesorg = bstarynknerimnet
8, rue de la Faïencerie, 92340 Bourg La Reine, France
Re: fwhole-program, -combine, several C sources ?
On Sat, 2006-10-21 at 22:03 +0200, Basile STARYNKEVITCH wrote:
> Dear All,
>
> For C source programs only, is there a scenario where several (eg two or
> more) C source files (ie passed *.c arguments) can be passed with the
> -fwhole-program flag and without the -combine flag?

You can have a whole program in one source module, so the warning would get in the way of that case. Plus -combine is useless in that case also. In fact -combine will go away once LTO finishes.

Thanks,
Andrew Pinski
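For concreteness, the distinction looks roughly like this (file names are made up for the illustration):

  # Single translation unit: -fwhole-program alone is meaningful, -combine adds nothing.
  gcc -O3 -fwhole-program prog.c -o prog

  # Several translation units: -combine merges them into one unit first, so that
  # -fwhole-program can treat everything except main as local to that unit.
  gcc -O3 -fwhole-program -combine a.c b.c c.c -o prog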
Re: Question about LTO dwarf reader vs. artificial variables and formal arguments
Diego Novillo wrote:
> Ian Lance Taylor wrote on 10/21/06 14:59:
>> That is, we are not going to write out DWARF. We can't, because DWARF is not
>> designed to represent all the details which the compiler needs to represent.
>> What we are going to write out is a superset of DWARF. And in fact, if it
>> helps, I think that we shouldn't hesitate to write out something which is
>> similar to but incompatible with DWARF.
>> In general reading and writing trees is far from the hardest part of the LTO
>> effort. I think it is a mistake for us to get too tied up in the details of
>> how to represent things in DWARF. (I also think that we could probably do
>> better by defining our own bytecode language, one optimized for our purposes,
>> but it's not an issue worth fighting over.)
>
> Agreed. I don't think we'll get far if we focus too much on DWARF, as it
> clearly cannot be used as a bytecode language for our purposes.

I think the bytecode issue is a red herring, because we are no longer talking about using DWARF for the bodies of functions. DWARF is only being used for declarations and types. There, yes, we will need some extensions to represent things. However, DWARF is designed to be extended, so that's no problem.

I continue to think that we should use DWARF (with extensions), since it makes this information accessible to other tools (including GDB). I think there ought to be a compelling reason before we abandon a strategy based on DWARF.

--
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713
Re: Question about LTO dwarf reader vs. artificial variables and formal arguments
Steven Bosscher wrote:
>   contains
>     subroutine sub(c)
>       character*10 c
>     end subroutine
>   end
>
> produces as a GIMPLE dump:
>
>   sub (c, _c)
>   {
>     (void) 0;
>   }
>
> where _c is strlen("Hi World!"). From a user perspective, it would be better
> to hide _c from the debugger because it is not something that the user had in
> the original program.

I think that _c should be emitted in DWARF, as an artificial parameter, both for the sake of the debugger and for LTO. LTO is supposed to be language-independent, which means that the information it reads in needs to be sufficient to compute the types of things (as they will be at the level of GIMPLE) without language hooks. It may be that this idea turns out to be too idealistic, and that some language hooks are necessary to interpret the DWARF, but I would hope to avoid that.

> Similarly, LTO has to somehow deal with DECL_VALUE_EXPR and the debug
> information that is produced from it. Is there already some provision to
> handle this kind of trickery in LTO?

No, not yet.

> but what would happen if LTO reads this in and re-constructs the type of "i"
> from this information? I imagine it would lead to mismatches between the
> GIMPLE code that you read in, where "i" is a 1x100 array, and the
> re-constructed variable "i", which would be a 10x10 2D array. Has anyone
> working on LTO already thought of these challenges?

Yes, I've thought about these things -- but that doesn't mean I have ready answers. I've been thinking first and foremost about C, and then about C and C++. Some of the same issues apply, but some don't.

In C/C++, we don't linearize the array type. I don't know if that's viable in gfortran or not; is there a way to get the same performance in the middle end that you currently get by doing this in the front end? In the worst case, we will provide a separate type attribute in DWARF giving the "GIMPLE type" of the variable. Then, that type would be the linearized array. LTO would use the GIMPLE type attribute (if present) when reconstructing the type.

--
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713
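To make the mismatch concrete, here is a small, purely illustrative C sketch (not generated code) of the two views of the storage for Steven's "integer i(10,10)"; the column-major index arithmetic is Fortran's:

  /* The 2D view the user wrote: integer i(10,10). */
  int i_2d[10][10];

  /* The flattened view gfortran uses internally: 100 ints in one block,
     with the subscripts folded into explicit index arithmetic.  */
  int i_flat[100];

  void assign (void)
  {
    /* Fortran i(2,9) = 1, column-major: offset (2-1) + (9-1)*10 = 81. */
    i_flat[(2 - 1) + (9 - 1) * 10] = 1;
  }

If LTO reconstructs "i" with the corrected 10x10 type from DWARF while the GIMPLE it reads back still indexes the flattened 1x100 form, the two descriptions of the same object no longer line up; the proposed "GIMPLE type" attribute would let the reader pick the linearized type in that case.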