Memory usage of 4.2 versus 4.3 (at branchpoints)

2006-10-21 Thread Jan Hubicka
Hi,
to give some perspective to the discussion on memory usage, I generated a
comparison of the 4.2 branchpoint to the 4.3 branchpoint from the logs of our
memory tester.  I would say it is quite pleasing to see that 4.3 is not really
a regression relative to 4.2 in most tests, as was customary in previous
releases, but we still ought to do a lot better ;)

There is a possibly interesting 35% regression at -O1 on combine.c...

Honza

comparing combine.c compilation at -O0 level:
  Peak amount of GGC memory allocated before garbage collecting run decreased 
from 9595k to 8929k, overall -7.46%
  Peak amount of GGC memory still allocated after garbage collecting decreased 
from 8942k to 8558k, overall -4.49%
  Amount of produced GGC garbage decreased from 40099k to 34878k, overall 
-14.97%
  Amount of memory still referenced at the end of compilation decreased from 
6705k to 6073k, overall -10.41%
Overall memory needed: 24905k -> 24797k
Peak memory use before GGC: 9595k -> 8929k
Peak memory use after GGC: 8942k -> 8558k
Maximum of released memory in single GGC run: 2737k -> 2576k
Garbage: 40099k -> 34878k
Leak: 6705k -> 6073k
Overhead: 5788k -> 4715k
GGC runs: 317 -> 294

comparing combine.c compilation at -O1 level:
  Overall memory allocated via mmap and sbrk increased from 26820k to 36237k, 
overall 35.11%
  Amount of produced GGC garbage decreased from 60618k to 55748k, overall -8.74%
  Amount of memory still referenced at the end of compilation decreased from 
6888k to 6151k, overall -11.98%
Overall memory needed: 26820k -> 36237k
Peak memory use before GGC: 17364k -> 16999k
Peak memory use after GGC: 17180k -> 16830k
Maximum of released memory in single GGC run: 2372k -> 2342k
Garbage: 60618k -> 55748k
Leak: 6888k -> 6151k
Overhead: 7578k -> 6045k
GGC runs: 387 -> 369

comparing combine.c compilation at -O2 level:
  Amount of memory still referenced at the end of compilation decreased from 
6973k to 6252k, overall -11.53%
Overall memory needed: 26820k -> 26496k
Peak memory use before GGC: 17367k -> 16999k
Peak memory use after GGC: 17180k -> 16830k
Maximum of released memory in single GGC run: 2452k -> 2884k
Garbage: 77388k -> 76521k
Leak: 6973k -> 6252k
Overhead: 10022k -> 8785k
GGC runs: 456 -> 443

comparing combine.c compilation at -O3 level:
  Overall memory allocated via mmap and sbrk decreased from 26820k to 25596k, 
overall -4.78%
  Amount of memory still referenced at the end of compilation decreased from 
7030k to 6317k, overall -11.28%
Overall memory needed: 26820k -> 25596k
Peak memory use before GGC: 18365k -> 17988k
Peak memory use after GGC: 17995k -> 17536k
Maximum of released memory in single GGC run: 3510k -> 4130k
Garbage: 107793k -> 107354k
Leak: 7030k -> 6317k
Overhead: 13563k -> 12408k
GGC runs: 509 -> 490

comparing insn-attrtab.c compilation at -O0 level:
  Overall memory allocated via mmap and sbrk increased from 80924k to 83700k, 
overall 3.43%
  Amount of produced GGC garbage decreased from 146623k to 125964k, overall 
-16.40%
  Amount of memory still referenced at the end of compilation decreased from 
9856k to 9117k, overall -8.11%
Overall memory needed: 80924k -> 83700k
Peak memory use before GGC: 69469k -> 68247k
Peak memory use after GGC: 45007k -> 43913k
Maximum of released memory in single GGC run: 36247k -> 35708k
Garbage: 146623k -> 125964k
Leak: 9856k -> 9117k
Overhead: 19791k -> 16830k
GGC runs: 252 -> 231

comparing insn-attrtab.c compilation at -O1 level:
  Overall memory allocated via mmap and sbrk increased from 111696k to 118444k, 
overall 6.04%
  Peak amount of GGC memory allocated before garbage collecting increased from 
94037k to 94551k, overall 0.55%
  Peak amount of GGC memory still allocated after garbage collecting increased 
from 83553k to 90403k, overall 8.20%
  Amount of memory still referenced at the end of compilation decreased from 
10072k to 8977k, overall -12.20%
Overall memory needed: 111696k -> 118444k
Peak memory use before GGC: 94037k -> 94551k
Peak memory use after GGC: 83553k -> 90403k
Maximum of released memory in single GGC run: 32589k -> 31807k
Garbage: 289765k -> 289427k
Leak: 10072k -> 8977k
Overhead: 36663k -> 29408k
GGC runs: 245 -> 240

comparing insn-attrtab.c compilation at -O2 level:
  Overall memory allocated via mmap and sbrk decreased from 127120k to 114404k, 
overall -11.11%
  Peak amount of GGC memory allocated before garbage collecting run decreased 
from 113347k to 95237k, overall -19.02%
  Peak amount of GGC memory still allocated after garbage collecting increased 
from 83466k to 90625k, overall 8.58%
  Amount of produced GGC garbage decreased from 372181k to 328157k, overall 
-13.42%
  Amount of memory still referenced at the end of compilation decreased from 
10176k to 8982k, overall -13.30%
Overall memory needed: 127120k -> 114404k
Peak memory 

Question about LTO dwarf reader vs. artificial variables and formal arguments

2006-10-21 Thread Steven Bosscher
Hello,

I want to make gfortran produce better debug information, but I want to do it 
in a way that doesn't make it hard/impossible to read back in sufficient 
information for LTO to work for gfortran.  

I haven't really been following the whole LTO thing much, but if I understand 
correctly, the goal is to reconstruct information about declarations from 
DWARF information that we write out for those declarations.  If that's the 
case, I wonder how LTO will handle artificial "variables" and formal argument 
lists. 

For example, gfortran adds additional formal arguments for functions that take 
a CHARACTER string as a formal argument, e.g.

program test
implicit none
call sub("Hi World!")

contains
   subroutine sub(c)
   character*10 c
   end subroutine

end

produces as a GIMPLE dump:

MAIN__ ()
{
  static void sub (char[1:10] &, int4);

  _gfortran_set_std (70, 127, 0);
  sub ("Hi World!", 9);
}


sub (c, _c)
{
  (void) 0;
}

where _c is strlen("Hi World!").  From a user perspective, it would be better 
to hide _c for the debugger because it is not something that the user had in 
the original program.  I have a patch to hide that parameter, that is, it 
stops GCC from writing out DW_TAG_formal_parameter for _c.  But I am worried 
about how this will work out later if/when someone tries to make LTO work for 
gfortran too.
Can you still reconstruct the correct function prototype for LTO from the 
debug info if you don't write debug info for _c?

Similarly, LTO has to somehow deal with DECL_VALUE_EXPR and the debug 
information that is produced from it.  Gfortran (and iiuc other front ends 
and SRA) use this DECL_VALUE_EXPR to produce fake variables that point to 
some location to improve the debug experience of the user.  For Fortran we 
use it to create fake variables to point at members of a COMMON block, for 
example, so that the user can do "p A" for a variable A in a common block, 
instead of "p name_of_the_common_block.A".  Is there already some provision 
to handle this kind of trickery in LTO?

Finally, consider another Fortran example:

program debug_array_dimensions
implicit none
integer i(10,10)
i(2,9) = 1
end

Gfortran currently produces the following wrong debug information for this 
example:

 <2><94>: Abbrev Number: 3 (DW_TAG_variable)
 DW_AT_name: i
 DW_AT_decl_file   : 1
 DW_AT_decl_line   : 1
 DW_AT_type: 
 DW_AT_location: 3 byte block: 91 e0 7c (DW_OP_fbreg: -416)
 <1>: Abbrev Number: 4 (DW_TAG_array_type)
 DW_AT_type: 
 DW_AT_sibling : 
 <2>: Abbrev Number: 5 (DW_TAG_subrange_type)
 DW_AT_type: 
 DW_AT_lower_bound : 0
 DW_AT_upper_bound : 99
 <1>: Abbrev Number: 6 (DW_TAG_base_type)
 DW_AT_byte_size   : 8
 DW_AT_encoding: 5  (signed)
 DW_AT_name: int8
 <1>: Abbrev Number: 6 (DW_TAG_base_type)
 DW_AT_byte_size   : 4
 DW_AT_encoding: 5  (signed)
 DW_AT_name: int4

Note the single DW_TAG_subrange_type <0, 99> for the type of "i", instead of 
two DW_TAG_subrange_type entries <1, 10>.  This happens because in 
gfortran all arrays are flattened (iirc to make code generation easier).  I 
would like to make gfortran write out the correct debug information, e.g. 
something with

 <2>: Abbrev Number: 5 (DW_TAG_subrange_type)
 DW_AT_type: 
 DW_AT_upper_bound : 10
 <2>: Abbrev Number: 5 (DW_TAG_subrange_type)
 DW_AT_type: 
 DW_AT_upper_bound : 10

but what would happen if LTO reads this in and re-constructs the type of "i" 
from this information?  I imagine it would lead to mis-matches of the GIMPLE 
code that you read in, where "i" is a 1x100 array, and the re-constructed 
variable "i" which would be a 10x10 2D array.
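
To make the mismatch concrete, here is a minimal sketch of the flattening,
assuming the flat array follows Fortran's usual column-major storage order
(first subscript varies fastest).  The function name flat_index is
hypothetical and only illustrates the arithmetic; it is not gfortran code.

```python
def flat_index(subscripts, bounds):
    """Map Fortran subscripts to a zero-based offset in column-major order.

    bounds is a list of (lower, upper) pairs, one per dimension.
    """
    offset = 0
    stride = 1
    for sub, (lower, upper) in zip(subscripts, bounds):
        if not lower <= sub <= upper:
            raise IndexError(f"subscript {sub} outside {lower}:{upper}")
        offset += (sub - lower) * stride
        stride *= upper - lower + 1
    return offset


bounds = [(1, 10), (1, 10)]         # integer i(10,10)
print(flat_index((2, 9), bounds))   # offset of i(2,9) in the flat 0..99 array
```

A reader that sees only the two <1, 10> subranges has to redo this mapping
before it can match accesses in the flattened GIMPLE.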

Has anyone working on LTO already thought of these challenges?

I'm all new to both DWARF and LTO, so forgive me if my rant doesn't make 
sense ;-)  

Gr.
Steven



Re: Question about LTO dwarf reader vs. artificial variables and formal arguments

2006-10-21 Thread Daniel Jacobowitz
On Sat, Oct 21, 2006 at 06:35:40PM +0200, Steven Bosscher wrote:
> where _c is strlen("Hi World!").  From a user perspective, it would be better 
> to hide _c for the debugger because it is not something that the user had in 
> the original program.  I have a patch to hide that parameter, that is, it 
> stops GCC from writing out DW_TAG_formal_parameter for _c.  But I am worried 
> about how this will work out later if/when someone tries to make LTO work for 
> gfortran too.
> Can you still reconstruct the correct function prototype for LTO from the 
> debug info if you don't write debug info for _c?

Wouldn't this be upsetting to debuggers, too - for instance, if they
wanted to call the function?  It might be wiser to tag it
DW_AT_artificial, and let the debugger sort out what to do with it.
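
As a rough illustration of that idea (a toy model, not GCC or GDB code): if
the hidden length is kept in the debug info and only flagged, one consumer
can hide it while another still sees the full argument list.  The class
FormalParam and both helper functions are invented for this sketch; the
artificial field stands in for DW_AT_artificial.

```python
from dataclasses import dataclass


@dataclass
class FormalParam:
    name: str
    type_name: str
    artificial: bool = False  # stands in for DW_AT_artificial


# Parameters of sub(c, _c), following the GIMPLE dump quoted above.
SUB_PARAMS = [
    FormalParam("c", "char[1:10] &"),
    FormalParam("_c", "int4", artificial=True),
]


def user_view(params):
    """What a debugger could list by default: skip artificial parameters."""
    return [p.name for p in params if not p.artificial]


def call_signature(params):
    """What a caller or a prototype reader needs: every parameter type."""
    return [p.type_name for p in params]


print(user_view(SUB_PARAMS))
print(call_signature(SUB_PARAMS))
```

Dropping the entry for _c altogether would make the second view
impossible to rebuild from the debug info alone.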

> Similarly, LTO has to somehow deal with DECL_VALUE_EXPR and the debug 
> information that is produced from it.  Gfortran (and iiuc other front ends 
> and SRA) use this DECL_VALUE_EXPR to produce fake variables that point to 
> some location to improve the debug experience of the user.  For Fortran we 
> use it to create fake variables to point at members of a COMMON block, for 
> example, so that the user can do "p A" for a variable A in a common block, 
> instead of "p name_of_the_common_block.A".  Is there already some provision 
> to handle this kind of trickery in LTO?

I don't think we're far enough on yet to know the answer to this or
your other question, but I may be wrong.  There's a reason we're
focusing on C right now :-)  I don't think the design precludes this
sort of thing, but we won't know how it all fits together until more's
been done.

-- 
Daniel Jacobowitz
CodeSourcery


Re: Question about LTO dwarf reader vs. artificial variables and formal arguments

2006-10-21 Thread Ian Lance Taylor
Steven Bosscher <[EMAIL PROTECTED]> writes:

> I haven't really been following the whole LTO thing much, but if I understand 
> correctly, the goal is to reconstruct information about declarations from 
> DWARF information that we write out for those declarations.  If that's the 
> case, I wonder how LTO will handle artificial "variables" and formal argument 
> lists. 

I think it is a mistake to focus on DWARF details too much.  We simply
need some mechanism to write trees into an object file and to read
them back in.  That mechanism can be anything.

We are using DWARF on the theory that it will be simpler because DWARF
readers and writers already exist (I don't buy that argument myself,
but, whatever).  But it is clearly impossible to represent everything
we need to represent in DWARF.  So we need to extend DWARF as
necessary to represent all the tree details.

That is, we are not going to write out DWARF.  We can't, because DWARF
is not designed to represent all the details which the compiler needs
to represent.  What we are going to write out is a superset of DWARF.
And in fact, if it helps, I think that we shouldn't hesitate to write
out something which is similar to but incompatible with DWARF.

In general reading and writing trees is far from the hardest part of
the LTO effort.  I think it is a mistake for us to get too tied up in
the details of how to represent things in DWARF.  (I also think that
we could probably do better by defining our own bytecode language, one
optimized for our purposes, but it's not an issue worth fighting
over.)

Ian


Re: Question about LTO dwarf reader vs. artificial variables and formal arguments

2006-10-21 Thread Diego Novillo

Ian Lance Taylor wrote on 10/21/06 14:59:


> That is, we are not going to write out DWARF.  We can't, because DWARF
> is not designed to represent all the details which the compiler needs
> to represent.  What we are going to write out is a superset of DWARF.
> And in fact, if it helps, I think that we shouldn't hesitate to write
> out something which is similar to but incompatible with DWARF.
>
> In general reading and writing trees is far from the hardest part of
> the LTO effort.  I think it is a mistake for us to get too tied up in
> the details of how to represent things in DWARF.  (I also think that
> we could probably do better by defining our own bytecode language, one
> optimized for our purposes, but it's not an issue worth fighting
> over.)

Agreed.  I don't think we'll get far if we focus too much on DWARF, as 
it clearly cannot be used as a bytecode language for our purposes.


We will need to evolve our own bytecode language, either as an extension 
to DWARF (much like we did with SIMPLE) or do something from scratch. 
Implementing type support starting from DWARF is a start, but we should 
not constrain ourselves to it.


fwhole-program, -combine, several C sources ?

2006-10-21 Thread Basile STARYNKEVITCH
Dear All,

For C source programs only, is there a scenario where several (eg two or
more) C source files (ie passed *.c arguments) can be passed with the
-fwhole-program flag and without the -combine flag?

In other words, some medium sized (eg < 100 KLOC of C source in several *.c
files) programs can be compiled with
   gcc -O3 -fwhole-program -combine *.c -o prog ...libraries and includes
but the compilation fails if the -combine flag is not given.

Maybe the driver might issue some warning if -fwhole-program is given
without -combine for several C source inputs?

Or perhaps I do not understand the -fwhole-program flag very well.

By the way, I am surprised that so few programs are compiled with it...
http://www.google.com/codesearch?q=fwhole-program gives few results

Regards.
-- 
Basile STARYNKEVITCH http://starynkevitch.net/Basile/ 
email: basilestarynkevitchnet 
aliases: basiletunesorg = bstarynknerimnet
8, rue de la Faïencerie, 92340 Bourg La Reine, France


Re: fwhole-program, -combine, several C sources ?

2006-10-21 Thread Andrew Pinski
On Sat, 2006-10-21 at 22:03 +0200, Basile STARYNKEVITCH wrote:
> Dear All,
> 
> For C source programs only, is there a scenario where several (eg two or
> more) C source files (ie passed *.c arguments) can be passed with the
> -fwhole-program flag and without the -combine flag?

You can have a whole program in one source module so the warning would
get in the way of that case.  Plus -combine is useless in that case
also.  In fact -combine will go away once LTO finishes.

Thanks,
Andrew Pinski



Re: Question about LTO dwarf reader vs. artificial variables and formal arguments

2006-10-21 Thread Mark Mitchell

Diego Novillo wrote:

> Ian Lance Taylor wrote on 10/21/06 14:59:
>
>> That is, we are not going to write out DWARF.  We can't, because DWARF
>> is not designed to represent all the details which the compiler needs
>> to represent.  What we are going to write out is a superset of DWARF.
>> And in fact, if it helps, I think that we shouldn't hesitate to write
>> out something which is similar to but incompatible with DWARF.
>>
>> In general reading and writing trees is far from the hardest part of
>> the LTO effort.  I think it is a mistake for us to get too tied up in
>> the details of how to represent things in DWARF.  (I also think that
>> we could probably do better by defining our own bytecode language, one
>> optimized for our purposes, but it's not an issue worth fighting
>> over.)
>
> Agreed.  I don't think we'll get far if we focus too much on DWARF, as 
> it clearly cannot be used as a bytecode language for our purposes.


I think the bytecode issue is a red herring, because we are no longer 
talking about using DWARF for the bodies of functions.  DWARF is only 
being used for declarations and types.


There, yes, we will need some extensions to represent things.  However, 
DWARF is designed to be extended, so that's no problem.  I continue to 
think that using DWARF (with extensions) is worthwhile, since it makes this 
information accessible to other tools (including GDB).  I think that 
there ought to be a compelling reason before we abandon a strategy based 
on DWARF.


--
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713


Re: Question about LTO dwarf reader vs. artificial variables and formal arguments

2006-10-21 Thread Mark Mitchell

Steven Bosscher wrote:


> contains
>    subroutine sub(c)
>    character*10 c
>    end subroutine
>
> end
>
> produces as a GIMPLE dump:
>
> sub (c, _c)
> {
>   (void) 0;
> }
>
> where _c is strlen("Hi World!").  From a user perspective, it would be better 
> to hide _c for the debugger because it is not something that the user had in 
> the original program. 


I think that _c should be emitted in DWARF, as an artificial parameter, 
both for the sake of the debugger and for LTO.  LTO is supposed to be 
language-independent, which means that the information it reads in needs 
to be sufficient to compute the types of things (as they will be at the 
level of GIMPLE) without language hooks.  It may be that this idea turns 
out to be too idealistic, and that some language hooks are necessary to 
interpret the DWARF, but I would hope to avoid that.


> Similarly, LTO has to somehow deal with DECL_VALUE_EXPR and the debug 
> information that is produced from it.  Is there already some provision 
> to handle this kind of trickery in LTO?


No, not yet.

> but what would happen if LTO reads this in and re-constructs the type of "i" 
> from this information?  I imagine it would lead to mis-matches of the GIMPLE 
> code that you read in, where "i" is a 1x100 array, and the re-constructed 
> variable "i" which would be a 10x10 2D array.
>
> Has anyone working on LTO already thought of these challenges?


Yes, I've thought about these things -- but that doesn't mean I have 
ready answers.  I've been thinking first and foremost about C, and then 
about C and C++.


Some of the same issues apply, but some don't.  In C/C++, we don't 
linearize the array type.  I don't know if that's viable in gfortran or 
not; is there a way to get the same performance in the middle end that 
you currently get by doing this in the front end?


In the worst case, we will provide a separate type attribute in DWARF 
giving the "GIMPLE type" of the variable.  Then, that type would be the 
linearized array.  LTO would use the GIMPLE type attribute (if present) 
when reconstructing the type.
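
A minimal sketch of that fallback, with debug entries modelled as plain
dictionaries.  The key gimple_type is a placeholder for the proposed
attribute, which does not exist in DWARF, and the function name and the type
strings are invented for illustration only.

```python
def type_for_lto(die: dict) -> str:
    """Prefer the hypothetical GIMPLE-type attribute, else the ordinary type."""
    return die.get("gimple_type", die["type"])


# "i" carries both the source-level type and the linearized GIMPLE type.
i_die = {"name": "i", "type": "int4[1:10,1:10]", "gimple_type": "int4[0:99]"}
# A scalar needs no extra attribute; its ordinary type is used as-is.
x_die = {"name": "x", "type": "int4"}

print(type_for_lto(i_die))
print(type_for_lto(x_die))
```

A debugger would keep reading the ordinary type and simply ignore the extra
attribute.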


--
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713