dwz-0.1 - DWARF compression tool

2012-04-18 Thread Jakub Jelinek
Hi!

I'd like to announce dwz-0.1, a DWARF compression tool I've spent this
April hacking on.  It is currently (see below) written as a standalone tool
in C, with minimal dependencies (though no time has been spent on portability
yet, so it assumes a glibc host); in particular, only a small amount of code
in it depends on libelf (tested with elfutils only).
The tool parses DWARF2+ debug info, finds matching DIEs between different
CUs that could be moved to new CUs with DW_TAG_partial_unit, estimates
whether that is worthwhile and, if it is, moves them there and adds
DW_TAG_imported_unit DIEs (a tree of them) to make sure every CU includes
all the needed DIEs.  DW_TAG_imported_unit DIEs created by this tool
will only be direct children of DW_TAG_{compile,partial}_unit DIEs; if
something from a named namespace/module can be shared, the DW_TAG_namespace
or DW_TAG_module DIE with the same DW_AT_name is added in the partial unit
as well.

In addition to duplicate sharing, the tool performs some other small
optimizations: it chooses the best DW_FORM_ref{1,2,4,_udata} form for
intra-CU references (the same in each CU, otherwise we might create
way too many abbreviations) to minimize the size of the CU, and performs
various optimizations on .debug_abbrev to allow more CUs to share the same
abbrev table while not increasing CU size (abbrev numbers are uleb128
encoded, so after going to 128 or more abbrevs the higher abbrev numbers
need 2 bytes, or even more for really many abbrevs), etc.
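For illustration (this helper is not code from dwz itself), a minimal ULEB128
encoder showing where the extra byte kicks in:

  #include <stddef.h>

  /* Encode VALUE as ULEB128 into BUF and return the number of bytes used:
     1 byte for 0..127, 2 bytes for 128..16383, and so on, because each
     byte holds 7 value bits plus a continuation bit.  */
  static size_t
  uleb128_encode (unsigned long value, unsigned char *buf)
  {
    size_t len = 0;
    do
      {
        unsigned char byte = value & 0x7f;
        value >>= 7;
        if (value != 0)
          byte |= 0x80;  /* more bytes follow */
        buf[len++] = byte;
      }
    while (value != 0);
    return len;
  }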

The tool is available from
http://people.redhat.com/jakub/dwz/dwz-0.1.tar.bz2

For testing, I was using a set of -gdwarf-4 -fno-debug-types-section
built binaries/shared libraries and a matching rebuild thereof with
-gdwarf-4 -fdebug-types-section.  (Note: while the tool supports even
DWARF 2 (and 3) input, it is highly recommended to use it on DWARF 3+
at least, especially on 64-bit architectures, because DW_FORM_ref_addr
is 8 bytes in DWARF 2 for 64-bit pointer size, rather than just 4 bytes.)

Below are some numbers.  I had a collection of 4 binaries/libraries
from GCC and 273 libraries/binaries (well, separate debug info for them)
from libreoffice.  The first number is the size of the original
-gdwarf-4 -fno-debug-types-section objects, the third number the size of
the original -gdwarf-4 -fdebug-types-section objects, and the fifth number
the size of the -gdwarf-4 -fno-debug-types-section objects processed with
the dwz tool (the 2nd and 4th are the relative sizes of the third/fifth
number compared to the first, in percent); the last number is user time
from the time command on an i7-2600 host.  For each collection there is a
du -sk line with the file sizes of all files in the collection (in
kilobytes), then a "3sec sum" line which contains the sum of the
.debug_{info,abbrev,types} section sizes in bytes in all the objects
together, and then for each individual source it lists the sum of the
.debug_{info,abbrev,types} section sizes in bytes in the particular object.
In each collection those lines are sorted from best to worst percentage of
.debug_types savings.

For all files the dwz sizes are smaller than the corresponding sizes with
.debug_types (which is for several reasons: .debug_types has higher
reference overhead (8 bytes), moves only selected kinds of types,
and only a single DIE in each DW_TAG_type_unit can be referenced).
On 47% of the input files .debug_types actually results in size
degradation rather than improvement.  Of course, on the other
hand, .debug_types doesn't need the extra optimization.

For the speed you can look at the table: the two largest inputs took
between 10 and 20 seconds (the largest, libsclo.so.debug, with 16 million+
DIEs, above 18 seconds), 11 other inputs took between 3 and 10
seconds, 14 other inputs took between 1 and 3 seconds, and the remaining
250 inputs took below one second.  As for memory requirements,
the largest (again libsclo.so.debug) needs 2.2GB of RAM on a 64-bit host
(mainly in 16 million+ struct dw_die structures (72 bytes, but the obstack
used for them rounds it to 80), .5GB in a hash table for
offset -> internal DIE pointer lookups, 68MB for the new content of
.debug_info and 2MB for the new content of .debug_abbrev); e.g. cc1plus,
which also has pretty large debug info, needs around 800MB.  On 32-bit
hosts I'd expect something in between that and half of that.

The tool is new, so it hasn't undergone any extensive testing yet;
I plan to hack up some tool that will try to verify no debug info has been
lost during the compression process.  Tom Tromey is working on the GDB side
of the support for DW_TAG_partial_unit/DW_TAG_imported_unit; other tools
might need changes too if they don't support it (it is standard DWARF3+)
or if they don't support it efficiently.

I'm not sure whether the tool later on (for testing a standalone tool
is best) should be kept as a separate, post-linking tool, or whether
we should try to integrate it into the linker (or keep it both as a
separate tool and as part of the linker (or a linker plugin?)).
The current libelf dependencies could probably easily be split into
a separat

What to do with the exceptional case of expand_case for SJLJ exceptions

2012-04-18 Thread Steven Bosscher
Hello,

If I move GIMPLE_SWITCH lowering from stmt.c to somewhere in the
GIMPLE pass pipeline, I run into an issue with SJLJ exceptions. The
problem is that except.c:sjlj_emit_dispatch_table() builds a
GIMPLE_SWITCH and calls expand_case on it. If I move all non-casesi,
non-tablejump code out of stmt.c and make it a GIMPLE lowering pass
(currently I have the code in tree-switch-conversion.c) then two
things happen:

1. SJLJ exception dispatch tables can only be expanded as casesi or
tablejump. This may not be optimal.

2. If the target asks for SJLJ exceptions but it has no casesi and no
tablejump insns or expanders, then the compilation will fail.

I don't think (1) is a big problem, because exceptions should be,
well, exceptions after all so optimizing them shouldn't be terribly
important. For (2), I had hoped it would be a requirement to have
either casesi or tablejump, but that doesn't seem to be the case. But
I could put in some code to expand it as a series of test-and-branch
insns instead, in case there is only a small number of num_dispatches.
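For illustration, such a fallback would correspond to source-level code
roughly like this (hypothetical example, not actual dispatch code):

  /* Hypothetical illustration only: a small, dense dispatch switch
       switch (which) { case 0: goto lp0; case 1: goto lp1; default: goto lp2; }
     expanded as a chain of compare-and-branch instead of casesi/tablejump.  */
  void
  sjlj_dispatch_fallback (int which)
  {
    if (which == 0)
      goto lp0;
    if (which == 1)
      goto lp1;
    goto lp2;
   lp0: /* first landing pad */ return;
   lp1: /* second landing pad */ return;
   lp2: /* third landing pad */ return;
  }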

What is the reason why lowering for SJLJ exceptions is not done in GIMPLE?

Would it be a problem for anyone if SJLJ exception handling will be
less efficient, if I move GIMPLE_SWITCH lowering earlier in the pass
pipeline?

Ciao!
Steven


Re: About sink load from memory in tree-ssa-sink.c

2012-04-18 Thread Richard Guenther
On Wed, Apr 18, 2012 at 8:53 AM, Bin.Cheng  wrote:
> Hi,
> As discussed at thread
> "http://gcc.gnu.org/ml/gcc/2012-04/msg00396.html";, I am trying a patch
> now.
> The problem here is I have to go through all basic blocks from
> "sink_from" to "sink_to" to check whether
> the memory might be clobbered in them.
> Currently I have two methods:
> 1, do full data-flow analysis to compute the "can_sink" information at
> each basic block, which means whether
> we can sink a load to a basic block;
> 2, just compute the transitive closure of the CFG, and check any basic
> block that is dominated by "sink_from" and can
> reach the "sink_to" basic block;
>
> The 2nd method is an approximation, simpler than method 1 but misses
> some cases like:
>
> L1:
>  load x
> L2:
>  using x
> L3:
>  set x
>  goto L1
>
> In which, "load x" should be sunk to L2 if there is benefit.
>
> I measured the number of sunk loads during a GCC bootstrap for x86:
> there are about 732 using method 1, while only 602 using method 2.
>
> So any comment on this topic? Thanks very much.

I don't understand method 2.  I'd do

 start at the single predecessor of the sink-to block

 foreach stmt from the end to the beginning of that block
   if the stmt has a VDEF or the same VUSE as the stmt we sink, break

 (continue searching for VDEFs in predecessors - that now gets more expensive;
  I suppose limiting sinking to the cases where the above finds something
  would be easiest, or even limiting sinking to never sink across any stores)

 walk the vuse -> vdef chain, using refs_anti_dependent_p to see whether
 the load is clobbered.

But I'd suggest limiting the sinking to never sink across stores - the alias
memory model we have in GCC seriously limits these anyway.  How would
the numbers change if you do that?
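A rough sketch of that check against the GIMPLE iterator/virtual-operand API
(hypothetical helper name, limited to the single-predecessor case and to
never sinking across any store):

  /* Sketch only: refuse to sink LOAD_STMT into SINK_TO unless its single
     predecessor block contains no store and no statement with the same
     VUSE as the load.  */
  static bool
  can_sink_load_p (gimple load_stmt, basic_block sink_to)
  {
    gimple_stmt_iterator gsi;
    basic_block bb;

    if (!single_pred_p (sink_to))
      return false;
    bb = single_pred (sink_to);

    /* Walk the block backwards; any VDEF (store) or a statement with the
       same VUSE as the load makes us give up.  */
    for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi))
      {
        gimple stmt = gsi_stmt (gsi);
        if (gimple_vdef (stmt)
            || (gimple_vuse (stmt)
                && gimple_vuse (stmt) == gimple_vuse (load_stmt)))
          return false;
      }

    /* A fuller version would continue into predecessors and/or walk the
       vuse -> vdef chain with refs_anti_dependent_p.  */
    return true;
  }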

Richard.



> --
> Best Regards.


Re: What to do with the exceptional case of expand_case for SJLJ exceptions

2012-04-18 Thread Richard Guenther
On Wed, Apr 18, 2012 at 10:35 AM, Steven Bosscher  wrote:
> Hello,
>
> If I move GIMPLE_SWITCH lowering from stmt.c to somewhere in the
> GIMPLE pass pipeline, I run into an issue with SJLJ exceptions. The
> problem is that except.c:sjlj_emit_dispatch_table() builds a
> GIMPLE_SWITCH and calls expand_case on it. If I move all non-casesi,
> non-tablejump code out of stmt.c and make it a GIMPLE lowering pass
> (currently I have the code in tree-switch-conversion.c) then two
> things happen:
>
> 1. SJLJ exception dispatch tables can only be expanded as casesi or
> tablejump. This may not be optimal.

AFAIK SJLJ dispatch tables are dense, the switch is for the exceptional
case (heh - the case where SJLJ exceptions are supposed to be fast ...),
and most of the case functions have a single EH receiver(?) we already
have an optimized case for.

> 2. If the target asks for SJLJ exceptions but it has no casesi and no
> tablejump insns or expanders, then the compilation will fail.
>
> I don't think (1) is a big problem, because exceptions should be,
> well, exceptions after all so optimizing them shouldn't be terribly
> important. For (2), I had hoped it would be a requirement to have
> either casesi or tablejump, but that doesn't seem to be the case. But
> I could put in some code to expand it as a series of test-and-branch
> insns instead, in case there is only a small number of num_dispatches.

Can't we always expand a "lowered" tablejump, aka computed goto?
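(For reference, a "lowered" tablejump is just an indexed jump through a table
of label addresses; a rough GNU C illustration with made-up names:)

  /* Rough illustration of a "lowered" tablejump using GNU C computed goto;
     the handlers and the table are made up.  */
  void
  sjlj_dispatch_tablejump (int which)
  {
    static void *table[] = { &&lp0, &&lp1, &&lp2 };
    goto *table[which];
   lp0: /* handler 0 */ return;
   lp1: /* handler 1 */ return;
   lp2: /* handler 2 */ return;
  }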

> What is the reason why lowering for SJLJ exceptions is not done in GIMPLE?

Because it completely wrecks loops because we factor the SJLJ site,
thus

fn ()
{
  ...
  for (;;)
    {
      try { X } catch { Y }
    }
}

becomes

fn ()
{
  if (setjmp ())
    {
      switch (...)
        ... goto L;
    }
  for (;;)
    {
      X;
    L:
      Y;
    }
}

thus loops with try/catch get another entry, preventing them from being analyzed
(you see RTL loop optimizers doing nothing on such non-loops).

Of course that's similar to how we handle computed goto.

> Would it be a problem for anyone if SJLJ exception handling will be
> less efficient, if I move GIMPLE_SWITCH lowering earlier in the pass
> pipeline?

I suppose that's the real question.

Richard.

> Ciao!
> Steven


Re: What to do with the exceptional case of expand_case for SJLJ exceptions

2012-04-18 Thread Jan Hubicka
> > What is the reason why lowering for SJLJ exceptions is not done in GIMPLE?
> 
> Because it completely wrecks loops because we factor the SJLJ site,
> thus
> 
> fn ()
> {
> ...
>   for (;;)
> {
>try { X } catch { Y }
> }
> 
> becomes
> 
> fn ()
> {
>if (setjmp ())
>  {
> switch (...)
>... goto L;
>  }
>for (;;)
>  {
> X;
> L:
>   Y;
>  }

Well, if SJLJ lowering happens as a gimple pass somewhere near the end of the
gimple queue, this should not be a problem at all (and the implementation would
be cleaner).

Honza


Re: Why does lower-subreg mark copied pseudos as "decomposable"?

2012-04-18 Thread Richard Sandiford
Andrew Stubbs  writes:
> On 17/04/12 18:20, Richard Sandiford wrote:
>> Andrew Stubbs  writes:
>>> Hi all,
>>>
>>> I can see why copying from one pseudo-register to another would not be a
>>> reason *not* to decompose a register, but I don't understand why this is
>>> a reason to say it *should* be decomposed.
>>
>> The idea is that, if a backend implements an N-word pseudo move using
>> N word-mode moves, it is better to expose those moves before register
>> allocation.  It's easier for RA to find N separate word-mode registers
>> than a single contiguous N-word one.
>
> Ok, I think I understand that, but it seems slightly wrong to me.
>
> It makes sense to lower *real* moves, but before the fwprop pass there 
> are quite a lot of pseudos that only exist as artefacts of the expand 
> process. Moving the subreg1 pass after fwprop1 would probably do the 
> trick, but that would probably also defeat the object of lowering early.
>
> I've done a couple of experiments:
>
> First, I tried adding an extra fwprop pass before subreg1. I needed to 
> move up the dfinit pass also to make that work, but then it did work: it 
> successfully compiled my testcase without a regression.
>
> I'm not sure that adding an extra pass isn't overkill, so second I tried 

Yeah, sounds rather expensive :-)

> adjusting lower-subreg to avoid this problem; I modified 
> find_pseudo_copy so that it rejected copies that didn't change the mode, 
> on the principle that fwprop would probably have eliminated the move 
> anyway. This was successful also, and a much less expensive change.
>
> Does that make sense? The pseudos involved in the move will still get 
> lowered if the other conditions hold.

The problem is that not all register moves are always going to be
eliminated, even when no mode changes are involved.  It might make
sense to restrict that code you quoted:

case SIMPLE_PSEUDO_REG_MOVE:
  if (MODES_TIEABLE_P (GET_MODE (x), word_mode))
bitmap_set_bit (decomposable_context, regno);
  break;

to the second pass though.

>> The problem is the "if a backend implements ..." bit: the current code
>> doesn't check.  This patch:
>>
>>  http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00094.html
>>
>> should help.  It's still waiting for me to find a case where the two
>> possible ways of handling hot-cold partitioning behave differently.
>
> I've not studied that patch in detail, but I'm not sure it'll help. In 
> most cases, including my testcase, lowering is the correct thing to do 
> if NEON (or IWMMXT, perhaps) is not enabled.

Right.  I think I misunderstood, sorry.  I thought this regression was
for NEON only, but do you mean that adding these NEON patterns introduces
the regression for non-NEON targets as well?

> When NEON is enabled, however, it may still be the right thing to do:
> NEON does not provide a full set of DImode operations. The test for
> subreg-only uses ought to be enough to differentiate, once the
> extraneous pseudos such as the one in my testcase have been dealt
> with.

OK.  If/when that patch goes in, the ARM backend is going to have
to pick an rtx cost for DImode SETs.  It sounds like the cost will need
to be twice an SImode move regardless of whether or not NEON is enabled.

Richard


Debug info for comdat functions

2012-04-18 Thread Jakub Jelinek
Hi!

Something not addressed yet in dwz and unfortunately without
linker or compiler help not 100% addressable is debug info for
comdat functions.

Consider attached testcase with comdat foo function, seems the
current linker behavior (well, tested with 2.21.53.0.1 ld.bfd)
is that for DW_TAG_subprogram with DW_AT_low_pc/DW_AT_high_pc
having section relative relocs against comdat functions
if the comdat text section has the same size in both object
files, then DW_AT_low_pc (and DW_AT_high_pc) attributes
in both CUs will point to the same range.
E.g. when compiling g++ -gdwarf-4 -o t9 t91.C t92.C, both
.text._Z3fooi sections are identical (one byte).
I think if the section content is identical, then what the
linker does is fine and perhaps dwz could just do something
with it later on (currently it doesn't consider DIEs with
DW_AT_low_pc/DW_AT_high_pc/DW_AT_ranges attributes for dup
removal).  If both .text._Z3fooi sections have different
sizes, then the linker will clear DW_AT_low_pc/DW_AT_high_pc,
which is also fine (compile e.g. t91.C with -O2 and t92.C with -O0).
I guess most debug info consumers will ignore the 0..0 range,
and dwz could be taught to do something about those DW_TAG_subprogram
nodes too (what exactly?  Drop the DW_AT_{low_pc,high_pc,ranges} attributes
from them, drop all DW_TAG_inlined_subroutine/DW_TAG_lexical_block
children (perhaps all children?) of them, rewrite the .debug_loc section
if some portion of it was only referenced by to-be-removed DIEs?).

The problematic case (I'd say a linker bug) is when the .text._Z3fooi
sections have the same size, but different content (compiled with
different options, but by bad luck happened to have the same size).
I tested this by hacking up t91.s and t92.s, both built with -O2, to have
different, but same-sized, instructions in .text._Z3fooi.  IMHO in that case
debug info consumers will see wrong debug info, and dwz can't guess which
DIE describes the actual content and which DIE describes something
that has been removed.

For the libreoffice test files I have (and libstdc++.so) I've quickly
hacked up a guess at how much could be saved by handling the comdats
in dwz - the numbers are the size of a DW_TAG_subprogram DIE and all its
children if the same values of both DW_AT_low_pc/DW_AT_high_pc attributes
were already seen in another DIE.  Possible .debug_loc savings aren't
accounted for; on the other hand the cost of DW_TAG_imported_unit,
DW_TAG_partial_unit and/or keeping around a small portion of the
DW_TAG_subprogram DIE for 0..0 ranges isn't either.

liblwpftlo.so.debug 1160625
libooxlo.so.debug 939155
libswlo.so.debug 819029
libooxmllo.so.debug 789318
libsclo.so.debug 740099
libchartmodello.so.debug 636127
libsdlo.so.debug 592827
libdbulo.so.debug 458561
libsvxcorelo.so.debug 455718
libchartcontrollerlo.so.debug 418735
libfrmlo.so.debug 410486
slideshow.uno.so.debug 392586
libdbalo.so.debug 374204
libfwklo.so.debug 359078
libxolo.so.debug 327187
libsfxlo.so.debug 294460
vbaobj.uno.so.debug 282619
libtklo.so.debug 239364
libacclo.so.debug 227900
libvcllo.so.debug 209380
libdrawinglayerlo.so.debug 202697
libbf_frmlo.so.debug 192942
libscfiltlo.so.debug 188769
libbf_xolo.so.debug 184482
libbf_svxlo.so.debug 178985
libdbtoolslo.so.debug 172211
vbaswobj.uno.so.debug 169971
libbf_swlo.so.debug 169310
libsvtlo.so.debug 165993
libmswordlo.so.debug 151021
libcharttoolslo.so.debug 148725
libdoctoklo.so.debug 148573
libcomphelpgcc3.so.debug 143429
cairocanvas.uno.so.debug 142835
libbf_sclo.so.debug 140327
libcuilo.so.debug 133518
libpcrlo.so.debug 132128
i18npool.uno.so.debug 125995
vclcanvas.uno.so.debug 117225
libdeployment.so.debug 109223
libchartviewlo.so.debug 101869
libsvxlo.so.debug 96798
librptuilo.so.debug 95625
postgresql-sdbc-impl.uno.so.debug 95311
libswuilo.so.debug 94986
msforms.uno.so.debug 88318
libutllo.so.debug 72778
libbf_svtlo.so.debug 67642
libunoxmllo.so.debug 65290
libfilterconfiglo.so.debug 64802
libsblo.so.debug 63344
librptlo.so.debug 60664
libvbahelperlo.so.debug 60292
libbf_schlo.so.debug 58526
configmgr.uno.so.debug 58439
libeditenglo.so.debug 57242
libfilelo.so.debug 56444
libfwllo.so.debug 53012
libpackage2.so.debug 50186
libxcrlo.so.debug 49733
libcppcanvaslo.so.debug 47957
libbasctllo.so.debug 40421
libbf_sdlo.so.debug 38870
libsvllo.so.debug 37970
libxsec_fw.so.debug 34438
libjdbclo.so.debug 33137
libdbaselo.so.debug 32789
libxmlsecurity.so.debug 32416
libhsqldb.so.debug 32403
libsmlo.so.debug 32339
libuuilo.so.debug 30973
liblnglo.so.debug 29532
libfwelo.so.debug 28311
libodbcbaselo.so.debug 27829
librptxmllo.so.debug 27530
libwpftlo.so.debug 26609
libscuilo.so.debug 26554
libucpchelp1.so.debug 25999
libdeploymentgui.so.debug 25982
libmysqllo.so.debug 25576
libfwilo.so.debug 25144
libembobj.so.debug 23555
libxstor.so.debug 23167
libsofficeapp.so.debug 23124
libmsfilterlo.so.debug 22551
libdbaxmllo.so.debug 22032
libucpfile1.so.debug 21467
libxsec_xmlsec.so.debug 19768
libevoablo.so.debug 19409
libspalo.so.debug 18660
libflatlo.so.debug 18345
libu

Re: Why does lower-subreg mark copied pseudos as "decomposable"?

2012-04-18 Thread Andrew Stubbs

On 18/04/12 11:55, Richard Sandiford wrote:

> The problem is that not all register moves are always going to be
> eliminated, even when no mode changes are involved.  It might make
> sense to restrict that code you quoted:
>
> case SIMPLE_PSEUDO_REG_MOVE:
>   if (MODES_TIEABLE_P (GET_MODE (x), word_mode))
>     bitmap_set_bit (decomposable_context, regno);
>   break;
>
> to the second pass though.


Yes, I thought of that, but I dismissed it because the second pass is 
really very late. It would be just in time to take advantage of the 
relaxed register allocation, but would miss out on all the various 
optimizations that forward-propagation, combining, and such can offer.


This is why I've tried to find a way to do something about it in the
first pass. I thought it makes sense to do something for non-no-op
moves (when is there such a thing, btw, without it being an extend,
truncate, or subreg?), but the no-op moves are trickier.


Perhaps a combination of the two ideas? Decompose mode-changing moves in 
the first pass, and all moves in the second?


BTW, the lower-subreg pass has a forward propagation concept of its own. 
If I read it right, even with the above changes, it will still decompose 
the move if the register it copies from has been decomposed, and the 
register it copies to is not marked 'non-decomposable'.


Hmm, I'm going to try to come up with some testcases that demonstrate 
the different cases and see if that helps me think about it. Do you 
happen to have any to hand?



>> I've not studied that patch in detail, but I'm not sure it'll help. In
>> most cases, including my testcase, lowering is the correct thing to do
>> if NEON (or IWMMXT, perhaps) is not enabled.


> Right.  I think I misunderstood, sorry.  I thought this regression was
> for NEON only, but do you mean that adding these NEON patterns introduces
> the regression for non-NEON targets as well?


No, you were right, the regression only occurs when NEON is enabled. 
Otherwise the machine description behaves exactly as it used to.



>> When NEON is enabled, however, it may still be the right thing to do:
>> NEON does not provide a full set of DImode operations. The test for
>> subreg-only uses ought to be enough to differentiate, once the
>> extraneous pseudos such as the one in my testcase have been dealt
>> with.


> OK.  If/when that patch goes in, the ARM backend is going to have
> to pick an rtx cost for DImode SETs.  It sounds like the cost will need
> to be twice an SImode move regardless of whether or not NEON is enabled.


That sounds reasonable. Of course, how much a register move costs is a 
tricky subject for NEON anyway. :(


Andrew



Re: Debug info for comdat functions

2012-04-18 Thread Jakub Jelinek
Hi!

Sorry for following up to self, but something I forgot to add
about this:

On Wed, Apr 18, 2012 at 01:16:40PM +0200, Jakub Jelinek wrote:
> Something not addressed yet in dwz and unfortunately without
> linker or compiler help not 100% addressable is debug info for
> comdat functions.

When discussed on IRC recently Jason preferred to move the DW_TAG_subprogram
describing a comdat function to a comdat .debug_info DW_TAG_partial_unit
and just reference all DIEs that need to be referenced from it
using DW_FORM_ref_addr back to the non-comdat .debug_info.  Perhaps put its
sole .debug_loc contributions into comdat part as well, .debug_ranges maybe
too.  I've thought about that approach a little bit, but I see issues with
that, at least with the current linker behavior.
In particular, even for identical .text.* section content different CUs
might have slightly different partial units.  The comdat .debug_info
sections couldn't be hashed in any way; it would use the normal comdat
mechanism.  We would have DW_TAG_imported_unit with DW_AT_import
attribute pointing to the start DW_TAG_partial_unit in the section
(we would need to hardcode the +11 bytes offset, assuming nobody
ever emits 64-bit DWARF) and not refer to any other DIEs from the partial
unit.  If the comdat .debug_info section sizes are the same, it will
work fine (unless the IMHO ld bug mentioned in the previous mail is fixed,
in which case it would work only if the section is bitwise identical).  But
if they are different, the linker will instead put 0 there for the relocation,
which doesn't refer to any DW_TAG_*_unit and is thus invalid DWARF.
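(For reference, the 11 bytes are the size of a 32-bit DWARF compilation unit
header; a layout sketch, not code from dwz:)

  #include <stdint.h>

  /* Layout sketch of a 32-bit DWARF compilation unit header; on disk the
     fields are packed, so the root DIE (here the DW_TAG_partial_unit)
     starts 4 + 2 + 4 + 1 = 11 bytes into the unit.  */
  struct dwarf32_cu_header
  {
    uint32_t unit_length;           /* 4 bytes */
    uint16_t version;               /* 2 bytes */
    uint32_t debug_abbrev_offset;   /* 4 bytes */
    uint8_t address_size;           /* 1 byte  */
  } __attribute__ ((packed));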

Jakub


Re: dwz-0.1 - DWARF compression tool

2012-04-18 Thread Jakub Jelinek
On Wed, Apr 18, 2012 at 09:49:11AM +0200, Mike Dupont wrote:
> this is exciting, thanks for sharing.
> 
> I wonder what amount of data is even the same between many libraries,

Of course there is a lot of DWARF duplication in between different
libraries, or binaries, or e.g. Linux kernel modules (which have the
added problem that they have relocations against the sections; we could
apply and remove the relocations against .debug_* sections (and do string
merging of .debug_str at the same time) there as first step, but there would
be still relocations against the module .text/.data etc.).

The problem with that is that we'd need DWARF extensions to do the
duplication elimination in between different libraries/binaries.

I can think of two possible approaches:

1) indicate somehow that .debug_* sections live elsewhere, in a single
   (per package?) *.debug object, where all the .debug_* sections would be
   concatenated together and then just compress the debug info
   in that large object.  The main problem with that is that suddenly
   all places in the debug info that refer to .text/.data (and other
   allocated sections) addresses need to be augmented somehow to say
   which of the possibly many shared libraries or kernel modules or
   binaries they refer to.  That would be too hard.  It could be
   done just by some attribute in each DW_TAG_*_unit saying what that CU
   refers to (if it uses any addresses anywhere), and other .debug_*
   sections that are solely referenced from .debug_info would be fine too.
   But e.g. .debug_aranges would need extensions...

2) or, alternatively, keep most of the debug info in the individual
   objects (shared libraries, binaries, kernel modules) and just for
   what dwz currently moves over into new DW_TAG_partial_unit CUs (assuming
   it doesn't contain any .text/.data references and only refers to
   DIEs inside of them or in other partial units that don't contain
   any .text/.data references) move those partial units to a .debug_info
   section in a separate file (and add some new .debug_* section that
   would hint the debug info consumers how to find the separate file
   (build-id, or filename, or a combination of both, whatever)).
   If we support just one such separate file, we could just have
   DW_FORM_alt_sec_offset and DW_FORM_ref_alt_addr new forms, which
   would mean this is the corresponding .debug{_line,_loc,_loc}
   section offset, but not inside of this file, but in the secondary
   file.  If we were to support more than one, we'd need to number them
   and add forms that would say start with uleb128 number index of
   the separate file followed by actual offset.  Still, a shorthand
   form for the first one separate file might be handy, assuming that
   is what is done most of the time.
   With many possibly large binaries/libraries together there are major
   concerns about memory consumption though, so I think the tool would
   need to do it in steps - compress each file individually first
   (what the tool does right now) and for eligible partial units append
   them to a common separate file (and keep them in the original file
   too).  When the first pass over all files is done, merge duplicates
   within the common separate file which holds just the partial units.
   Second pass would then take the reduced common separate file and
   the compressed debug info from the first pass, and find duplicate
   partial units, switch references to them in their forms to the
   alt forms and remove the no longer needed partial units.
   Of course the separate common file would not need to contain
   just .debug_info and .debug_abbrev sections, but also some minimal
   .debug_line section (not containing actual line instructions, but
   dir/file tables).

My preference would be 2).  What do you think?

Jakub


Re: dwz-0.1 - DWARF compression tool

2012-04-18 Thread Mark Wielaard
On Wed, Apr 18, 2012 at 02:26:45PM +0200, Jakub Jelinek wrote:
> Of course there is a lot of DWARF duplication in between different
> libraries, or binaries, or e.g. Linux kernel modules (which have the
> added problem that they have relocations against the sections; we could
> apply and remove the relocations against .debug_* sections (and do string
> merging of .debug_str at the same time) there as first step, but there would
> be still relocations against the module .text/.data etc.).

BTW. We do now remove the relocations against .debug_* sections in Fedora
(using rpm >= 4.9 and elfutils >= 0.153) and that saves a lot of space:
http://lists.fedoraproject.org/pipermail/kernel/2012-February/003665.html
"This saves ~500MB on the installed size of the kernel-debuginfo
package and makes the rpm file ~30MB smaller"

Cheers,

Mark


Re: Debug info for comdat functions

2012-04-18 Thread Jason Merrill

On 04/18/2012 07:53 AM, Jakub Jelinek wrote:

> Consider attached testcase with comdat foo function, seems the
> current linker behavior (well, tested with 2.21.53.0.1 ld.bfd)
> is that for DW_TAG_subprogram with DW_AT_low_pc/DW_AT_high_pc
> having section relative relocs against comdat functions
> if the comdat text section has the same size in both object
> files, then DW_AT_low_pc (and DW_AT_high_pc) attributes
> in both CUs will point to the same range.


This seems clearly wrong to me.  A reference to a symbol in a discarded 
section should not resolve to an offset into a different section.  I 
thought the linker always resolved such references to 0, and I think 
that is what we want.



> When discussed on IRC recently Jason preferred to move the DW_TAG_subprogram
> describing a comdat function to a comdat .debug_info DW_TAG_partial_unit
> and just reference all DIEs that need to be referenced from it
> using DW_FORM_ref_addr back to the non-comdat .debug_info.


I played around with implementing this in the compiler yesterday; my 
initial patch is attached.  It seems that with normal DWARF 4 this can 
work well, but I ran into issues with various GNU extensions:


DW_TAG_GNU_call_site wants to refer to the called function's DIE, so the 
function die in the separate unit needs to have its own symbol.  Perhaps 
_call_site could refer to the function symbol instead?  That seems more 
correct anyway, since with COMDAT functions you might end up calling a 
different version of the function that has a different DIE.


The typed stack ops such as DW_OP_GNU_deref_type want to refer to a type 
in the same CU, so we would need to copy any referenced base types into 
the separate function CU.  Could we add variants of these ops that take 
an offset from .debug_info?



> Perhaps put its
> sole .debug_loc contributions into comdat part as well, .debug_ranges maybe
> too.


I haven't done anything with .debug_loc yet.

.debug_ranges mostly goes away with this change; the main CU becomes 
just .text and the separate CUs are just their own function.  I suppose 
.debug_ranges would still be needed with hot/cold optimizations.



> We would have DW_TAG_imported_unit with DW_AT_import
> attribute pointing to the start DW_TAG_partial_unit in the section
> (we would need to hardcode the +11 bytes offset, assuming nobody
> ever emits 64-bit DWARF) and not refer to any other DIEs from the partial
> unit.


I think it would be both better and more correct to have the 
DW_TAG_imported_unit going the other way, so the function CU imports the 
main CU.  That's what DWARF4 appendix E suggests.  My patch doesn't 
implement this yet.


Jason
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index abe3f1b..c113c63 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -8612,6 +8612,7 @@ ix86_code_end (void)
    NULL_TREE, void_type_node);
   TREE_PUBLIC (decl) = 1;
   TREE_STATIC (decl) = 1;
+  DECL_IGNORED_P (decl) = 1;
 
 #if TARGET_MACHO
   if (TARGET_MACHO)
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 7e2ce58..0c33af2 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -1007,6 +1007,7 @@ dwarf2out_begin_prologue (unsigned int line ATTRIBUTE_UNUSED,
   fde->dw_fde_current_label = dup_label;
   fde->in_std_section = (fnsec == text_section
 			 || (cold_text_section && fnsec == cold_text_section));
+  fde->comdat = DECL_ONE_ONLY (current_function_decl);
 
   /* We only want to output line number information for the genuine dwarf2
  prologue case, not the eh frame case.  */
@@ -3291,8 +3292,10 @@ static void compute_section_prefix (dw_die_ref);
 static int is_type_die (dw_die_ref);
 static int is_comdat_die (dw_die_ref);
 static int is_symbol_die (dw_die_ref);
+static int is_abstract_die (dw_die_ref);
 static void assign_symbol_names (dw_die_ref);
 static void break_out_includes (dw_die_ref);
+static void break_out_comdat_functions (dw_die_ref);
 static int is_declaration_die (dw_die_ref);
 static int should_move_die_to_comdat (dw_die_ref);
 static dw_die_ref clone_as_declaration (dw_die_ref);
@@ -4105,6 +4108,9 @@ dwarf_attr_name (unsigned int attr)
 case DW_AT_GNU_macros:
   return "DW_AT_GNU_macros";
 
+case DW_AT_GNU_comdat:
+  return "DW_AT_GNU_comdat";
+
 case DW_AT_GNAT_descriptive_type:
   return "DW_AT_GNAT_descriptive_type";
 
@@ -6698,6 +6704,9 @@ is_symbol_die (dw_die_ref c)
 {
   return (is_type_die (c)
 	  || is_declaration_die (c)
+	  || is_abstract_die (c)
+	  /* DW_TAG_GNU_call_site can refer to subprograms.  */
+	  || c->die_tag == DW_TAG_subprogram
 	  || c->die_tag == DW_TAG_namespace
 	  || c->die_tag == DW_TAG_module);
 }
@@ -6728,6 +6737,8 @@ assign_symbol_names (dw_die_ref die)
 
   if (is_symbol_die (die))
 {
+  if (die->die_id.die_symbol)
+	return;
   if (comdat_symbol_id)
 	{
 	  char *p = XALLOCAVEC (char, strlen (comdat_symbol_id) + 64);
@@ -6900,6 +6911,65 @@ break_out_includes (dw_die_ref die)
   htab_delete (cu_hash_table);
 }
 
+static c

Re: What to do with the exceptional case of expand_case for SJLJ exceptions

2012-04-18 Thread Richard Henderson
On 04/18/2012 05:39 AM, Jan Hubicka wrote:
> Well, if SJLJ lowering happens as a gimple pass somewhere near the end of the
> gimple queue, this should not be a problem at all (and the implementation
> would be cleaner).

If you can find a clean way of separating sjlj expansion from dw2 expansion,
please do.  But there's a lot of code shared between the two.  I see nothing
wrong with always expanding via tablejump.


r~


Re: Debug info for comdat functions

2012-04-18 Thread Jakub Jelinek
On Wed, Apr 18, 2012 at 08:43:37AM -0400, Jason Merrill wrote:
> On 04/18/2012 07:53 AM, Jakub Jelinek wrote:
> >Consider attached testcase with comdat foo function, seems the
> >current linker behavior (well, tested with 2.21.53.0.1 ld.bfd)
> >is that for DW_TAG_subprogram with DW_AT_low_pc/DW_AT_high_pc
> >having section relative relocs against comdat functions
> >if the comdat text section has the same size in both object
> >files, then DW_AT_low_pc (and DW_AT_high_pc) attributes
> >in both CUs will point to the same range.
> 
> This seems clearly wrong to me.  A reference to a symbol in a
> discarded section should not resolve to an offset into a different
> section.  I thought the linker always resolved such references to 0,
> and I think that is what we want.

If the .text (and all other allocated sections) in the comdat group
is bitwise identical, I think it isn't a problem to refer to that;
it really doesn't matter at that point which object file ended up owning it.
But if it is different, I really think it is a bug not to clear it.

> >When discussed on IRC recently Jason preferred to move the DW_TAG_subprogram
> >describing a comdat function to a comdat .debug_info DW_TAG_partial_unit
> >and just reference all DIEs that need to be referenced from it
> >using DW_FORM_ref_addr back to the non-comdat .debug_info.
> 
> I played around with implementing this in the compiler yesterday; my
> initial patch is attached.  It seems that with normal DWARF 4 this
> can work well, but I ran into issues with various GNU extensions:

Importing the non-comdat .debug_info from the comdat .debug_info is an
interesting approach.  My slight problem with that is that the debug info
no longer describes the input source file 1:1; even though the
DW_TAG_imported_unit brings in stuff like local variables (so that when
stepping through that comdat e.g. static variables in the same source file
will be visible), other comdat functions in that file won't be.  But perhaps
that is not a big issue; I guess it is up to the debug info consumer folks
to chime in about that.

> DW_TAG_GNU_call_site wants to refer to the called function's DIE, so
> the function die in the separate unit needs to have its own symbol.
> Perhaps _call_site could refer to the function symbol instead?  That
> seems more correct anyway, since with COMDAT functions you might end
> up calling a different version of the function that has a different
> DIE.

At this point it is too late to change the specification of the extension.
But you could just put in a DW_TAG_subprogram DW_AT_external declaration
in the main .debug_info and just refer to that from call_site as well
as from DW_AT_specification in the comdat .debug_info.  That
DW_AT_abstract_origin is meant there to be just one of the possible many
DIEs referring to the callee, the debug info consumer is supposed to find
out the actual DIE that contains the code from it using its usual
mechanisms.
Or, for call references to the comdat functions you can drop the
DW_AT_abstract_origin attribute and instead provide
DW_AT_call_site_target with a DW_OP_addr.
The latter has the disadvantage that the linker will clear it from time to
time (if its .text.* section size is different).

> The typed stack ops such as DW_OP_GNU_deref_type want to refer to a
> type in the same CU, so we would need to copy any referenced base
> types into the separate function CU.  Could we add variants of these
> ops that take an offset from .debug_info?

The DW_TAG_base_type is small enough that we can duplicate it; in the
dup we actually could drop DW_AT_name (i.e. keep it as is in the main
.debug_info and in the comdat just provide DW_AT_encoding/DW_AT_byte_size).
That is 3 bytes for the base type, small enough that the cost is offset by
the smaller uleb128 reference sizes.

Jakub


Re: Why does lower-subreg mark copied pseudos as "decomposable"?

2012-04-18 Thread Richard Sandiford
Andrew Stubbs  writes:
> On 18/04/12 11:55, Richard Sandiford wrote:
>> The problem is that not all register moves are always going to be
>> eliminated, even when no mode changes are involved.  It might make
>> sense to restrict that code you quoted:
>>
>>  case SIMPLE_PSEUDO_REG_MOVE:
>>if (MODES_TIEABLE_P (GET_MODE (x), word_mode))
>>  bitmap_set_bit (decomposable_context, regno);
>>break;
>>
>> to the second pass though.
>
> Yes, I thought of that, but I dismissed it because the second pass is 
> really very late. It would be just in time to take advantage of the 
> relaxed register allocation, but would miss out on all the various 
> optimizations that forward-propagation, combining, and such can offer.
>
> This is why I've tried to find a way to do something about it in the 
> first pass. I thought it makes sense to do something for none-no-op 
> moves (when is there such a thing, btw, without it being and extend, 
> truncate, or subreg?),

AFAIK there isn't, which is why I'm a bit unsure what you're suggesting.

Different modes like DI and DF can both be stored in NEON registers,
so if you have a situation where one is punned into the other,
I think that's an even stronger reason to want to keep them together.
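(A hypothetical source-level example of such punning; the 64-bit integer bits
are reinterpreted as a double, which ends up as a DImode -> DFmode copy/subreg
rather than real arithmetic:)

  #include <stdint.h>
  #include <string.h>

  /* Hypothetical example: the 64-bit integer bits become a double via a
     plain copy, i.e. a DImode value punned into DFmode.  */
  double
  bits_to_double (uint64_t bits)
  {
    double d;
    memcpy (&d, &bits, sizeof d);
    return d;
  }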

> but the no-op moves are trickier.
>
> Perhaps a combination of the two ideas? Decompose mode-changing moves in 
> the first pass, and all moves in the second?
>
> BTW, the lower-subreg pass has a forward propagation concept of its own. 
> If I read it right, even with the above changes, it will still decompose 
> the move if the register it copies from has been decomposed, and the 
> register it copies to is not marked 'non-decomposable'.

Right.

> Hmm, I'm going to try to come up with some testcases that demonstrate 
> the different cases and see if that helps me think about it. Do you 
> happen to have any to hand?

'Fraid not, sorry.

>> OK.  If/when that patch goes in, the ARM backend is going to have
>> to pick an rtx cost for DImode SETs.  It sounds like the cost will need
>> to be twice an SImode move regardless of whether or not NEON is enabled.
>
> That sounds reasonable. Of course, how much a register move costs is a 
> tricky subject for NEON anyway. :(

Yeah.

Richard


Re: Why does lower-subreg mark copied pseudos as "decomposable"?

2012-04-18 Thread Andrew Stubbs

On 18/04/12 16:53, Richard Sandiford wrote:

> Andrew Stubbs  writes:

>> On 18/04/12 11:55, Richard Sandiford wrote:

>>> The problem is that not all register moves are always going to be
>>> eliminated, even when no mode changes are involved.  It might make
>>> sense to restrict that code you quoted:
>>>
>>> case SIMPLE_PSEUDO_REG_MOVE:
>>>   if (MODES_TIEABLE_P (GET_MODE (x), word_mode))
>>>     bitmap_set_bit (decomposable_context, regno);
>>>   break;
>>>
>>> to the second pass though.


>> Yes, I thought of that, but I dismissed it because the second pass is
>> really very late. It would be just in time to take advantage of the
>> relaxed register allocation, but would miss out on all the various
>> optimizations that forward-propagation, combining, and such can offer.
>>
>> This is why I've tried to find a way to do something about it in the
>> first pass. I thought it makes sense to do something for non-no-op
>> moves (when is there such a thing, btw, without it being an extend,
>> truncate, or subreg?),


> AFAIK there isn't, which is why I'm a bit unsure what you're suggesting.


And why I don't understand what the current code is trying to achieve.


> Different modes like DI and DF can both be stored in NEON registers,
> so if you have a situation where one is punned into the other,
> I think that's an even stronger reason to want to keep them together.


Does the compiler use pseudo-reg copies for that? I thought it mostly 
just referred to the same register with a different mode and everything 
just DTRT.


OK, let's go back to the start: at first sight, the lower-subreg pass
decomposes every pseudo-register that is larger than a core register, is
only defined or used via subreg or a simple copy, or is a copy of a
decomposed register that has no non-decomposable features itself
(forward propagation). It does not deliberately decompose
pseudo-registers that are only copies from or to a hard-register, even
though there's nothing intrinsically non-decomposable about that
(besides that there's no benefit), but it can happen if forward
propagation occurs. It explicitly does not decompose any pseudo that is
used in a non-move DImode operation.


All this makes sense to me: if the backend is written such that DImode 
operations are expanded in terms of SImode subregs, then it's better to 
think of the subregs independently. (On ARM, this *is* the case when 
NEON is disabled.)
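(A hypothetical illustration: with no 64-bit integer registers, the DImode
addition below ends up operating on the SImode low/high halves, so splitting
the pseudo helps:)

  #include <stdint.h>

  /* Hypothetical example: without 64-bit integer registers, this DImode
     addition is carried out on the SImode low/high halves (add plus
     add-with-carry), so treating the halves as separate pseudos gives
     the optimizers and the register allocator more freedom.  */
  uint64_t
  add64 (uint64_t a, uint64_t b)
  {
    return a + b;
  }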


But then there's this extra "feature" that a pseudo-to-pseudo copy 
triggers both pseudo registers to be considered decomposable (unless 
there's some other use that prohibits it), and I don't know why?


Yes, I understand that a move from NEON to core might benefit from this, 
but those don't exist before reload. I also theorized that moves that 
convert to some other kind of mode might be interesting (the existing 
code checks for "tieable" modes, presumably with reason), but I can't
come up with a valid example (mode changes usually require a non-move 
operation of some kind).


In fact, the only examples of a pseudo-pseudo copy that won't be 
eliminated by fwprop et al would be to do with loops and conditionals, 
and I don't understand why they should be special.


The result of this extra feature is that if I copy the output of a 
DImode insn *directly* to a DImode hard reg (say a return value) then 
there's no decomposition, but if the expand pass happens to have put an 
intermediate pseudo register (as it does do) then this extra rule 
decomposes it most unhelpfully (ok, there's only actually a problem if 
the compiler can reason that one subreg or the other is unchanged, as is 
the case with sign_extend).
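(A hypothetical testcase in that spirit - not Andrew's actual one - where
expand puts an intermediate DImode pseudo between the sign_extend and the
return register:)

  /* Hypothetical example only, not Andrew's actual testcase: the result of
     the sign-extension reaches the DImode return register through an
     intermediate DImode pseudo, and it is that pseudo-to-pseudo copy which
     makes lower-subreg mark it as decomposable, even though the high part
     is known to be just a copy of the sign bit.  */
  long long
  extend_it (int x)
  {
    return x;
  }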


So, after having thought all this through again, unless somebody can 
show why not, I propose that we remove this mis-feature entirely, or at 
least disable it in the first pass.


Andrew


Re: Why does lower-subreg mark copied pseudos as "decomposable"?

2012-04-18 Thread Richard Sandiford
Andrew Stubbs  writes:
> On 18/04/12 16:53, Richard Sandiford wrote:
>> Andrew Stubbs  writes:
>>> On 18/04/12 11:55, Richard Sandiford wrote:
>>>> The problem is that not all register moves are always going to be
>>>> eliminated, even when no mode changes are involved.  It might make
>>>> sense to restrict that code you quoted:
>>>>
>>>> case SIMPLE_PSEUDO_REG_MOVE:
>>>>   if (MODES_TIEABLE_P (GET_MODE (x), word_mode))
>>>>     bitmap_set_bit (decomposable_context, regno);
>>>>   break;
>>>>
>>>> to the second pass though.
>>>
>>> Yes, I thought of that, but I dismissed it because the second pass is
>>> really very late. It would be just in time to take advantage of the
>>> relaxed register allocation, but would miss out on all the various
>>> optimizations that forward-propagation, combining, and such can offer.
>>>
>>> This is why I've tried to find a way to do something about it in the
>>> first pass. I thought it makes sense to do something for non-no-op
>>> moves (when is there such a thing, btw, without it being an extend,
>>> truncate, or subreg?),
>>
>> AFAIK there isn't, which is why I'm a bit unsure what you're suggesting.
>
> And why I don't understand what the current code is trying to achieve.

See below.

>> Different modes like DI and DF can both be stored in NEON registers,
>> so if you have a situation where one is punned into the other,
>> I think that's an even stronger reason to want to keep them together.
>
> Does the compiler use pseudo-reg copies for that? I thought it mostly 
> just referred to the same register with a different mode and everything 
> just DTRT.
>
> OK, let's go back to the start: at first sight, the lower-subreg pass
> decomposes every pseudo-register that is larger than a core register, is
> only defined or used via subreg or a simple copy, or is a copy of a 
> decomposed register that has no non-decomposable features itself 
> (forward propagation). It does not deliberately decompose 
> pseudo-registers that are only copies from or to a hard-register, even 
> though there's nothing intrinsically non-decomposable about that 
> (besides that there's no benefit), but it can happen if forward 
> propagation occurs. It explicitly does not decompose any pseudo that is 
> used in a non-move DImode operation.
>
> All this makes sense to me: if the backend is written such that DImode 
> operations are expanded in terms of SImode subregs, then it's better to 
> think of the subregs independently. (On ARM, this *is* the case when 
> NEON is disabled.)
>
> But then there's this extra "feature" that a pseudo-to-pseudo copy 
> triggers both pseudo registers to be considered decomposable (unless 
> there's some other use that prohibits it), and I don't know why?
>
> Yes, I understand that a move from NEON to core might benefit from this, 
> but those don't exist before reload. I also theorized that moves that 
> convert to some other kind of mode might be interesting (the existing 
> code checks for "tieable" modes, presumably with reason), but I can't
> come up with a valid example (mode changes usually require a non-move 
> operation of some kind).
>
> In fact, the only examples of a pseudo-pseudo copy that won't be 
> eliminated by fwprop et al would be to do with loops and conditionals, 
> and I don't understand why they should be special.

Not just those, because loads, stores, calls, volatiles, etc.,
can't be moved freely.  E.g. code like:

uint64_t foo (uint64_t *x, uint64_t z)
{
  uint64_t y = *x;
  *x = z;
  return y;
}

benefits too, because y must be a pseudo.

I don't think the idea is that these cases are special in themselves.
What we're looking for are pseudos that _may_ be decomposed into
separate registers.  If one of the pseudos in the move is only used in
decomposable contexts (including nonvolatile loads and stores, as well
as copies to and from hard registers, etc.), then we may be able to
completely replace the original pseudo with two smaller ones.  E.g.:

(set (reg:DI X) (mem:DI ...))
...
(set (reg:DI Y) (reg:DI X))

In this case, X can be completely replaced by two SImode registers.

What isn't clear to me is why we don't seem to do the same for:

(set (reg:DI X) (mem:DI ...))
(set (mem:DI ...) (reg:DI X))

Perhaps we do and I'm just misreading the code.  Or perhaps it's just
too hard to get the costs right.  Splitting that would be moving even
further from what you want though :-)

> The result of this extra feature is that if I copy the output of a 
> DImode insn *directly* to a DImode hard reg (say a return value) then 
> there's no decomposition, but if the expand pass happens to have put an 
> intermediate pseudo register (as it does do) then this extra rule 
> decomposes it most unhelpfully (ok, there's only actually a problem if 
> the compiler can reason that one subreg or the other is unchanged, as is 
> the case with sign_extend).

But remember that this pass is not

GIT Mirror Down?

2012-04-18 Thread Iyer, Balaji V
Hello Everyone,
Is the GIT mirror for GCC down? I tried clicking on the snapshot link 
near a commit and it is timing out.


Thanks,

Balaji V. Iyer.


Re: Debug info for comdat functions

2012-04-18 Thread Cary Coutant
> This seems clearly wrong to me.  A reference to a symbol in a discarded
> section should not resolve to an offset into a different section.  I thought
> the linker always resolved such references to 0, and I think that is what we
> want.

Even resolving to 0 can cause problems. In the Gnu linker, all
references to a discarded symbol get relocated to 0, ignoring any
addend. This can result in spurious (0,0) pairs in range lists. In
Gold, we treat the discarded symbol as 0, but still apply the addend,
and count on GDB to recognize that the function starting at 0 must
have been discarded. Neither solution is ideal. That's why debug info
for COMDAT functions ought to be in the same COMDAT group as the
function...

>> When discussed on IRC recently Jason preferred to move the
>> DW_TAG_subprogram
>> describing a comdat function to a comdat .debug_info DW_TAG_partial_unit
>> and just reference all DIEs that need to be referenced from it
>> using DW_FORM_ref_addr back to the non-comdat .debug_info.
>
> I played around with implementing this in the compiler yesterday; my initial
> patch is attached.  It seems that with normal DWARF 4 this can work well,
> but I ran into issues with various GNU extensions:

Nice -- I've been wanting to do that for a while, but I always thought
it would be a lot harder. I see that you've based this on the
infrastructure created for -feliminate-dwarf2-dups. I don't think that
will play nice with -fdebug-types-section, though, since I basically
made those two options incompatible with each other by unioning
die_symbol with die_type_node.

In the HP-UX compilers, we basically put a complete set of .debug_*
sections in each COMDAT group, and treated the group as a compilation
unit of its own (not a partial unit). That worked well, and avoided
some of the problems you're running into (although clearly is more
wasteful in terms of object file size). Readelf and friends will need
to be taught how to find the right auxiliary debug sections, though --
they currently have a built-in assumption that there's only one of
each.

-cary


Re: GIT Mirror Down?

2012-04-18 Thread Frank Ch. Eigler

"Iyer, Balaji V"  writes:

> Is the GIT mirror for GCC down? I tried clicking on the snapshot
> link near a commit and it is timing out.

It could be that generating the snapshot is taking more CPU time
than the web server is configured to permit.  Consider making
your own git clone, and generate snapshot tarballs from that.

- FChE


Re: Debug info for comdat functions

2012-04-18 Thread Jason Merrill

On 04/18/2012 07:40 PM, Cary Coutant wrote:

> Nice -- I've been wanting to do that for a while, but I always thought
> it would be a lot harder. I see that you've based this on the
> infrastructure created for -feliminate-dwarf2-dups. I don't think that
> will play nice with -fdebug-types-section, though, since I basically
> made those two options incompatible with each other by unioning
> die_symbol with die_type_node.


I think it should be OK because I wait until after the debug_types 
processing is done, at which point limbo_die_list is empty.  Or am I 
just not seeing the problem?



> In the HP-UX compilers, we basically put a complete set of .debug_*
> sections in each COMDAT group, and treated the group as a compilation
> unit of its own (not a partial unit).


So you copy anything that the function refers to into the CU for the 
function?



> wasteful in terms of object file size). Readelf and friends will need
> to be taught how to find the right auxiliary debug sections, though --
> they currently have a built-in assumption that there's only one of
> each.


Good to know.

Jason


Re: Debug info for comdat functions

2012-04-18 Thread Jakub Jelinek
On Wed, Apr 18, 2012 at 03:23:35PM +0200, Jakub Jelinek wrote:
> > DW_TAG_GNU_call_site wants to refer to the called function's DIE, so
> > the function die in the separate unit needs to have its own symbol.
> > Perhaps _call_site could refer to the function symbol instead?  That
> > seems more correct anyway, since with COMDAT functions you might end
> > up calling a different version of the function that has a different
> > DIE.
> 
> At this point it is too late to change the specification of the extension.
> But you could just put in a DW_TAG_subprogram DW_AT_external declaration
> in the main .debug_info and just refer to that from call_site as well
> as from DW_AT_specification in the comdat .debug_info.  That
> DW_AT_abstract_origin is meant there to be just one of the possible many
> DIEs referring to the callee, the debug info consumer is supposed to find
> out the actual DIE that contains the code from it using its usual
> mechanisms.

That could be easily done by keeping around the original die_node of the
DW_TAG_subprogram for the comdat in the main CU, creating a new die_node in
the comdat unit, moving all (or all but formal_parameter?) children to it
and copying/moving attributes.  Thus all references to the subprogram would go
to the main .debug_info.

Jakub