dwz-0.1 - DWARF compression tool
Hi!

I'd like to announce dwz-0.1, a DWARF compression tool I've spent this April hacking on. It is currently (see below) written as a standalone tool in C with minimal dependencies (though no time has been spent on portability yet, so it assumes a glibc host); in particular, just a small amount of its code depends on libelf (tested with elfutils only).

The tool parses DWARF2+ debug info, finds matching DIEs between different CUs that might be moved to new CUs with DW_TAG_partial_unit, and estimates whether moving them is worthwhile. If it is, it moves them there and adds DW_TAG_imported_unit DIEs (a tree of them) to make sure every CU includes all the needed DIEs. DW_TAG_imported_unit DIEs created by this tool will only be direct children of DW_TAG_{compile,partial}_unit DIEs; if something from a named namespace/module can be shared, the DW_TAG_namespace or DW_TAG_module DIE with the same DW_AT_name is added in the partial unit as well.

In addition to the duplicate sharing, the tool performs some other small optimizations: it chooses the best DW_FORM_ref{1,2,4,_udata} form for intra-CU references (the same in each CU, otherwise we might create way too many abbreviations) to minimize the size of the CU, performs various optimizations on .debug_abbrev to allow more CUs to share the same abbrev table while not increasing CU size (abbrev numbers are uleb128 encoded, so after going to 128 or more abbrevs the higher abbrev numbers will need 2 bytes, or even more for really many abbrevs), etc.

The tool is available from http://people.redhat.com/jakub/dwz/dwz-0.1.tar.bz2

For testing, I was using a set of -gdwarf-4 -fno-debug-types-section built binaries/shared libraries and a matching rebuild thereof with -gdwarf-4 -fdebug-types-section. (Note: while the tool supports even DWARF 2 (and 3) input, it is highly recommended to use it on DWARF 3+ at least, especially on 64-bit architectures, because DW_FORM_ref_addr is 8 bytes in DWARF 2 for 64-bit pointer size, rather than just 4 bytes.)
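To illustrate the size arithmetic behind the abbrev-number and reference-form remarks, here is a minimal sketch in Python. It is not taken from dwz; the function names are made up for illustration, and the form selection only compares fixed-width forms by the largest offset they must hold, which is an assumption about the general idea rather than the tool's actual heuristic.

```python
def uleb128_size(value: int) -> int:
    """Number of bytes needed to encode a non-negative integer as ULEB128."""
    if value < 0:
        raise ValueError("ULEB128 encodes non-negative integers only")
    size = 1
    # Each ULEB128 byte carries 7 payload bits.
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def smallest_fixed_ref_form(max_offset: int) -> str:
    """Hypothetical helper: narrowest fixed-width reference form for max_offset."""
    if max_offset <= 0xFF:
        return "DW_FORM_ref1"
    if max_offset <= 0xFFFF:
        return "DW_FORM_ref2"
    if max_offset <= 0xFFFFFFFF:
        return "DW_FORM_ref4"
    # Outside the set listed in the announcement; shown only for completeness.
    return "DW_FORM_ref8"
```

With this, abbrev number 127 costs one byte and abbrev number 128 costs two, which is why keeping abbrev tables below 128 entries matters for CU size.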
Below are some numbers. I had a collection of 4 binaries/libraries from GCC and 273 libraries/binaries (well, separate debug info for them) from libreoffice. The first number is the size for the original -gdwarf-4 -fno-debug-types-section objects, the third number the size for the original -gdwarf-4 -fdebug-types-section objects, and the fifth number the size for the -gdwarf-4 -fno-debug-types-section objects processed with the dwz tool (the 2nd and 4th are the relative sizes of the third/fifth number compared to the first, in percent); the last number is user time from the time command on an i7-2600 host.

For each collection there is a du -sk line with file sizes of all files in the collection (in kilobytes), then a 3sec sum line which contains the sum of .debug_{info,abbrev,types} section sizes in bytes in all the objects together, and then for each individual source it lists the sum of .debug_{info,abbrev,types} section sizes in bytes in the particular object. In each collection those lines are sorted from best to worst percentage achievement of .debug_types savings.

For all files dwz sizes are smaller than the corresponding sizes with .debug_types (which is for several reasons: .debug_types has higher reference overhead (8 bytes), moves only selected kinds of types, and only a single DIE in each DW_TAG_type_unit can be referenced). On 47% of the input files .debug_types actually results in size degradation rather than improvement. Of course, on the other side, .debug_types doesn't need the extra optimization.

For the speed you can look at the table: the two largest inputs took between 10 and 20 seconds (the largest, libsclo.so.debug with 16 million DIEs, took above 18 seconds), 11 other inputs took between 3 and 10 seconds, 14 other inputs took between 1 and 3 seconds, and the remaining 250 inputs took below one second.
As for memory requirements, the largest input (again libsclo.so.debug) needs 2.2GB of RAM on a 64-bit host: mainly 16million+ struct dw_die structures (72 bytes each, but the obstack used for them rounds that to 80), .5GB in a hash table for offset -> internal DIE pointer representation lookups, 68MB for the new content of .debug_info, and 2MB for the new content of .debug_abbrev. For comparison, cc1plus, which also has pretty large debug info, needs around 800MB. On 32-bit hosts I'd expect something in between that and half of that.

The tool is new, so it hasn't gone through any extensive testing yet. I plan to hack up some tool that will try to verify no debug info has been lost during the compression process. Tom Tromey is working on the GDB side of the support for DW_TAG_partial_unit/DW_TAG_imported_unit; other tools might need changes too if they don't support it (it is standard DWARF3+) or if they don't support it efficiently.

I'm not sure whether the tool later on (for testing a standalone tool is best) should be kept as a separate, post-linking tool, or whether we should try to integrate it into the linker (or keep it as both a separate tool and part of the linker (or linker plugin?)). The current libelf dependencies could be probably easily split into a separat
What do do with the exceptional case of expand_case for SJLJ exceptions
Hello,

If I move GIMPLE_SWITCH lowering from stmt.c to somewhere in the GIMPLE pass pipeline, I run into an issue with SJLJ exceptions. The problem is that except.c:sjlj_emit_dispatch_table() builds a GIMPLE_SWITCH and calls expand_case on it. If I move all non-casesi, non-tablejump code out of stmt.c and make it a GIMPLE lowering pass (currently I have the code in tree-switch-conversion.c) then two things happen:

1. SJLJ exception dispatch tables can only be expanded as casesi or tablejump. This may not be optimal.

2. If the target asks for SJLJ exceptions but it has no casesi and no tablejump insns or expanders, then the compilation will fail.

I don't think (1) is a big problem, because exceptions should be, well, exceptions after all, so optimizing them shouldn't be terribly important. For (2), I had hoped it would be a requirement to have either casesi or tablejump, but that doesn't seem to be the case. But I could put in some code to expand it as a series of test-and-branch insns instead, in case there is only a small number of num_dispatches.

What is the reason why lowering for SJLJ exceptions is not done in GIMPLE? Would it be a problem for anyone if SJLJ exception handling becomes less efficient when I move GIMPLE_SWITCH lowering earlier in the pass pipeline?

Ciao!
Steven
Re: About sink load from memory in tree-ssa-sink.c
On Wed, Apr 18, 2012 at 8:53 AM, Bin.Cheng wrote:
> Hi,
> As discussed at thread
> "http://gcc.gnu.org/ml/gcc/2012-04/msg00396.html", I am trying a patch
> now.
> The problem here is I have to go through all basic block from
> "sink_from" to "sink_to" to check whether
> the memory might be clobbered in them.
> Currently I have two methods:
> 1, do fully data analysis to compute the "can_sink" information at
> each basic block, which means whether
> we can sink a load to a basic block;
> 2, just compute the transitive closure of CFG, and check any basic
> block dominated by "sink_from" and can
> reach "sink_to" basic block;
>
> The 2nd method is an approximation, simpler than method 1 but misses
> some cases like:
>
> L1:
>   load x
> L2:
>   using x
> L3:
>   set x
>   goto L1
>
> In which, "load x" should be sunk to L2 if there is benefit.
>
> I measured the number of sunk loads during bootstrap gcc for x86,
> there are about 732 using method 1, while only 602 using method 2.
>
> So any comment on this topic? Thanks very much.

I don't understand method 2.

I'd do

  start at the single predecessor of the sink-to block

  foreach stmt from the end to the beginning of that block
    if the stmt has a VDEF or the same VUSE as the stmt we sink, break

  (continue searching for VDEFs in predecessors - that now gets more expensive, I suppose limiting sinking to the cases where the above finds sth would be easiest, even limiting sinking to never sink across any stores)

  walk the vuse -> vdef chain, using refs_anti_dependent_p to see whether the load is clobbered.

But I'd suggest limiting the sinking to never sink across stores - the alias memory model we have in GCC seriously limits these anyway. How would the numbers change if you do that?

Richard.

> --
> Best Regards.
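The conservative "never sink across stores" rule can be pictured on a toy control-flow graph. The sketch below is not GCC code: the CFG is a plain dictionary, stores are tracked per block rather than per statement, and the dominance condition of method 2 is left out, so it is only a reachability-based approximation in the spirit of that method. All names are hypothetical.

```python
def reachable(edges, start):
    """Return every block reachable from `start` by following one or more edges."""
    seen = set()
    stack = list(edges.get(start, ()))
    while stack:
        block = stack.pop()
        if block not in seen:
            seen.add(block)
            stack.extend(edges.get(block, ()))
    return seen


def reverse(edges):
    """Build the reversed graph (successor -> list of predecessors)."""
    rev = {}
    for src, dsts in edges.items():
        for dst in dsts:
            rev.setdefault(dst, []).append(src)
    return rev


def can_sink(edges, blocks_with_stores, sink_from, sink_to):
    """Allow sinking only if no block lying on a path from sink_from to
    sink_to contains a store (block granularity, purely conservative)."""
    between = reachable(edges, sink_from) & reachable(reverse(edges), sink_to)
    return not (between & set(blocks_with_stores))


# The loop from the quoted example: L1 (load x) -> L2 (use x) -> L3 (set x) -> L1.
loop = {"L1": ["L2"], "L2": ["L3"], "L3": ["L1"]}
loop_result = can_sink(loop, {"L3"}, "L1", "L2")
```

On the loop example the check refuses to sink, because L3 is reachable from L1 and can reach L2 through the back edge. That is the kind of case the quoted message says the approximation misses.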
Re: What do do with the exceptional case of expand_case for SJLJ exceptions
On Wed, Apr 18, 2012 at 10:35 AM, Steven Bosscher wrote:
> Hello,
>
> If I move GIMPLE_SWITCH lowering from stmt.c to somewhere in the
> GIMPLE pass pipeline, I run into an issue with SJLJ exceptions. The
> problem is that except.c:sjlj_emit_dispatch_table() builds a
> GIMPLE_SWITCH and calls expand_case on it. If I move all non-casesi,
> non-tablejump code out of stmt.c and make it a GIMPLE lowering pass
> (currently I have the code in tree-switch-conversion.c) then two
> things happen:
>
> 1. SJLJ exception dispatch tables can only be expanded as casesi or
> tablejump. This may not be optimal.

AFAIK SJLJ dispatch tables are dense, the switch is for the exceptional case (heh - the case where SJLJ exceptions are supposed to be fast ...), and most of the case functions have a single EH receiver(?) we already have an optimized case for.

> 2. If the target asks for SJLJ exceptions but it has no casesi and no
> tablejump insns or expanders, then the compilation will fail.
>
> I don't think (1) is a big problem, because exceptions should be,
> well, exceptions after all so optimizing them shouldn't be terribly
> important. For (2), I had hoped it would be a requirement to have
> either casesi or tablejump, but that doesn't seem to be the case. But
> I could put in some code to expand it as a series of test-and-branch
> insns instead, in case there is only a small number of num_dispatches.

Can't we always expand a "lowered" tablejump, aka computed goto?

> What is the reason why lowering for SJLJ exceptions is not done in GIMPLE?

Because it completely wrecks loops because we factor the SJLJ site, thus

fn ()
{
  ...
  for (;;)
    {
      try { X } catch { Y }
    }

becomes

fn ()
{
  if (setjmp ())
    {
      switch (...)
        ... goto L;
    }
  for (;;)
    {
      X;
L:
      Y;
    }

thus loops with try/catch get another entry preventing them from being analyzed (you see RTL loop optimizers doing nothing on such non-loops). Of course that's similar to how we handle computed goto.

> Would it be a problem for anyone if SJLJ exception handling will be
> less efficient, if I move GIMPLE_SWITCH lowering earlier in the pass
> pipeline?

I suppose that's the real question.

Richard.

> Ciao!
> Steven
Re: What do do with the exceptional case of expand_case for SJLJ exceptions
> > What is the reason why lowering for SJLJ exceptions is not done in GIMPLE?
>
> Because it completely wrecks loops because we factor the SJLJ site,
> thus
>
> fn ()
> {
>   ...
>   for (;;)
>     {
>       try { X } catch { Y }
>     }
>
> becomes
>
> fn ()
> {
>   if (setjmp ())
>     {
>       switch (...)
>         ... goto L;
>     }
>   for (;;)
>     {
>       X;
> L:
>       Y;
>     }

Well, if SJLJ lowering happens as a gimple pass somewhere near the end of the gimple queue, this should not be a problem at all (and the implementation would be cleaner).

Honza
Re: Why does lower-subreg mark copied pseudos as "decomposable"?
Andrew Stubbs writes:
> On 17/04/12 18:20, Richard Sandiford wrote:
>> Andrew Stubbs writes:
>>> Hi all,
>>>
>>> I can see why copying from one pseudo-register to another would not be a
>>> reason *not* to decompose a register, but I don't understand why this is
>>> a reason to say it *should* be decomposed.
>>
>> The idea is that, if a backend implements an N-word pseudo move using
>> N word-mode moves, it is better to expose those moves before register
>> allocation. It's easier for RA to find N separate word-mode registers
>> than a single contiguous N-word one.
>
> Ok, I think I understand that, but it seems slightly wrong to me.
>
> It makes sense to lower *real* moves, but before the fwprop pass there
> are quite a lot of pseudos that only exist as artefacts of the expand
> process. Moving the subreg1 pass after fwprop1 would probably do the
> trick, but that would probably also defeat the object of lowering early.
>
> I've done a couple of experiments:
>
> First, I tried adding an extra fwprop pass before subreg1. I needed to
> move up the dfinit pass also to make that work, but then it did work: it
> successfully compiled my testcase without a regression.
>
> I'm not sure that adding an extra pass isn't overkill, so second I tried

Yeah, sounds rather expensive :-)

> adjusting lower-subreg to avoid this problem; I modified
> find_pseudo_copy so that it rejected copies that didn't change the mode,
> on the principle that fwprop would probably have eliminated the move
> anyway. This was successful also, and a much less expensive change.
>
> Does that make sense? The pseudos involved in the move will still get
> lowered if the other conditions hold.

The problem is that not all register moves are always going to be eliminated, even when no mode changes are involved.

It might make sense to restrict that code you quoted:

    case SIMPLE_PSEUDO_REG_MOVE:
      if (MODES_TIEABLE_P (GET_MODE (x), word_mode))
        bitmap_set_bit (decomposable_context, regno);
      break;

to the second pass though.

>> The problem is the "if a backend implements ..." bit: the current code
>> doesn't check. This patch:
>>
>> http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00094.html
>>
>> should help. It's still waiting for me to find a case where the two
>> possible ways of handling hot-cold partitioning behave differently.
>
> I've not studied that patch in detail, but I'm not sure it'll help. In
> most cases, including my testcase, lowering is the correct thing to do
> if NEON (or IWMMXT, perhaps) is not enabled.

Right. I think I misunderstood, sorry. I thought this regression was for NEON only, but do you mean that adding these NEON patterns introduces the regression for non-NEON targets as well?

> When NEON is enabled, however, it may still be the right thing to do:
> NEON does not provide a full set of DImode operations. The test for
> subreg-only uses ought to be enough to differentiate, once the
> extraneous pseudos such as the one in my testcase have been dealt
> with.

OK. If/when that patch goes in, the ARM backend is going to have to pick an rtx cost for DImode SETs. It sounds like the cost will need to be twice an SImode move regardless of whether or not NEON is enabled.

Richard
Debug info for comdat functions
Hi!

Something not addressed yet in dwz, and unfortunately not 100% addressable without linker or compiler help, is debug info for comdat functions.

Consider the attached testcase with a comdat foo function. It seems the current linker behavior (well, tested with 2.21.53.0.1 ld.bfd) is that, for a DW_TAG_subprogram with DW_AT_low_pc/DW_AT_high_pc having section relative relocs against comdat functions, if the comdat text section has the same size in both object files, then the DW_AT_low_pc (and DW_AT_high_pc) attributes in both CUs will point to the same range. E.g. when compiling g++ -gdwarf-4 -o t9 t91.C t92.C, both .text._Z3fooi sections are identical one byte. I think if the section content is identical, then what the linker does is fine and perhaps dwz could just do something with it later on (currently it doesn't consider DIEs with DW_AT_low_pc/DW_AT_high_pc/DW_AT_ranges attributes for dup removal).

If both .text._Z3fooi sections have different sizes, then the linker will clear DW_AT_low_pc/DW_AT_high_pc, which is also fine (compile e.g. t91.C with -O2 and t92.C with -O0). I guess most debug info consumers will ignore the 0..0 range and dwz could be taught to do something about those DW_TAG_subprogram nodes too (what exactly? Drop DW_AT_{low_pc,high_pc,ranges} attribute from them, drop all DW_TAG_inlined_subroutine/DW_TAG_lexical_block children (perhaps all children?) of them, rewrite .debug_loc section if some portion of it was only referenced by to be removed DIEs?).

The problematic case (I'd say a linker bug) is when the .text._Z3fooi sections have the same size, but different content (compiled with different options, but by lack of luck happened to have the same size). Tested by hacking up t91.s and t92.s, both built with -O2, to have different, but same sized, instructions in .text._Z3fooi. IMHO in that case debug info consumers will see wrong debug info, and dwz can't guess which DIE describes the actual content and which DIE describes something that has been removed.
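The three cases above can be summarized as a small decision function. This is only a model of the behavior observed with that ld.bfd version, not linker code; the function name and the outcome labels are invented for illustration.

```python
def comdat_range_outcome(text_kept: bytes, text_discarded: bytes) -> str:
    """Classify what the DW_AT_low_pc/DW_AT_high_pc of the discarded copy
    end up describing, following the three cases in the message above."""
    if text_kept == text_discarded:
        # Identical content: both CUs point at the same range, which is fine.
        return "shared"
    if len(text_kept) != len(text_discarded):
        # Different sizes: the linker clears the attributes to a 0..0 range.
        return "cleared"
    # Same size, different content: the range is kept but describes other code.
    return "mismatch"
```

Only the last outcome is harmful: the debug info of the discarded copy is attached to code it does not describe, and a post-link tool cannot tell the two DIEs apart by size alone.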
For the libreoffice test files I have (and libstdc++.so) I've quickly hacked up a guess at how much could be saved by handling the comdats in dwz. The numbers are the size of a DW_TAG_subprogram DIE and all its children if the same values of both DW_AT_low_pc/DW_AT_high_pc attributes were already seen in another DIE. Possible .debug_loc savings aren't accounted for; on the other side, the cost of DW_TAG_imported_unit, DW_TAG_partial_unit and/or keeping around a small portion of the DW_TAG_subprogram DIE for 0..0 ranges isn't included either.

liblwpftlo.so.debug 1160625
libooxlo.so.debug 939155
libswlo.so.debug 819029
libooxmllo.so.debug 789318
libsclo.so.debug 740099
libchartmodello.so.debug 636127
libsdlo.so.debug 592827
libdbulo.so.debug 458561
libsvxcorelo.so.debug 455718
libchartcontrollerlo.so.debug 418735
libfrmlo.so.debug 410486
slideshow.uno.so.debug 392586
libdbalo.so.debug 374204
libfwklo.so.debug 359078
libxolo.so.debug 327187
libsfxlo.so.debug 294460
vbaobj.uno.so.debug 282619
libtklo.so.debug 239364
libacclo.so.debug 227900
libvcllo.so.debug 209380
libdrawinglayerlo.so.debug 202697
libbf_frmlo.so.debug 192942
libscfiltlo.so.debug 188769
libbf_xolo.so.debug 184482
libbf_svxlo.so.debug 178985
libdbtoolslo.so.debug 172211
vbaswobj.uno.so.debug 169971
libbf_swlo.so.debug 169310
libsvtlo.so.debug 165993
libmswordlo.so.debug 151021
libcharttoolslo.so.debug 148725
libdoctoklo.so.debug 148573
libcomphelpgcc3.so.debug 143429
cairocanvas.uno.so.debug 142835
libbf_sclo.so.debug 140327
libcuilo.so.debug 133518
libpcrlo.so.debug 132128
i18npool.uno.so.debug 125995
vclcanvas.uno.so.debug 117225
libdeployment.so.debug 109223
libchartviewlo.so.debug 101869
libsvxlo.so.debug 96798
librptuilo.so.debug 95625
postgresql-sdbc-impl.uno.so.debug 95311
libswuilo.so.debug 94986
msforms.uno.so.debug 88318
libutllo.so.debug 72778
libbf_svtlo.so.debug 67642
libunoxmllo.so.debug 65290
libfilterconfiglo.so.debug 64802
libsblo.so.debug 63344
librptlo.so.debug 60664
libvbahelperlo.so.debug 60292
libbf_schlo.so.debug 58526
configmgr.uno.so.debug 58439
libeditenglo.so.debug 57242
libfilelo.so.debug 56444
libfwllo.so.debug 53012
libpackage2.so.debug 50186
libxcrlo.so.debug 49733
libcppcanvaslo.so.debug 47957
libbasctllo.so.debug 40421
libbf_sdlo.so.debug 38870
libsvllo.so.debug 37970
libxsec_fw.so.debug 34438
libjdbclo.so.debug 33137
libdbaselo.so.debug 32789
libxmlsecurity.so.debug 32416
libhsqldb.so.debug 32403
libsmlo.so.debug 32339
libuuilo.so.debug 30973
liblnglo.so.debug 29532
libfwelo.so.debug 28311
libodbcbaselo.so.debug 27829
librptxmllo.so.debug 27530
libwpftlo.so.debug 26609
libscuilo.so.debug 26554
libucpchelp1.so.debug 25999
libdeploymentgui.so.debug 25982
libmysqllo.so.debug 25576
libfwilo.so.debug 25144
libembobj.so.debug 23555
libxstor.so.debug 23167
libsofficeapp.so.debug 23124
libmsfilterlo.so.debug 22551
libdbaxmllo.so.debug 22032
libucpfile1.so.debug 21467
libxsec_xmlsec.so.debug 19768
libevoablo.so.debug 19409
libspalo.so.debug 18660
libflatlo.so.debug 18345
libu
Re: Why does lower-subreg mark copied pseudos as "decomposable"?
On 18/04/12 11:55, Richard Sandiford wrote:
> The problem is that not all register moves are always going to be
> eliminated, even when no mode changes are involved.
>
> It might make sense to restrict that code you quoted:
>
>     case SIMPLE_PSEUDO_REG_MOVE:
>       if (MODES_TIEABLE_P (GET_MODE (x), word_mode))
>         bitmap_set_bit (decomposable_context, regno);
>       break;
>
> to the second pass though.

Yes, I thought of that, but I dismissed it because the second pass is really very late. It would be just in time to take advantage of the relaxed register allocation, but would miss out on all the various optimizations that forward-propagation, combining, and such can offer. This is why I've tried to find a way to do something about it in the first pass.

I thought it makes sense to do something for non-no-op moves (when is there such a thing, btw, without it being an extend, truncate, or subreg?), but the no-op moves are trickier.

Perhaps a combination of the two ideas? Decompose mode-changing moves in the first pass, and all moves in the second?

BTW, the lower-subreg pass has a forward propagation concept of its own. If I read it right, even with the above changes, it will still decompose the move if the register it copies from has been decomposed, and the register it copies to is not marked 'non-decomposable'.

Hmm, I'm going to try to come up with some testcases that demonstrate the different cases and see if that helps me think about it. Do you happen to have any to hand?

>> I've not studied that patch in detail, but I'm not sure it'll help. In
>> most cases, including my testcase, lowering is the correct thing to do
>> if NEON (or IWMMXT, perhaps) is not enabled.
>
> Right. I think I misunderstood, sorry. I thought this regression was
> for NEON only, but do you mean that adding these NEON patterns
> introduces the regression for non-NEON targets as well?

No, you were right, the regression only occurs when NEON is enabled. Otherwise the machine description behaves exactly as it used to.

>> When NEON is enabled, however, it may still be the right thing to do:
>> NEON does not provide a full set of DImode operations. The test for
>> subreg-only uses ought to be enough to differentiate, once the
>> extraneous pseudos such as the one in my testcase have been dealt with.
>
> OK. If/when that patch goes in, the ARM backend is going to have to
> pick an rtx cost for DImode SETs. It sounds like the cost will need to
> be twice an SImode move regardless of whether or not NEON is enabled.

That sounds reasonable. Of course, how much a register move costs is a tricky subject for NEON anyway. :(

Andrew
Re: Debug info for comdat functions
Hi!

Sorry for following up to self, but something I forgot to add about this:

On Wed, Apr 18, 2012 at 01:16:40PM +0200, Jakub Jelinek wrote:
> Something not addressed yet in dwz and unfortunately without
> linker or compiler help not 100% addressable is debug info for
> comdat functions.

When discussed on IRC recently Jason preferred to move the DW_TAG_subprogram describing a comdat function to a comdat .debug_info DW_TAG_partial_unit and just reference all DIEs that need to be referenced from it using DW_FORM_ref_addr back to the non-comdat .debug_info. Perhaps put its sole .debug_loc contributions into the comdat part as well, .debug_ranges maybe too.

I've thought about that approach a little bit, but I see issues with that, at least with the current linker behavior. In particular, even for identical .text.* section content different CUs might have slightly different partial units. The comdat .debug_info sections couldn't be hashed in any way, it would use the normal comdat mechanism. We would have DW_TAG_imported_unit with a DW_AT_import attribute pointing to the start DW_TAG_partial_unit in the section (we would need to hardcode the +11 bytes offset, assuming nobody ever emits 64-bit DWARF) and not refer to any other DIEs from the partial unit.

If the comdat .debug_info section sizes are the same, it will work fine (unless the IMHO ld bug mentioned in the previous mail is fixed, then it would work only if the section is bitwise identical). But if they are different, the linker will instead put 0 there for the relocation, which doesn't refer to any DW_TAG_*_unit and is thus invalid DWARF.

Jakub
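The hardcoded +11 offset follows from the layout of a .debug_info unit header in DWARF 2 through 4. A small worked computation, with field sizes as laid down in those DWARF versions (the function name is just for illustration):

```python
def cu_header_size(dwarf64: bool = False) -> int:
    """Size in bytes of a .debug_info unit header in DWARF 2 through 4."""
    # 32-bit DWARF: 4-byte length; 64-bit DWARF: 0xffffffff escape + 8-byte length.
    unit_length = 12 if dwarf64 else 4
    version = 2
    # The offset into .debug_abbrev is 4 bytes in 32-bit DWARF, 8 in 64-bit DWARF.
    debug_abbrev_offset = 8 if dwarf64 else 4
    address_size = 1
    return unit_length + version + debug_abbrev_offset + address_size
```

In 32-bit DWARF the first DIE of the unit (here the DW_TAG_partial_unit) therefore starts 11 bytes into the section contribution; with 64-bit DWARF it would be 23, which is why the scheme assumes nobody emits 64-bit DWARF.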
Re: dwz-0.1 - DWARF compression tool
On Wed, Apr 18, 2012 at 09:49:11AM +0200, Mike Dupont wrote:
> this is exciting, thanks for sharing.
>
> I wonder what amount of data is even the same between many libraries,

Of course there is a lot of DWARF duplication in between different libraries, or binaries, or e.g. Linux kernel modules (which have the added problem that they have relocations against the sections; we could apply and remove the relocations against .debug_* sections (and do string merging of .debug_str at the same time) there as a first step, but there would be still relocations against the module .text/.data etc.).

The problem with that is that we'd need DWARF extensions to do the duplication elimination in between different libraries/binaries.

I can think of two possible approaches:

1) indicate somehow that .debug_* sections live elsewhere, in a single (per package?) *.debug object, where all the .debug_* sections would be concatenated together, and then just compress the debug info in that large object. The main problem with that is that suddenly all places in the debug info that refer to .text/.data (and other allocated sections) addresses need to be augmented somehow to say which of the possibly many shared libraries or kernel modules or binaries they refer to. That would be too hard. It could be done just by some attribute in each DW_TAG_*_unit saying what that CU refers to (if it uses any addresses anywhere), and other .debug_* sections that are solely referenced from .debug_info would be fine too. But e.g. .debug_aranges would need extensions...

2) or, alternatively, keep most of the debug info in the individual objects (shared libraries, binaries, kernel modules) and just for what dwz currently moves over into new DW_TAG_partial_unit CUs (assuming it doesn't contain any .text/.data references and only refers to DIEs inside of them or in other partial units that don't contain any .text/.data references) move those partial units to a .debug_info section in a separate file (and add some new .debug_* section that would hint the debug info consumers how to find the separate file (build-id, or filename, or combination of both, whatever)). If we support just one such separate file, we could just have DW_FORM_alt_sec_offset and DW_FORM_ref_alt_addr new forms, which would mean this is the corresponding .debug{_line,_loc,_loc} section offset, but not inside of this file, but in the secondary file. If we were to support more than one, we'd need to number them and add forms that would start with a uleb128 number index of the separate file followed by the actual offset. Still, a shorthand form for the first separate file might be handy, assuming that is what is done most of the time.

With many possibly large binaries/libraries together there are major concerns about memory consumption though, so I think the tool would need to do it in steps: compress each file individually first (what the tool does right now) and for eligible partial units append them to a common separate file (and keep them in the original file too). When the first pass over all files is done, merge duplicates within the common separate file which holds just the partial units. The second pass would then take the reduced common separate file and the compressed debug info from the first pass, find duplicate partial units, switch references to them in their forms to the alt forms and remove the no longer needed partial units.

Of course the separate common file would need to contain not just .debug_info and .debug_abbrev sections, but also some minimal .debug_line section (not containing actual line instructions, but dir/file tables).

My preference would be 2). What do you think?

Jakub
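The two-pass scheme proposed for approach 2) can be sketched with a toy data model. Everything here is an assumption made for illustration: a file is a list of (content, refs_addresses) pairs standing in for partial units, duplicates are detected by hashing the content, and the "alt_ref" marker stands in for the proposed alt forms. None of this reflects actual dwz code.

```python
import hashlib


def unit_key(content: bytes) -> str:
    """Hypothetical identity of a partial unit: a hash of its contents."""
    return hashlib.sha1(content).hexdigest()


def build_common_file(files):
    """Pass 1: pool eligible partial units from all files, merging duplicates.
    A unit is eligible only if it has no .text/.data references."""
    common = {}
    for units in files.values():
        for content, refs_addresses in units:
            if not refs_addresses:
                common.setdefault(unit_key(content), content)
    return common


def rewrite_files(files, common):
    """Pass 2: replace each pooled unit by a reference into the common file."""
    rewritten = {}
    for name, units in files.items():
        out = []
        for content, refs_addresses in units:
            key = unit_key(content)
            if not refs_addresses and key in common:
                out.append(("alt_ref", key))
            else:
                out.append(("local", content))
        rewritten[name] = out
    return rewritten


files = {
    "liba.so.debug": [(b"struct foo", False), (b"struct bar", False),
                      (b"subprogram main", True)],
    "libb.so.debug": [(b"struct foo", False)],
}
common = build_common_file(files)
rewritten = rewrite_files(files, common)
```

Processing one file at a time in pass 1 is what keeps peak memory bounded by the largest single input plus the common pool, which is the concern raised in the message.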
Re: dwz-0.1 - DWARF compression tool
On Wed, Apr 18, 2012 at 02:26:45PM +0200, Jakub Jelinek wrote:
> Of course there is a lot of DWARF duplication in between different
> libraries, or binaries, or e.g. Linux kernel modules (which have the
> added problem that they have relocations against the sections; we could
> apply and remove the relocations against .debug_* sections (and do string
> merging of .debug_str at the same time) there as first step, but there would
> be still relocations against the module .text/.data etc.).

BTW. We do now remove the relocations against .debug_* sections in Fedora (using rpm >= 4.9 and elfutils >= 0.153) and that saves a lot of space:
http://lists.fedoraproject.org/pipermail/kernel/2012-February/003665.html

"This saves ~500MB on the installed size of the kernel-debuginfo package and makes the rpm file ~30MB smaller"

Cheers,

Mark
Re: Debug info for comdat functions
On 04/18/2012 07:53 AM, Jakub Jelinek wrote:
> Consider attached testcase with comdat foo function, seems the
> current linker behavior (well, tested with 2.21.53.0.1 ld.bfd)
> is that for DW_TAG_subprogram with DW_AT_low_pc/DW_AT_high_pc
> having section relative relocs against comdat functions
> if the comdat text section has the same size in both object
> files, then DW_AT_low_pc (and DW_AT_high_pc) attributes
> in both CUs will point to the same range.

This seems clearly wrong to me. A reference to a symbol in a discarded section should not resolve to an offset into a different section. I thought the linker always resolved such references to 0, and I think that is what we want.

> When discussed on IRC recently Jason preferred to move the DW_TAG_subprogram
> describing a comdat function to a comdat .debug_info DW_TAG_partial_unit
> and just reference all DIEs that need to be referenced from it
> using DW_FORM_ref_addr back to the non-comdat .debug_info.

I played around with implementing this in the compiler yesterday; my initial patch is attached. It seems that with normal DWARF 4 this can work well, but I ran into issues with various GNU extensions:

DW_TAG_GNU_call_site wants to refer to the called function's DIE, so the function DIE in the separate unit needs to have its own symbol. Perhaps _call_site could refer to the function symbol instead? That seems more correct anyway, since with COMDAT functions you might end up calling a different version of the function that has a different DIE.

The typed stack ops such as DW_OP_GNU_deref_type want to refer to a type in the same CU, so we would need to copy any referenced base types into the separate function CU. Could we add variants of these ops that take an offset from .debug_info?

> Perhaps put its sole .debug_loc contributions into comdat part as well,
> .debug_ranges maybe too.

I haven't done anything with .debug_loc yet.

.debug_ranges mostly goes away with this change; the main CU becomes just .text and the separate CUs are just their own function. I suppose .debug_ranges would still be needed with hot/cold optimizations.

> We would have DW_TAG_imported_unit with DW_AT_import attribute pointing to
> the start DW_TAG_partial_unit in the section (we would need to hardcode the
> +11 bytes offset, assuming nobody ever emits 64-bit DWARF) and not refer to
> any other DIEs from the partial unit.

I think it would be both better and more correct to have the DW_TAG_imported_unit going the other way, so the function CU imports the main CU. That's what DWARF4 appendix E suggests. My patch doesn't implement this yet.

Jason

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index abe3f1b..c113c63 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -8612,6 +8612,7 @@ ix86_code_end (void)
                          NULL_TREE, void_type_node);
       TREE_PUBLIC (decl) = 1;
       TREE_STATIC (decl) = 1;
+      DECL_IGNORED_P (decl) = 1;

 #if TARGET_MACHO
       if (TARGET_MACHO)
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 7e2ce58..0c33af2 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -1007,6 +1007,7 @@ dwarf2out_begin_prologue (unsigned int line ATTRIBUTE_UNUSED,
   fde->dw_fde_current_label = dup_label;
   fde->in_std_section = (fnsec == text_section
                          || (cold_text_section && fnsec == cold_text_section));
+  fde->comdat = DECL_ONE_ONLY (current_function_decl);

   /* We only want to output line number information for the genuine dwarf2
      prologue case, not the eh frame case. */
@@ -3291,8 +3292,10 @@ static void compute_section_prefix (dw_die_ref);
 static int is_type_die (dw_die_ref);
 static int is_comdat_die (dw_die_ref);
 static int is_symbol_die (dw_die_ref);
+static int is_abstract_die (dw_die_ref);
 static void assign_symbol_names (dw_die_ref);
 static void break_out_includes (dw_die_ref);
+static void break_out_comdat_functions (dw_die_ref);
 static int is_declaration_die (dw_die_ref);
 static int should_move_die_to_comdat (dw_die_ref);
 static dw_die_ref clone_as_declaration (dw_die_ref);
@@ -4105,6 +4108,9 @@ dwarf_attr_name (unsigned int attr)
     case DW_AT_GNU_macros:
       return "DW_AT_GNU_macros";

+    case DW_AT_GNU_comdat:
+      return "DW_AT_GNU_comdat";
+
     case DW_AT_GNAT_descriptive_type:
       return "DW_AT_GNAT_descriptive_type";

@@ -6698,6 +6704,9 @@ is_symbol_die (dw_die_ref c)
 {
   return (is_type_die (c)
           || is_declaration_die (c)
+          || is_abstract_die (c)
+          /* DW_TAG_GNU_call_site can refer to subprograms. */
+          || c->die_tag == DW_TAG_subprogram
           || c->die_tag == DW_TAG_namespace
           || c->die_tag == DW_TAG_module);
 }
@@ -6728,6 +6737,8 @@ assign_symbol_names (dw_die_ref die)

   if (is_symbol_die (die))
     {
+      if (die->die_id.die_symbol)
+        return;
       if (comdat_symbol_id)
         {
           char *p = XALLOCAVEC (char, strlen (comdat_symbol_id) + 64);
@@ -6900,6 +6911,65 @@ break_out_includes (dw_die_ref die)
   htab_delete (cu_hash_table);
 }

+static c
Re: What do do with the exceptional case of expand_case for SJLJ exceptions
On 04/18/2012 05:39 AM, Jan Hubicka wrote:
> Well, if SJLJ lowering happens as gimple pass somewhere near the end of gimple
> queue, this should not be problem at all. (and implementation would be
> cleaner)

If you can find a clean way of separating sjlj expansion from dw2
expansion, please do. But there's a lot of code shared between the two.

I see nothing wrong with always expanding via tablejump.

r~
Re: Debug info for comdat functions
On Wed, Apr 18, 2012 at 08:43:37AM -0400, Jason Merrill wrote:
> On 04/18/2012 07:53 AM, Jakub Jelinek wrote:
> >Consider attached testcase with comdat foo function, seems the
> >current linker behavior (well, tested with 2.21.53.0.1 ld.bfd)
> >is that for DW_TAG_subprogram with DW_AT_low_pc/DW_AT_high_pc
> >having section relative relocs against comdat functions
> >if the comdat text section has the same size in both object
> >files, then DW_AT_low_pc (and DW_AT_high_pc) attributes
> >in both CUs will point to the same range.
>
> This seems clearly wrong to me. A reference to a symbol in a
> discarded section should not resolve to an offset into a different
> section. I thought the linker always resolved such references to 0,
> and I think that is what we want.

If the .text (and all other allocated sections) in the comdat group is
bitwise identical, I think it isn't a problem to refer to that; it really
doesn't matter at that point which object file won owning it. But if it
is different, I really think it is a bug not to clear it.

> >When discussed on IRC recently Jason preferred to move the DW_TAG_subprogram
> >describing a comdat function to a comdat .debug_info DW_TAG_partial_unit
> >and just reference all DIEs that need to be referenced from it
> >using DW_FORM_ref_addr back to the non-comdat .debug_info.
>
> I played around with implementing this in the compiler yesterday; my
> initial patch is attached. It seems that with normal DWARF 4 this
> can work well, but I ran into issues with various GNU extensions:

Importing the non-comdat .debug_info from comdat .debug_info is an
interesting approach. My slight problem with that is that the debug info
no longer describes the input source file 1:1: even though the
DW_TAG_imported_unit brings in stuff like local variables (so that when
stepping through that comdat e.g. static variables in the same source
file will be visible), other comdat functions in that file won't be.
But perhaps that is not a big issue; I guess it is up to the debug info
consumer folks to chime in about that.

> DW_TAG_GNU_call_site wants to refer to the called function's DIE, so
> the function die in the separate unit needs to have its own symbol.
> Perhaps _call_site could refer to the function symbol instead? That
> seems more correct anyway, since with COMDAT functions you might end
> up calling a different version of the function that has a different
> DIE.

At this point it is too late to change the specification of the extension.
But you could just put in a DW_TAG_subprogram DW_AT_external declaration
in the main .debug_info and just refer to that from call_site as well
as from DW_AT_specification in the comdat .debug_info. That
DW_AT_abstract_origin is meant there to be just one of the possible many
DIEs referring to the callee; the debug info consumer is supposed to find
out the actual DIE that contains the code from it using its usual
mechanisms. Or, for call references to the comdat functions you can drop
the DW_AT_abstract_origin attribute and instead provide
DW_AT_call_site_target with DW_OP_addr . The latter has the disadvantage
that the linker will clear it from time to time (if its .text.* section
size is different).

> The typed stack ops such as DW_OP_GNU_deref_type want to refer to a
> type in the same CU, so we would need to copy any referenced base
> types into the separate function CU. Could we add variants of these
> ops that take an offset from .debug_info?

The DW_TAG_base_type is small enough that we can duplicate it; in the dup
we actually could drop DW_AT_name (i.e. keep it as is in the main
.debug_info and in the comdat just provide DW_AT_encoding/DW_AT_byte_size).
That is 3 bytes for the base type, small enough that it is offset by the
smaller uleb128 sizes.

	Jakub
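The size trade-off in the last paragraph above can be illustrated with a small calculation. This is a sketch under assumptions: it takes the type operand of the typed stack ops to be a CU-relative DIE offset encoded as uleb128, and compares that with a fixed 4-byte .debug_info offset such as a hypothetical new op variant would need. `uleb128_size` is an illustrative helper, not a GCC function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bytes needed to encode VALUE as unsigned LEB128:
   7 payload bits per byte.  */
size_t uleb128_size(uint64_t value)
{
    size_t n = 1;
    while (value >= 0x80) {
        value >>= 7;
        n++;
    }
    return n;
}

/* Usage: a duplicated base type placed near the start of a small comdat
   unit is referenced with a 1- or 2-byte operand, versus 4 bytes for a
   section-relative offset in 32-bit DWARF.  */
void example_uleb128_size(void)
{
    assert(uleb128_size(11) == 1);      /* first DIE after an 11-byte header */
    assert(uleb128_size(127) == 1);
    assert(uleb128_size(128) == 2);
    assert(uleb128_size(16383) == 2);
    assert(uleb128_size(16384) == 3);
}
```

With a 3-byte duplicate base type, a couple of references that each save 2 or 3 bytes already pay for the copy.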
Re: Why does lower-subreg mark copied pseudos as "decomposable"?
Andrew Stubbs writes:
> On 18/04/12 11:55, Richard Sandiford wrote:
>> The problem is that not all register moves are always going to be
>> eliminated, even when no mode changes are involved. It might make
>> sense to restrict that code you quoted:
>>
>>      case SIMPLE_PSEUDO_REG_MOVE:
>>        if (MODES_TIEABLE_P (GET_MODE (x), word_mode))
>>          bitmap_set_bit (decomposable_context, regno);
>>        break;
>>
>> to the second pass though.
>
> Yes, I thought of that, but I dismissed it because the second pass is
> really very late. It would be just in time to take advantage of the
> relaxed register allocation, but would miss out on all the various
> optimizations that forward-propagation, combining, and such can offer.
>
> This is why I've tried to find a way to do something about it in the
> first pass. I thought it makes sense to do something for none-no-op
> moves (when is there such a thing, btw, without it being and extend,
> truncate, or subreg?),

AFAIK there isn't, which is why I'm a bit unsure what you're suggesting.
Different modes like DI and DF can both be stored in NEON registers,
so if you have a situation where one is punned into the other,
I think that's an even stronger reason to want to keep them together.

> but the no-op moves are trickier.
>
> Perhaps a combination of the two ideas? Decompose mode-changing moves in
> the first pass, and all moves in the second?
>
> BTW, the lower-subreg pass has a forward propagation concept of its own.
> If I read it right, even with the above changes, it will still decompose
> the move if the register it copies from has been decomposed, and the
> register it copies to is not marked 'non-decomposable'.

Right.

> Hmm, I'm going to try to come up with some testcases that demonstrate
> the different cases and see if that helps me think about it. Do you
> happen to have any to hand?

'Fraid not, sorry.

>> OK. If/when that patches goes in, the ARM backend is going to have
>> to pick an rtx cost for DImode SETs. It sounds like the cost will need
>> to be twice an SImode move regardless of whether or not NEON is enabled.
>
> That sounds reasonable. Of course, how much a register move costs is a
> tricky subject for NEON anyway. :(

Yeah.

Richard
Re: Why does lower-subreg mark copied pseudos as "decomposable"?
On 18/04/12 16:53, Richard Sandiford wrote:
> Andrew Stubbs writes:
>> On 18/04/12 11:55, Richard Sandiford wrote:
>>> The problem is that not all register moves are always going to be
>>> eliminated, even when no mode changes are involved. It might make
>>> sense to restrict that code you quoted:
>>>
>>>      case SIMPLE_PSEUDO_REG_MOVE:
>>>        if (MODES_TIEABLE_P (GET_MODE (x), word_mode))
>>>          bitmap_set_bit (decomposable_context, regno);
>>>        break;
>>>
>>> to the second pass though.
>>
>> Yes, I thought of that, but I dismissed it because the second pass is
>> really very late. It would be just in time to take advantage of the
>> relaxed register allocation, but would miss out on all the various
>> optimizations that forward-propagation, combining, and such can offer.
>>
>> This is why I've tried to find a way to do something about it in the
>> first pass. I thought it makes sense to do something for none-no-op
>> moves (when is there such a thing, btw, without it being and extend,
>> truncate, or subreg?),
>
> AFAIK there isn't, which is why I'm a bit unsure what you're suggesting.

And why I don't understand what the current code is trying to achieve.

> Different modes like DI and DF can both be stored in NEON registers,
> so if you have a situation where one is punned into the other,
> I think that's an even stronger reason to want to keep them together.

Does the compiler use pseudo-reg copies for that? I thought it mostly
just referred to the same register with a different mode and everything
just DTRT.

OK, let's go back to the start: at first sight, the lower-subregs pass
decomposes every pseudo-register that is larger than a core register, is
only defined or used via subreg or a simple copy, or is a copy of a
decomposed register that has no non-decomposable features itself
(forward propagation). It does not deliberately decompose
pseudo-registers that are only copies from or to a hard-register, even
though there's nothing intrinsically non-decomposable about that
(besides that there's no benefit), but it can happen if forward
propagation occurs. It explicitly does not decompose any pseudo that is
used in a non-move DImode operation.

All this makes sense to me: if the backend is written such that DImode
operations are expanded in terms of SImode subregs, then it's better to
think of the subregs independently. (On ARM, this *is* the case when
NEON is disabled.)

But then there's this extra "feature" that a pseudo-to-pseudo copy
triggers both pseudo registers to be considered decomposable (unless
there's some other use that prohibits it), and I don't know why?

Yes, I understand that a move from NEON to core might benefit from this,
but those don't exist before reload. I also theorized that moves that
convert to some other kind of mode might be interesting (the existing
code checks for "tieable" modes, presumably with reason), but I can't
come up with a valid example (mode changes usually require a non-move
operation of some kind).

In fact, the only examples of a pseudo-pseudo copy that won't be
eliminated by fwprop et al would be to do with loops and conditionals,
and I don't understand why they should be special.

The result of this extra feature is that if I copy the output of a
DImode insn *directly* to a DImode hard reg (say a return value) then
there's no decomposition, but if the expand pass happens to have put an
intermediate pseudo register (as it does do) then this extra rule
decomposes it most unhelpfully (ok, there's only actually a problem if
the compiler can reason that one subreg or the other is unchanged, as is
the case with sign_extend).

So, after having thought all this through again, unless somebody can
show why not, I propose that we remove this mis-feature entirely, or at
least disable it in the first pass.

Andrew
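The rules summarized in the message above can be modelled in a few lines. This is a deliberately simplified sketch, not the code in lower-subreg.c: the types and function names are invented, pseudo numbers are assumed to be below 64 so one word can act as a bitmap, and forward propagation is left out. It only shows how a "decomposable" set minus a "non-decomposable" set produces the behaviour being questioned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One bit per pseudo register (hypothetical; regno must be < 64).  */
typedef uint64_t regset;

enum use_kind {
    USE_SUBREG,          /* accessed through a word-sized subreg */
    USE_PSEUDO_COPY,     /* simple pseudo-to-pseudo DImode move */
    USE_HARD_REG_COPY,   /* copy to or from a hard register */
    USE_OTHER_DIMODE_OP  /* any non-move DImode operation */
};

struct marks {
    regset decomposable;
    regset non_decomposable;
};

/* Record one definition or use of pseudo REGNO.  */
void note_use(struct marks *m, unsigned regno, enum use_kind kind)
{
    regset bit = (regset)1 << regno;
    switch (kind) {
    case USE_SUBREG:
    case USE_PSEUDO_COPY:       /* the rule questioned in this thread */
        m->decomposable |= bit;
        break;
    case USE_HARD_REG_COPY:     /* neither requests nor blocks splitting */
        break;
    case USE_OTHER_DIMODE_OP:
        m->non_decomposable |= bit;
        break;
    }
}

/* A pseudo is split only if something asked for it and nothing vetoed.  */
bool will_decompose(const struct marks *m, unsigned regno)
{
    regset bit = (regset)1 << regno;
    return (m->decomposable & ~m->non_decomposable & bit) != 0;
}

/* Usage: pseudo 0 is the result of a DImode insn, pseudo 1 is the
   intermediate copy that expand inserted before the return register.  */
void example_intermediate_copy(void)
{
    struct marks m = { 0, 0 };
    note_use(&m, 0, USE_OTHER_DIMODE_OP);
    note_use(&m, 0, USE_PSEUDO_COPY);
    note_use(&m, 1, USE_PSEUDO_COPY);
    note_use(&m, 1, USE_HARD_REG_COPY);
    assert(!will_decompose(&m, 0));
    assert(will_decompose(&m, 1));
}
```

In this model the intermediate pseudo is split purely because of the copy rule, which matches the complaint; dropping the `USE_PSEUDO_COPY` case from the first branch would correspond to the proposal at the end of the message.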
Re: Why does lower-subreg mark copied pseudos as "decomposable"?
Andrew Stubbs writes:
> On 18/04/12 16:53, Richard Sandiford wrote:
>> Andrew Stubbs writes:
>>> On 18/04/12 11:55, Richard Sandiford wrote:
>>>> The problem is that not all register moves are always going to be
>>>> eliminated, even when no mode changes are involved. It might make
>>>> sense to restrict that code you quoted:
>>>>
>>>>      case SIMPLE_PSEUDO_REG_MOVE:
>>>>        if (MODES_TIEABLE_P (GET_MODE (x), word_mode))
>>>>          bitmap_set_bit (decomposable_context, regno);
>>>>        break;
>>>>
>>>> to the second pass though.
>>>
>>> Yes, I thought of that, but I dismissed it because the second pass is
>>> really very late. It would be just in time to take advantage of the
>>> relaxed register allocation, but would miss out on all the various
>>> optimizations that forward-propagation, combining, and such can offer.
>>>
>>> This is why I've tried to find a way to do something about it in the
>>> first pass. I thought it makes sense to do something for none-no-op
>>> moves (when is there such a thing, btw, without it being and extend,
>>> truncate, or subreg?),
>>
>> AFAIK there isn't, which is why I'm a bit unsure what you're suggesting.
>
> And why I don't understand what the current code is trying to achieve.

See below.

>> Different modes like DI and DF can both be stored in NEON registers,
>> so if you have a situation where one is punned into the other,
>> I think that's an even stronger reason to want to keep them together.
>
> Does the compiler use pseudo-reg copies for that? I thought it mostly
> just referred to the same register with a different mode and everything
> just DTRT.
>
> OK, let's go back to the start: at first sight, the lower-subregs pass
> decomposes every psuedo-register that is larger than a core register, is
> only defined or used via subreg or a simple copy, or is a copy of a
> decomposed register that has no non-decomposable features itself
> (forward propagation). It does not deliberately decompose
> pseudo-registers that are only copies from or to a hard-register, even
> though there's nothing intrinsically non-decomposable about that
> (besides that there's no benefit), but it can happen if forward
> propagation occurs. It explicitly does not decompose any pseudo that is
> used in a non-move DImode operation.
>
> All this makes sense to me: if the backend is written such that DImode
> operations are expanded in terms of SImode subregs, then it's better to
> think of the subregs independently. (On ARM, this *is* the case when
> NEON is disabled.)
>
> But then there's this extra "feature" that a pseudo-to-pseudo copy
> triggers both pseudo registers to be considered decomposable (unless
> there's some other use that prohibits it), and I don't know why?
>
> Yes, I understand that a move from NEON to core might benefit from this,
> but those don't exist before reload. I also theorized that moves that
> convert to some other kind of mode might be interesting (the existing
> code checks for "tieable" modes, presumable with reason), but I can't
> come up with a valid example (mode changes usually require a non-move
> operation of some kind).
>
> In fact, the only examples of a pseudo-pseudo copy that won't be
> eliminated by fwprop et al would be to do with loops and conditionals,
> and I don't understand why they should be special.

Not just those, because loads, stores, calls, volatiles, etc., can't be
moved freely. E.g. code like:

    uint64_t foo (uint64_t *x, uint64_t z)
    {
      uint64_t y = *x;
      *x = z;
      return y;
    }

benefits too, because y must be a pseudo.

I don't think the idea is that these cases are special in themselves.
What we're looking for are pseudos that _may_ be decomposed into
separate registers. If one of the pseudos in the move is only used
in decomposable contexts (including nonvolatile loads and stores,
as well as copies to and from hard registers, etc.), then we may
be able to completely replace the original pseudo with two smaller ones.
E.g.:

    (set (reg:DI X) (mem:DI ...))
    ...
    (set (reg:DI Y) (reg:DI X))

In this case, X can be completely replaced by two SImode registers.

What isn't clear to me is why we don't seem to do the same for:

    (set (reg:DI X) (mem:DI ...))
    (set (mem:DI ...) (reg:DI X))

Perhaps we do and I'm just misreading the code. Or perhaps it's just
too hard to get the costs right. Splitting that would be moving even
further from what you want though :-)

> The result of this extra feature is that if I copy the output of a
> DImode insn *directly* to a DImode hard reg (say a return value) then
> there's no decomposition, but if the expand pass happens to have put an
> intermediate pseudo register (as it does do) then this extra rule
> decomposes it most unhelpfully (ok, there's only actually a problem if
> the compiler can reason that one subreg or the other is unchanged, as is
> the case with sign_extend).

But remember that this pass is not
GIT Mirror Down?
Hello Everyone, Is the GIT mirror for GCC down? I tried clicking on the snapshot link near a commit and it is timing out. Thanks, Balaji V. Iyer.
Re: Debug info for comdat functions
> This seems clearly wrong to me. A reference to a symbol in a discarded
> section should not resolve to an offset into a different section. I thought
> the linker always resolved such references to 0, and I think that is what we
> want.

Even resolving to 0 can cause problems. In the GNU linker, all
references to a discarded symbol get relocated to 0, ignoring any
addend. This can result in spurious (0,0) pairs in range lists. In
Gold, we treat the discarded symbol as 0, but still apply the addend,
and count on GDB to recognize that the function starting at 0 must
have been discarded. Neither solution is ideal. That's why debug info
for COMDAT functions ought to be in the same COMDAT group as the
function...

>> When discussed on IRC recently Jason preferred to move the
>> DW_TAG_subprogram
>> describing a comdat function to a comdat .debug_info DW_TAG_partial_unit
>> and just reference all DIEs that need to be referenced from it
>> using DW_FORM_ref_addr back to the non-comdat .debug_info.
>
> I played around with implementing this in the compiler yesterday; my initial
> patch is attached. It seems that with normal DWARF 4 this can work well,
> but I ran into issues with various GNU extensions:

Nice -- I've been wanting to do that for a while, but I always thought
it would be a lot harder. I see that you've based this on the
infrastructure created for -feliminate-dwarf2-dups. I don't think that
will play nice with -fdebug-types-section, though, since I basically
made those two options incompatible with each other by unioning
die_symbol with die_type_node.

In the HP-UX compilers, we basically put a complete set of .debug_*
sections in each COMDAT group, and treated the group as a compilation
unit of its own (not a partial unit). That worked well, and avoided
some of the problems you're running into (although it is clearly more
wasteful in terms of object file size). Readelf and friends will need
to be taught how to find the right auxiliary debug sections, though --
they currently have a built-in assumption that there's only one of each.

-cary
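The difference between the two linker behaviours described above can be made concrete. This is a toy model, assuming only what the message says (GNU ld writes 0 and drops the addend, gold treats the symbol as 0 and keeps the addend) plus the DWARF 2 to 4 convention that a (0,0) pair ends a range list; none of these names come from either linker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum linker { LINKER_GNU_LD, LINKER_GOLD };

struct range { uint64_t begin, end; };

/* Value stored for a relocation against a symbol whose COMDAT section
   was discarded, following the description in the message above.  */
uint64_t resolve_discarded(enum linker l, uint64_t addend)
{
    if (l == LINKER_GNU_LD)
        return 0;        /* addend ignored */
    return addend;       /* gold: symbol value 0, addend still applied */
}

/* A .debug_ranges entry for a discarded function, after relocation.  */
struct range resolve_range(enum linker l, uint64_t begin_addend,
                           uint64_t end_addend)
{
    struct range r;
    r.begin = resolve_discarded(l, begin_addend);
    r.end = resolve_discarded(l, end_addend);
    return r;
}

/* In a DWARF 2-4 range list, a (0,0) pair marks the end of the list.  */
bool is_end_of_list(struct range r)
{
    return r.begin == 0 && r.end == 0;
}

/* Usage: a discarded 0x40-byte function described as [sym+0, sym+0x40).  */
void example_discarded_range(void)
{
    struct range ld = resolve_range(LINKER_GNU_LD, 0, 0x40);
    struct range gold = resolve_range(LINKER_GOLD, 0, 0x40);
    assert(is_end_of_list(ld));              /* spurious terminator */
    assert(gold.begin == 0 && gold.end == 0x40);
    assert(!is_end_of_list(gold));           /* bogus range at address 0 */
}
```

In the first case a consumer stops reading the list early; in the second it has to notice that a function at address 0 cannot be real.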
Re: GIT Mirror Down?
"Iyer, Balaji V" writes: > Is the GIT mirror for GCC down? I tried clicking on the snapshot > link near a commit and it is timing out. It could be that generating the snapshot is taking more CPU time than the web server is configured to permit. Consider making your own git clone, and generate snapshot tarballs from that. - FChE
Re: Debug info for comdat functions
On 04/18/2012 07:40 PM, Cary Coutant wrote:
> Nice -- I've been wanting to do that for a while, but I always thought
> it would be a lot harder. I see that you've based this on the
> infrastructure created for -feliminate-dwarf2-dups. I don't think that
> will play nice with -fdebug-types-section, though, since I basically
> made those two options incompatible with each other by unioning
> die_symbol with die_type_node.

I think it should be OK because I wait until after the debug_types
processing is done, at which point limbo_die_list is empty. Or am I just
not seeing the problem?

> In the HP-UX compilers, we basically put a complete set of .debug_*
> sections in each COMDAT group, and treated the group as a compilation
> unit of its own (not a partial unit).

So you copy anything that the function refers to into the CU for the
function?

> wasteful in terms of object file size). Readelf and friends will need
> to be taught how to find the right auxiliary debug sections, though --
> they currently have a built-in assumption that there's only one of each.

Good to know.

Jason
Re: Debug info for comdat functions
On Wed, Apr 18, 2012 at 03:23:35PM +0200, Jakub Jelinek wrote:
> > DW_TAG_GNU_call_site wants to refer to the called function's DIE, so
> > the function die in the separate unit needs to have its own symbol.
> > Perhaps _call_site could refer to the function symbol instead? That
> > seems more correct anyway, since with COMDAT functions you might end
> > up calling a different version of the function that has a different
> > DIE.
>
> At this point it is too late to change the specification of the extension.
> But you could just put in a DW_TAG_subprogram DW_AT_external declaration
> in the main .debug_info and just refer to that from call_site as well
> as from DW_AT_specification in the comdat .debug_info. That
> DW_AT_abstract_origin is meant there to be just one of the possible many
> DIEs referring to the callee, the debug info consumer is supposed to find
> out the actual DIE that contains the code from it using its usual
> mechanisms.

That could be easily done by keeping around the original die_node of the
DW_TAG_subprogram for the comdat function in the main CU, creating a new
die_node in the comdat unit, moving all (or all but formal_parameter?)
children to it, and copying or moving the attributes. Thus all references
to the subprogram would go to the main .debug_info.

	Jakub
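The splitting step suggested above might look roughly like this. It is a sketch on a toy DIE structure, not on dwarf2out.c's die_node: all type, field, and function names are invented for illustration, every child is moved (the message leaves open whether formal parameters should stay behind), and attribute copying is reduced to setting a specification link.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum tag { TAG_SUBPROGRAM, TAG_FORMAL_PARAMETER, TAG_VARIABLE };

/* Toy DIE; a stand-in for the compiler's real data structure.  */
struct die {
    enum tag tag;
    struct die *parent;
    struct die *first_child;
    struct die *next_sibling;
    struct die *specification;   /* models DW_AT_specification */
    int is_declaration;          /* models DW_AT_declaration */
};

/* Append CHILD at the end of PARENT's child list.  */
void append_child(struct die *parent, struct die *child)
{
    struct die **p = &parent->first_child;
    while (*p)
        p = &(*p)->next_sibling;
    child->parent = parent;
    child->next_sibling = NULL;
    *p = child;
}

/* Keep ORIG in the main CU as a declaration and return a newly allocated
   definition DIE, meant for the comdat unit, that takes over all of
   ORIG's children and points back at ORIG.  Returns NULL, leaving ORIG
   untouched, if allocation fails.  The caller frees the result.  */
struct die *split_comdat_subprogram(struct die *orig)
{
    struct die *def = calloc(1, sizeof *def);
    if (!def)
        return NULL;
    def->tag = TAG_SUBPROGRAM;
    def->specification = orig;

    struct die *c = orig->first_child;
    orig->first_child = NULL;
    while (c) {
        struct die *next = c->next_sibling;  /* save before relinking */
        append_child(def, c);
        c = next;
    }
    orig->is_declaration = 1;
    return def;
}
```

Because existing references keep pointing at the original DIE, call sites and other users need no changes; only the new definition DIE lives in the comdat unit.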