On Fri, Oct 25, 2019 at 1:52 AM Indu Bhagat <[email protected]> wrote:
>
>
>
> On 10/11/2019 04:41 AM, Jakub Jelinek wrote:
> > On Fri, Oct 11, 2019 at 01:23:12PM +0200, Richard Biener wrote:
> >>> (coreutils-0.22)
> >>>        .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
> >>> ls               30616 |              1136 |          21098 |               26240 | 0.62
> >>> pwd              10734 |               788 |          10433 |               13929 | 0.83
> >>> groups           10706 |               811 |          10249 |               13378 | 0.80
> >>>
> >>> (emacs-26.3)
> >>>              .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
> >>> emacs-26.3.1          674657 |              6402 |         273963 |              273910 | 0.33
> >>>
> >>> I chose to account for 50% of .debug_str because, at this point, it would
> >>> be unfair not to account for it at all. One could even argue that up to 70%
> >>> of .debug_str consists of names of entities. CTF section sizes do include
> >>> the CTF string tables.
> >>>
> >>> Across coreutils, I see a geomean of 0.73 (ratio of
> >>> .ctf/(.debug_info + .debug_abbrev + 50% of .debug_str)). So, with the
> >>> "-gdwarf-like-ctf code stubs" and dwz, DWARF continues to have a larger
> >>> footprint than CTF (with 50% of .debug_str accounted for).
> >> I'm not convinced this "improvement" in size is worth maintaining another
> >> debug-info format, even less so since it lacks desirable features right now
> >> and thus evaluation is tricky.
> >>
> >> At least you can improve dwarf size considerably with a low amount of work.
> >>
> >> I suspect another factor where dwarf is bigger compared to CTF is that
> >> dwarf is recording typedef names as well as qualified type variants. But
> >> maybe CTF just has a more compact representation for the bits it actually
> >> implements.
> > Does CTF record automatic variables in functions, or just global variables?
> > If only the latter, it would be fair to also disable addition of local
> > variable DIEs, lexical blocks. Does CTF record inline functions? Again, if
> > not, it would be fair to not emit that either in .debug_info.
> > Also use -gno-record-gcc-switches so that the compiler command line is not
> > encoded in the debug info (unless it is in CTF).
>
> CTF includes file-scope and global-scope entities. So, CTF for a function
> defined/declared at these scopes is available in .ctf section, even if it is
> inlined.
>
> To not generate DWARF for function-local entities, I made a tweak in the
> gen_decl_die API to have an early exit when TREE_CODE (DECL_CONTEXT (decl))
> is FUNCTION_DECL.
>
> @@ -26374,6 +26374,12 @@ gen_decl_die (tree decl, tree origin, struct vlr_context *ctx,
>    if (DECL_P (decl_or_origin) && DECL_IGNORED_P (decl_or_origin))
>      return NULL;
>
> +  /* Do not generate info for function local decl when -gdwarf-like-ctf is
> +     enabled.  */
> +  if (debug_dwarf_like_ctf && DECL_CONTEXT (decl)
> +      && (TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL))
> +    return NULL;
> +
>    switch (TREE_CODE (decl_or_origin))
>      {
>      case ERROR_MARK:

A better place is probably in gen_subprogram_die, returning early before

  /* Output Dwarf info for all of the stuff within the body of the function
     (if it has one - it may be just a declaration).

Note that we also emit DIEs for function declarations without actual
definitions (optionally also for unused ones, if requested). I would guess
CTF doesn't, since there's no symbol table entry for those. Plus, by default
we prune types that are not used. So

  struct S { int i; };
  extern void foo (struct S *);
  void bar()
  {
    struct S s;
    foo (&s);
  }

would have DIEs for S and foo in addition to that for bar. To me it seems
those are not relevant for function entry point inspection (eventually both
S and foo have CTF info in the defining unit). Correct?

Richard.

>
> For the numbers in the email today:
> 1. CFLAGS="-g -gdwarf-like-ctf -gno-record-gcc-switches -O2". dwz is used on
> generated binaries.
> 2. This time, I wanted to account for .debug_str entities appropriately (not
>     50% as done previously). Using a small script, I counted the characters
>     in the "path-like" strings, specifically those strings that start with a
>     ".", and gathered the data in the column named D5.
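
The script itself is not shown in the thread. As a rough illustration only, the counting step described in item 2 could look like the C sketch below. The function name count_path_like_bytes is hypothetical, and so is the choice to count each string's terminating NUL; the input is assumed to be the raw contents of .debug_str (a sequence of NUL-terminated strings), obtained for example with objcopy's --dump-section option.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from the original script.
   BUF holds LEN bytes of raw .debug_str contents, i.e. a sequence of
   NUL-terminated strings.  Return the number of bytes occupied by the
   strings that start with '.', which is the "path-like" heuristic
   described in the mail.  Assumption: the terminating NUL of each
   counted string is included in the total.  */
size_t
count_path_like_bytes (const char *buf, size_t len)
{
  size_t total = 0;
  size_t pos = 0;

  while (pos < len)
    {
      /* Find the end of the string starting at POS, staying in bounds.  */
      const char *end = memchr (buf + pos, '\0', len - pos);
      size_t n = end ? (size_t) (end - (buf + pos)) : len - pos;

      if (n > 0 && buf[pos] == '.')
        total += n + (end ? 1 : 0);

      /* Skip the string and its terminator (if any).  */
      pos += n + 1;
    }

  return total;
}
```

A string such as "./src/ls.c" or "../lib/quote.h" is counted; a name such as "main" is not. Absolute paths would be missed by this heuristic, which matches the "start with a '.'" rule stated above.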
>
> (coreutils-0.22)
>        .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | path strings (D5) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+D4-D5))
> ls               14100 |               994 |          16945 |              1328 |               26240 | 0.85
> pwd               6341 |               632 |           9311 |               596 |               13929 | 0.88
> groups            6410 |               714 |           9218 |               667 |               13378 | 0.85
> Geomean across coreutils = 0.84
>
> (emacs-26.3)
>              .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | path strings (D5) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+D4-D5))
> emacs-26.3.1          373678 |              3794 |         219048 |              3842 |              273910 | 0.46
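
The ratio columns in both sets of tables are plain arithmetic over section sizes. A minimal C sketch of that computation follows; the function name ctf_to_dwarf_ratio is made up for illustration and is not part of GCC or binutils. The only difference between the two rounds of numbers is how many bytes of .debug_str are counted: 0.5*D4 earlier, D4-D5 here.

```c
#include <assert.h>

/* Hypothetical helper for illustration.
   CTF is the size of the uncompressed .ctf section, D1 the size of
   .debug_info, D2 the size of .debug_abbrev, and STR_COUNTED the number
   of .debug_str bytes attributed to DWARF (0.5*D4 in the first set of
   tables, D4-D5 in the second).  All sizes are in bytes.  */
double
ctf_to_dwarf_ratio (double ctf, double d1, double d2, double str_counted)
{
  double dwarf = d1 + d2 + str_counted;

  /* A zero or negative denominator would mean bad input.  */
  assert (dwarf > 0.0);
  return ctf / dwarf;
}
```

For the ls row above, ctf_to_dwarf_ratio (26240, 14100, 994, 16945 - 1328) divides 26240 by 30711, which is roughly 0.854 and is shown as 0.85 in the table.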
>
> > DWARF is a highly extensible format; what exactly is and is not emitted is
> > something that consumers can choose.
> > Yes, DWARF can be large, but mainly because it provides a lot of
> > information. The actual representation has been designed with size concerns
> > in mind, and newer versions of the standard keep improving that too.
> >
> > Jakub
>
> Yes.
>
> I started out to provide some numbers around the size impact of CTF vs DWARF
> as it was a legitimate curiosity many of us have had. Comparing compactness or
> feature matrices is only one dimension of evaluating the utility of supporting
> CTF in the toolchain (including GCC; Binutils and GDB have already accepted
> initial CTF support). The other dimension is a user-friendly workflow which
> supports current users and eases further adoption and growth.
>
> Indu
>