Re: Local Build ID Directory Lookup (DEBUGINFOD_LOCAL_PATH)

2023-06-19 Thread Mark Wielaard
Hi Roland,

On Wed, 2023-06-14 at 12:43 -0700, Roland McGrath wrote:
> Personally I'm not concerned with any non-build-ID use cases any more.
> I don't know if the rest of the world is OK with presuming that build
> ID-based lookup is always the only thing you want nowadays.
> But it seems plausible, since we rolled out Build ID in 2008 and it's
> been pretty thoroughly adopted by now.

Agreed. And if an application uses libdebuginfod it kind of assumes
build-ids are always there anyway. Because there is no interface for
requesting something without a build-id :)

So that leaves the wrinkle of finding the .dwz files. And how to
connect the debuginfo to the debugsources. When pointing to a
"traditional" build-id based directory like /usr/lib/debug/ and
/usr/src/debug/
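
For reference, a "traditional" build-id based directory stores each debug file under DIR/.build-id/xx/yyyy.debug, where xx is the first byte of the build-id in hex and yyyy is the rest. A minimal sketch of that mapping follows. The helper name build_id_debug_path is hypothetical and is not part of elfutils or libdebuginfod, and it deliberately does not address the .dwz or debugsource questions raised above.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not an elfutils API: format the conventional
   DIR/.build-id/xx/yyyy.debug path for a raw build-id.
   DIR is expected without a trailing slash.
   Returns 0 on success, -1 if the ID is too short or OUT is too small.  */
static int
build_id_debug_path (const char *dir, const unsigned char *id, size_t id_len,
                     char *out, size_t out_size)
{
  if (id_len < 2)
    return -1;

  /* The first byte of the ID names the subdirectory.  */
  int n = snprintf (out, out_size, "%s/.build-id/%02x/", dir, id[0]);
  if (n < 0 || (size_t) n >= out_size)
    return -1;
  size_t pos = (size_t) n;

  /* The remaining bytes form the file name.  */
  for (size_t i = 1; i < id_len; i++)
    {
      n = snprintf (out + pos, out_size - pos, "%02x", id[i]);
      if (n < 0 || (size_t) n >= out_size - pos)
        return -1;
      pos += (size_t) n;
    }

  n = snprintf (out + pos, out_size - pos, ".debug");
  if (n < 0 || (size_t) n >= out_size - pos)
    return -1;
  return 0;
}
```

Called with "/usr/lib/debug" as DIR, this yields paths below /usr/lib/debug/.build-id/.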

Cheers,

Mark


Re: Performance issue with systemd-coredump and container process linking 2000 shared libraries.

2023-06-19 Thread Mark Wielaard
Hi Romain,

Just to let you know I am looking at this. But haven't made much
progress in understanding it yet. Thanks so much for the reproducer. I
have been able to see the (very slow) parsing of the core file with it.

$ time ./mimic-systemd-coredump
[...]
real    3m35.965s
user    0m0.722s
sys     0m0.345s

Note however that a lot of time is "missing".
And in fact running it again is fast!?!

$ time ./mimic-systemd-coredump
real    0m0.327s
user    0m0.272s
sys     0m0.050s

This is because of the kernel inode/dentry cache.
If I do $ echo 2 | sudo tee /proc/sys/vm/drop_caches
before running ./mimic-systemd-coredump it is always slow.

I'll try to figure out what we do to make it so hard for the kernel to
do these lookups.

But that doesn't invalidate the other observation you made, that the
dwfl_module_getelf call always returns NULL.

> My understanding of the intent of the systemd developers is that they
> hoped that libdwfl would return some "partial" Elf* reference when
> calling dwfl_module_getelf, from the elf headers found in the core for
> each and every shared library (the first page of the PT_LOAD mappings
> that the kernel always dumps even when the mapping is file backed).

Right, that is a reasonable hope. And I don't actually know why it
always fails in this case.

> However it seems that under the hood it doesn't (is it linked to
> core_file_read_eagerly, which seems to always return false in this
> case?), and instead it uses the .find_elf = dwfl_build_id_find_elf
> callback, which tries to find the file by build-id on the filesystem.
> For some reason unknown to me, calling dwfl_module_getelf is very slow
> (I wouldn't expect a lookup by build-id on the filesystem to be that
> slow, actually).

Apparently we do it in some really slow way if the inodes/dentries
aren't in kernel cache (and the files are not actually on disk).

Which does bring up the question why systemd-coredump isn't running in
the same mount namespace as the crashing program. Then it would simply find
the files that the crashing program is using. Or it might install a
.find_elf callback that (also) looks under /proc/pid/root/ ?
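
To make the /proc/pid/root idea concrete: such a callback would essentially rewrite the file name recorded for a module into a path that goes through the crashing process's root directory. A rough sketch of just that rewriting step is below. The helper name path_in_proc_root is hypothetical, it is not an elfutils or systemd function, and wiring it into a real .find_elf callback is left out. It also assumes the process is still around so that /proc/PID/root can be resolved.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper, not an elfutils API: turn an absolute PATH as
   seen by process PID into a path through that process's root
   directory, /proc/PID/root/...
   Returns 0 on success, -1 if PATH is not absolute or OUT is too small.  */
static int
path_in_proc_root (pid_t pid, const char *path, char *out, size_t out_size)
{
  if (path == NULL || path[0] != '/')
    return -1;

  int n = snprintf (out, out_size, "/proc/%ld/root%s", (long) pid, path);
  if (n < 0 || (size_t) n >= out_size)
    return -1;
  return 0;
}
```

A callback could try the rewritten path first and fall back to the usual build-id lookup when opening it fails.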

> So, is this behavior of dwfl_module_getelf expected? If yes, do you
> agree that we should advise the systemd-coredump developers to invert
> their logic, to first try to look for a partial elf header from the
> core's PT_LOAD section, and only then fall back to the more reliable
> dwfl_module_getelf if that didn't work? In practice, we have tried the
> following patch applied to systemd v253 and it seems to "fix" the
> above-mentioned case:

I don't think dwfl_module_getelf should always return NULL in this
case. Nor should it be this slow. But given that it does, and given that
it is slow, that is certainly reasonable advice.

> Some other side question: in the long run, wouldn't it make sense for
> elfutils to parse the json package metadata section by itself, just
> like it does for the build-id, rather than implementing this logic in
> systemd?

Maybe we could provide this functionality. You are right that we have
no problem getting the build-ids with $ eu-unstrip --core=./the-core -n
So providing some other "static data" might be possible with a simpler
interface.

Thanks for this extensive bug report and reproducer. I will play some
more with it to hopefully get you some real answers/fixes.

Cheers,

Mark


Re: Performance issue with systemd-coredump and container process linking 2000 shared libraries.

2023-06-19 Thread Romain GEISSLER via Elfutils-devel
> On 19 June 2023 at 17:08, Mark Wielaard wrote:
> 
> Hi Romain,
> 
> Just to let you know I am looking at this. But haven't made much
> progress in understanding it yet. Thanks so much for the reproducer. I
> have been able to see the (very slow) parsing of the core file with it.

Hi,

Thanks! And sorry that Laurent pinged you directly on Slack; I
wanted to reach you via this mailing list instead of through the Red
Hat customer network ;)

I don’t know if you read the Red Hat case too. There you can find
things a bit more clarified, and split into what I think are potentially
3 distinct "problems" with 3 distinct possible fixes. Since there is nothing
private, I can write about this here on this public mailing list as well.

So in the end I see 3 points (in addition to not understanding why
finding the elf header returns NULL while it should not and which I
guess you are currently looking at):
 - the idea that systemd developers should invert their logic: first
try to parse elf/program headers from the (maybe partial) core dump
PT_LOAD program headers
 - This special "if" condition that I have added in the original systemd
code:

+/* This PT_LOAD section doesn't contain the start address, so it can't be the module we are looking for. */
+if (start < program_header->p_vaddr || start >= program_header->p_vaddr + program_header->p_memsz)
+        continue;

to be added near this line: 
https://github.com/systemd/systemd/blob/72e7bfe02d7814fff15602726c7218b389324159/src/shared/elf-util.c#L540

I would like to ask you whether this indeed looks like the "right" fix,
given your knowledge of how core dumps and elf files are shaped.
 - The idea that maybe this commit 
https://sourceware.org/git/?p=elfutils.git;a=commitdiff;h=8db849976f07046d27b4217e9ebd08d5623acc4f
which assumed that a "normal" elf file has on the order of 10 program
headers, so that a linked list would be enough, might be wrong in the
special case of a core dump, which may have many more program headers.
And if it indeed makes sense to call elf_getdata_rawchunk for each and
every program header of a core, should this linked list be changed into
some set/hashmap indexed by start address/size?
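
To spell out the check from the second point in isolation: a PT_LOAD segment can only belong to the module if the module's start address falls inside [p_vaddr, p_vaddr + p_memsz). A small standalone sketch of that test follows. It uses Elf64_Phdr from <elf.h> only to stay self-contained (the systemd code works on libelf's program header type), and the function name load_segment_contains is made up for this illustration.

```c
#include <elf.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: return true if PHDR is a PT_LOAD segment whose
   memory range [p_vaddr, p_vaddr + p_memsz) contains ADDR.  */
static bool
load_segment_contains (const Elf64_Phdr *phdr, uint64_t addr)
{
  if (phdr->p_type != PT_LOAD)
    return false;

  /* Compare via subtraction so that p_vaddr + p_memsz cannot wrap
     around for segments near the top of the address space.  */
  return addr >= phdr->p_vaddr && addr - phdr->p_vaddr < phdr->p_memsz;
}
```

Apart from the explicit PT_LOAD type test and the overflow-safe comparison, this is the same condition as in the patch, inverted: the patch skips a segment exactly when this function would return false.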

> 
> $ time ./mimic-systemd-coredump
> [...]
> real    3m35.965s
> user    0m0.722s
> sys     0m0.345s
> 
> Note however that a lot of time is "missing".
> And in fact running it again is fast!?!
> 
> $ time ./mimic-systemd-coredump
> real    0m0.327s
> user    0m0.272s
> sys     0m0.050s
> 
> This is because of the kernel inode/dentry cache.
> If I do $ echo 2 | sudo tee /proc/sys/vm/drop_caches
> before running ./mimic-systemd-coredump it is always slow.

Interesting! I didn’t see that (actually I never let the program run to
the end!).

> Which does bring up the question why systemd-coredump isn't running in
> the same mount namespace as the crashing program. Then it would simply find
> the files that the crashing program is using.

On the point that systemd-coredump might not run in the same mount
namespace, don’t blindly believe me. I think I saw this while reviewing the
systemd code, but it was the first time I looked at it to investigate this
issue, so I may be wrong. But I am sure you have access to some systemd
colleagues at Red Hat to double-check the details ;)

Cheers,
Romain

Re: Re: Performance issue with systemd-coredump and container process linking 2000 shared libraries.

2023-06-19 Thread Luca Boccassi
> > Which does bring up the question why systemd-coredump isn't running
> > in the same mount namespace as the crashing program. Then it would
> > simply find the files that the crashing program is using.
> 
> On this point that systemd-coredump might not run in the same mount
> namespace, don’t blindly believe me. I think I saw this while
> reviewing the systemd code, but it was the first time I looked at it
> to investigate this issue, so may be wrong.

This is correct, in case of containers sd-coredump will run on the host
and collect from all the guests, so they are going to be in different
namespaces. And even when they are not, the original binary might be
long gone by the time it has a chance to run.

-- 
Kind regards,
Luca Boccassi

