Re: Local Build ID Directory Lookup (DEBUGINFOD_LOCAL_PATH)
Hi Roland,

On Wed, 2023-06-14 at 12:43 -0700, Roland McGrath wrote:
> Personally I'm not concerned with any non-build-ID use cases any more.
> I don't know if the rest of the world is OK with presuming that build
> ID-based lookup is always the only thing you want nowadays.
> But it seems plausible, since we rolled out Build ID in 2008 and it's
> been pretty thoroughly adopted by now.

Agreed. And if an application uses libdebuginfod it kind of assumes
build-ids are always there anyway, because there is no interface for
requesting something without a build-id :)

So that leaves the wrinkle of finding the .dwz files, and how to
connect the debuginfo to the debugsources, when pointing to a
"traditional" build-id based directory like /usr/lib/debug/ and
/usr/src/debug/.

Cheers,

Mark
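For reference, the libdebuginfod client API is keyed entirely on
build-ids. A minimal sketch of a lookup in C, with a made-up build-id
value and most error handling omitted:

#include <elfutils/debuginfod.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main (void)
{
  /* Example build-id bytes; in practice this comes from the ELF
     NT_GNU_BUILD_ID note of the binary in question.  */
  static const unsigned char build_id[] =
    { 0xde, 0xad, 0xbe, 0xef, 0x01, 0x23, 0x45, 0x67, 0x89, 0xab,
      0xcd, 0xef, 0xde, 0xad, 0xbe, 0xef, 0x01, 0x23, 0x45, 0x67 };

  debuginfod_client *client = debuginfod_begin ();
  if (client == NULL)
    return 1;

  char *path = NULL;
  /* Every query function takes a build-id; there is no
     "by name only" variant.  */
  int fd = debuginfod_find_debuginfo (client, build_id,
                                      sizeof build_id, &path);
  if (fd >= 0)
    {
      printf ("debuginfo cached at %s\n", path);
      free (path);
      close (fd);
    }

  debuginfod_end (client);
  return 0;
}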
Re: Performance issue with systemd-coredump and container process linking 2000 shared libraries.
Hi Romain,

Just to let you know I am looking at this. But haven't made much
progress in understanding it yet. Thanks so much for the reproducer. I
have been able to see the (very slow) parsing of the core file with it.

$ time ./mimic-systemd-coredump
[...]
real    3m35.965s
user    0m0.722s
sys     0m0.345s

Note however that a lot of time is "missing".
And in fact running it again is fast!?!

$ time ./mimic-systemd-coredump
real    0m0.327s
user    0m0.272s
sys     0m0.050s

This is because of the kernel inode/dentry cache. If I do

$ echo 2 | sudo tee /proc/sys/vm/drop_caches

before running ./mimic-systemd-coredump it is always slow.

I'll try to figure out what we do to make it so hard for the kernel to
do these lookups. But that doesn't invalidate the other observation you
made, that the dwfl_module_getelf call always returns NULL.

> My understanding of the will of the systemd developers is that they
> hoped that libdwfl would return some "partial" Elf* reference when
> calling dwfl_module_getelf, from the elf headers found in the core for
> each and every shared library (the first page of the PT_LOAD mappings
> that the kernel always dumps even when the mapping is file backed).

Right, that is a reasonable hope. And I don't actually know why it
always fails in this case.

> However it seems that under the hood it doesn't (is it linked to
> core_file_read_eagerly, which seems to always return false in this
> case?), and instead it uses the .find_elf = dwfl_build_id_find_elf
> callback which tries to find the file by build-id on the filesystem.
> For some reason unknown to me, calling dwfl_module_getelf is very slow
> (I wouldn't expect that looking on the filesystem by build-id is that
> slow actually).

Apparently we do it in some really slow way if the inodes/dentries
aren't in the kernel cache (and the files are not actually on disk).

Which does bring up the question why systemd-coredump isn't running in
the same mount space as the crashing program. Then it would simply find
the files that the crashing program is using. Or it might install a
.find_elf callback that (also) looks under /proc/pid/root/ ?

> So, is this behavior of dwfl_module_getelf expected? If yes, do you
> agree that we should advise the systemd-coredump developers to invert
> their logic, to first try to look for the partial elf header from the
> core's PT_LOAD sections, and only then fall back to the more reliable
> dwfl_module_getelf if that didn't work? In practice, we have tried the
> following patch applied to systemd v253 and it seems to "fix" the
> above mentioned case:

I don't think dwfl_module_getelf should always return NULL in this
case. Nor should it be this slow. But given that it does and given
that it is slow that is certainly reasonable advice.

> Some other side question: in the long run, wouldn't it make sense for
> elfutils to try to parse the json package metadata section by itself,
> just like it does for the build-id, rather than implementing this
> logic in systemd?

Maybe we could provide this functionality. You are right that we have
no problem getting the build-ids with

$ eu-unstrip --core=./the-core -n

So providing some other "static data" might be possible with a simpler
interface.

Thanks for this extensive bug report and reproducer. I'll play some
more with it to hopefully get you some real answers/fixes.

Cheers,

Mark
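As a rough illustration of that /proc/pid/root/ idea, such a .find_elf
callback could be sketched as below. The crashing_pid variable and the
function name are made up for the example; this is not existing
systemd or elfutils code, just a sketch against the documented libdwfl
callback signature:

#define _GNU_SOURCE
#include <elfutils/libdwfl.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Made up for the example: the pid of the crashing process, e.g. taken
   from the coredump handler's arguments or the core's prstatus note.  */
static pid_t crashing_pid;

static int
find_elf_via_proc_root (Dwfl_Module *mod, void **userdata,
                        const char *modname, Dwarf_Addr base,
                        char **file_name, Elf **elfp)
{
  /* First try the file as seen in the crashing process' own mount
     namespace.  */
  if (modname[0] == '/')
    {
      char *path = NULL;
      if (asprintf (&path, "/proc/%d/root%s",
                    (int) crashing_pid, modname) > 0)
        {
          int fd = open (path, O_RDONLY);
          if (fd >= 0)
            {
              *file_name = path;  /* libdwfl takes ownership.  */
              return fd;
            }
          free (path);
        }
    }

  /* Otherwise fall back to the normal build-id based lookup.  */
  return dwfl_build_id_find_elf (mod, userdata, modname, base,
                                 file_name, elfp);
}

It would then be installed as the .find_elf member of the
Dwfl_Callbacks structure passed to dwfl_begin.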
Re: Performance issue with systemd-coredump and container process linking 2000 shared libraries.
> On 19 June 2023 at 17:08, Mark Wielaard wrote:
>
> Hi Romain,
>
> Just to let you know I am looking at this. But haven't made much
> progress in understanding it yet. Thanks so much for the reproducer. I
> have been able to see the (very slow) parsing of the core file with it.

Hi,

Thanks! And sorry that Laurent had pinged you directly on Slack, I
wanted to reach you via this mailing list instead of through the Red
Hat customer network ;)

I don’t know if you read the Red Hat case too. There you can find
things clarified a bit more, and split into what I think are
potentially 3 distinct "problems" with 3 distinct possible fixes. Since
there is nothing private, I can write about this here as well on this
public mailing list.

So in the end I see 3 points (in addition to not understanding why
finding the elf header returns NULL while it should not, which I guess
you are currently looking at):

- the idea that systemd developers should invert their logic: first try
  to parse elf/program headers from the (maybe partial) core dump
  PT_LOAD program headers

- this special "if" condition that I have added in the original systemd
  code:

  +/* This PT_LOAD section doesn't contain the start address, so it can't be the module we are looking for. */
  +if (start < program_header->p_vaddr || start >= program_header->p_vaddr + program_header->p_memsz)
  +continue;

  to be added near this line:
  https://github.com/systemd/systemd/blob/72e7bfe02d7814fff15602726c7218b389324159/src/shared/elf-util.c#L540
  on which I would like to ask you if indeed it seems like a "right"
  fix, with your knowledge of how core dump and elf files are shaped.

- the idea that maybe this commit
  https://sourceware.org/git/?p=elfutils.git;a=commitdiff;h=8db849976f07046d27b4217e9ebd08d5623acc4f
  which assumed that the order of magnitude of program headers is
  normally 10 for a "normal" elf file, so that a linked list would be
  enough, might be wrong in the special case of core dumps, which may
  have many more program headers. And if indeed it makes sense to call
  elf_getdata_rawchunk for each and every program header of a core, in
  that case should this linked list be changed into some set/hashmap
  indexed by start address/size?

> $ time ./mimic-systemd-coredump
> [...]
> real    3m35.965s
> user    0m0.722s
> sys     0m0.345s
>
> Note however that a lot of time is "missing".
> And in fact running it again is fast!?!
>
> $ time ./mimic-systemd-coredump
> real    0m0.327s
> user    0m0.272s
> sys     0m0.050s
>
> This is because of the kernel inode/dentry cache.
> If I do $ echo 2 | sudo tee /proc/sys/vm/drop_caches
> before running ./mimic-systemd-coredump it is always slow.

Interesting! I didn’t see that (actually I never let the program run
till the end!).

> Which does bring up the question why systemd-coredump isn't running in
> the same mount space as the crashing program. Then it would simply find
> the files that the crashing program is using.

On this point that systemd-coredump might not run in the same mount
namespace, don’t blindly believe me. I think I saw this while reviewing
the systemd code, but it was the first time I looked at it to
investigate this issue, so I may be wrong. But I am sure you have
access to some systemd colleagues at Red Hat to double-check the
details ;)

Cheers,

Romain
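As a data point for that last question, the number of program headers
in such a core is easy to check with libelf. A small sketch, where
./the-core is just a placeholder path; if each of those headers ends up
in a separate elf_getdata_rawchunk call as described above, the
linked-list cache gets walked once per header:

#include <err.h>
#include <fcntl.h>
#include <gelf.h>
#include <stdio.h>
#include <unistd.h>

int
main (void)
{
  if (elf_version (EV_CURRENT) == EV_NONE)
    errx (1, "libelf too old");

  /* Placeholder path; a core from a process mapping ~2000 shared
     libraries will have several thousand PT_LOAD headers.  */
  int fd = open ("./the-core", O_RDONLY);
  if (fd < 0)
    err (1, "open");

  Elf *core = elf_begin (fd, ELF_C_READ, NULL);
  if (core == NULL)
    errx (1, "elf_begin: %s", elf_errmsg (-1));

  size_t phnum;
  if (elf_getphdrnum (core, &phnum) != 0)
    errx (1, "elf_getphdrnum: %s", elf_errmsg (-1));

  size_t loads = 0;
  for (size_t i = 0; i < phnum; i++)
    {
      GElf_Phdr phdr_mem;
      GElf_Phdr *phdr = gelf_getphdr (core, (int) i, &phdr_mem);
      if (phdr != NULL && phdr->p_type == PT_LOAD)
        loads++;
    }

  printf ("%zu program headers, %zu PT_LOAD\n", phnum, loads);

  elf_end (core);
  close (fd);
  return 0;
}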
Re: Re: Performance issue with systemd-coredump and container process linking 2000 shared libraries.
> > Which does bring up the question why systemd-coredump isn't running
> > in the same mount space as the crashing program. Then it would
> > simply find the files that the crashing program is using.
>
> On this point that systemd-coredump might not run in the same mount
> namespace, don’t blindly believe me. I think I saw this while
> reviewing the systemd code, but it was the first time I looked at it
> to investigate this issue, so I may be wrong.

This is correct: in the case of containers sd-coredump will run on the
host and collect from all the guests, so they are going to be in
different namespaces. And even when they are not, the original binary
might be long gone by the time it has a chance to run.

--
Kind regards,
Luca Boccassi