Hi Romain,
On Tue, 2023-06-20 at 22:05 +, Romain GEISSLER wrote:
>
> Our real use case happens on a Openshift 4.13 node, so the OS is Red Hat Core
> OS 9 (which I assume shares a lot of foundations with RHEL 9).
>
> On our side Francois also told me this afternoon that he didn’t really
> reproduce the same thing with my reproducer posted here and the real
> systemd-coredump issue he witnessed live, and also noticed that with
> DEBUGINFOD_URLS unset/set to the empty string my reproducer has no problem
> anymore. What he witnessed on the real case (using perf/gdb) was that
> apparently lots of time was spend in elf_getdata_rawchunk and often in this
> kind of stack:
>
> Samples: 65K of event 'cpu-clock:pppH', Event count (approx.): 1646850
>
>
> Overhead Command Shared Object Symbol
>
>
> 98.24% (sd-parse-elf) libelf-0.188.so [.] elf_getdata_rawchunk
> 0.48% (sd-parse-elf) libelf-0.188.so [.] 0x48a3
> 0.27% (sd-parse-elf) libelf-0.188.so [.] gelf_getphdr
> 0.11% (sd-parse-elf) libc.so.6 [.] _int_malloc
> 0.10% (sd-parse-elf) libelf-0.188.so [.] gelf_getnote
> 0.06% (sd-parse-elf) libc.so.6 [.] __libc_calloc
> 0.05% (sd-parse-elf) [kernel.kallsyms] [k]
> __softirqentry_text_start
> 0.05% (sd-parse-elf) libc.so.6 [.] _int_free
>
>
> (gdb) bt
> #0 0x7f0ba8a88194 in elf_getdata_rawchunk () from
> target:/lib64/libelf.so.1
> #1 0x7f0ba98e5013 in module_callback.lto_priv () from
> target:/usr/lib64/systemd/libsystemd-shared-252.so
> #2 0x7f0ba8ae7291 in dwfl_getmodules () from target:/lib64/libdw.so.1
> #3 0x7f0ba98e6dc0 in parse_elf_object () from
> target:/usr/lib64/systemd/libsystemd-shared-252.so
> #4 0x562c474f2d5e in submit_coredump ()
> #5 0x562c474f57d1 in process_socket.constprop ()
> #6 0x562c474efbf8 in main ()
>
> My reproducer actually doesn’t fully re-implement what systemd implements
> (the parsing of the package metadata is clearly omitted), so I thought I had
> reproduced the same problem while apparently I didn’t, sorry for that. We
> will also have to double check if really just using 2000 dummy libraries is
> enough or if this also needs to have a more complex binary like we have in
> our real case.
>
> Tomorrow on our side we will have to play a bit with a local build of
> systemd-coredump and try to run it manually to better understand what’s going
> wrong.
>
Seeing those performance results I understand why you were suspecting
the linked list data structure used in elf_getdata_rawchunk.
Would you be able to test a rebuild libelf with the attached patch,
which replaces that datastructure with a binary search tree?
It didn't really show much speedup locally (in the noise, maybe 0.01
sec faster on ~0.25 sec run). But if there are more than 2000 calls to
elf_getdata_rawchunk it should make things faster.
Thanks,
Mark
From 3aca5b5f1f1617db2220022d9061dcaf129e54c4 Mon Sep 17 00:00:00 2001
From: Mark Wielaard
Date: Wed, 21 Jun 2023 18:05:12 +0200
Subject: [PATCH] libelf: Replace list of elf_getdata_rawchunk results with a
tree
elf_getdata_rawchunks did a linear search to see if a chunk was
already fetched. Replace this list with a binary search tree to make
lookup faster when a lot of Elf_Data_Chunk were created.
* libelf/libelfP.h (Elf_Data_Chunk): Remove next field.
(struct Elf): Change the rawchunks type from Elf_Data_Chunk *
to void *.
* elf_getdata_rawchunk.c (chunk_compare): New static function.
(elf_getdata_rawchunk): Use tsearch instead of a manual linked
list.
* elf_end.c (free_chunk): New static function.
(elf_end): Call tdestroy instead of walking linked list.
Signed-off-by: Mark Wielaard
---
libelf/elf_end.c | 22 +---
libelf/elf_getdata_rawchunk.c | 47 +--
libelf/libelfP.h | 13 --
3 files changed, 52 insertions(+), 30 deletions(-)
diff --git a/libelf/elf_end.c b/libelf/elf_end.c
index 5c451f36..3e5d4c86 100644
--- a/libelf/elf_end.c
+++ b/libelf/elf_end.c
@@ -1,5 +1,6 @@
/* Free resources associated with Elf descriptor.
Copyright (C) 1998,1999,2000,2001,2002,2004,2005,2007,2015,2016 Red Hat, Inc.
+ Copyright (C) 2023 Mark J. Wielaard
This file is part of elfutils.
Written by Ulrich Drepper , 1998.
@@ -32,12 +33,22 @@
#endif
#include
+#include
#include
#include
#include "libelfP.h"
+static void
+free_chunk (void *n)
+{
+ Elf_Data_Chunk *rawchunk = (Elf_Data_Chunk *)n;
+ if (rawchunk->dummy_scn.flag