Re: Performance issue with systemd-coredump and container process linking 2000 shared libraries.

2023-06-21 Thread Mark Wielaard
Hi Romain,

On Tue, 2023-06-20 at 22:05 +, Romain GEISSLER wrote:
> 
> Our real use case happens on a Openshift 4.13 node, so the OS is Red Hat Core 
> OS 9 (which I assume shares a lot of foundations with RHEL 9).
> 
> On our side Francois also told me this afternoon that he didn’t really 
> reproduce the same thing with my reproducer posted here and the real 
> systemd-coredump issue he witnessed live, and also noticed that with 
> DEBUGINFOD_URLS unset/set to the empty string my reproducer has no problem 
> anymore. What he witnessed on the real case (using perf/gdb) was that 
> apparently lots of time was spend in elf_getdata_rawchunk and often in this 
> kind of stack:
> 
> Samples: 65K of event 'cpu-clock:pppH', Event count (approx.): 1646850
>   
>
> Overhead  Command Shared Object Symbol
>   
>
>   98.24%  (sd-parse-elf)  libelf-0.188.so   [.] elf_getdata_rawchunk
>    0.48%  (sd-parse-elf)  libelf-0.188.so   [.] 0x48a3
>    0.27%  (sd-parse-elf)  libelf-0.188.so   [.] gelf_getphdr
>    0.11%  (sd-parse-elf)  libc.so.6 [.] _int_malloc
>    0.10%  (sd-parse-elf)  libelf-0.188.so   [.] gelf_getnote
>    0.06%  (sd-parse-elf)  libc.so.6 [.] __libc_calloc
>    0.05%  (sd-parse-elf)  [kernel.kallsyms] [k] 
> __softirqentry_text_start
>    0.05%  (sd-parse-elf)  libc.so.6 [.] _int_free
> 
> 
> (gdb) bt
> #0  0x7f0ba8a88194 in elf_getdata_rawchunk () from 
> target:/lib64/libelf.so.1
> #1  0x7f0ba98e5013 in module_callback.lto_priv () from 
> target:/usr/lib64/systemd/libsystemd-shared-252.so
> #2  0x7f0ba8ae7291 in dwfl_getmodules () from target:/lib64/libdw.so.1
> #3  0x7f0ba98e6dc0 in parse_elf_object () from 
> target:/usr/lib64/systemd/libsystemd-shared-252.so
> #4  0x562c474f2d5e in submit_coredump ()
> #5  0x562c474f57d1 in process_socket.constprop ()
> #6  0x562c474efbf8 in main ()
> 
> My reproducer actually doesn’t fully re-implement what systemd implements 
> (the parsing of the package metadata is clearly omitted), so I thought I had 
> reproduced the same problem while apparently I didn’t, sorry for that. We 
> will also have to double check if really just using 2000 dummy libraries is 
> enough or if this also needs to have a more complex binary like we have in 
> our real case.
> 
> Tomorrow on our side we will have to play a bit with a local build of 
> systemd-coredump and try to run it manually to better understand what’s going 
> wrong.
> 

Seeing those performance results I understand why you were suspecting
the linked list data structure used in elf_getdata_rawchunk.

Would you be able to test a rebuild libelf with the attached patch,
which replaces that datastructure with a binary search tree?

It didn't really show much speedup locally (in the noise, maybe 0.01
sec faster on ~0.25 sec run). But if there are more than 2000 calls to
elf_getdata_rawchunk it should make things faster.

Thanks,

Mark
From 3aca5b5f1f1617db2220022d9061dcaf129e54c4 Mon Sep 17 00:00:00 2001
From: Mark Wielaard 
Date: Wed, 21 Jun 2023 18:05:12 +0200
Subject: [PATCH] libelf: Replace list of elf_getdata_rawchunk results with a
 tree

elf_getdata_rawchunks did a linear search to see if a chunk was
already fetched. Replace this list with a binary search tree to make
lookup faster when a lot of Elf_Data_Chunk were created.

   * libelf/libelfP.h (Elf_Data_Chunk): Remove next field.
   (struct Elf): Change the rawchunks type from Elf_Data_Chunk *
   to void *.
   * elf_getdata_rawchunk.c (chunk_compare): New static function.
   (elf_getdata_rawchunk): Use tsearch instead of a manual linked
   list.
   * elf_end.c (free_chunk): New static function.
   (elf_end): Call tdestroy instead of walking linked list.

Signed-off-by: Mark Wielaard 
---
 libelf/elf_end.c  | 22 +---
 libelf/elf_getdata_rawchunk.c | 47 +--
 libelf/libelfP.h  | 13 --
 3 files changed, 52 insertions(+), 30 deletions(-)

diff --git a/libelf/elf_end.c b/libelf/elf_end.c
index 5c451f36..3e5d4c86 100644
--- a/libelf/elf_end.c
+++ b/libelf/elf_end.c
@@ -1,5 +1,6 @@
 /* Free resources associated with Elf descriptor.
Copyright (C) 1998,1999,2000,2001,2002,2004,2005,2007,2015,2016 Red Hat, Inc.
+   Copyright (C) 2023 Mark J. Wielaard 
This file is part of elfutils.
Written by Ulrich Drepper , 1998.
 
@@ -32,12 +33,22 @@
 #endif
 
 #include 
+#include 
 #include 
 #include 
 
 #include "libelfP.h"
 
 
+static void
+free_chunk (void *n)
+{
+  Elf_Data_Chunk *rawchunk = (Elf_Data_Chunk *)n;
+  if (rawchunk->dummy_scn.flag

Re: Performance issue with systemd-coredump and container process linking 2000 shared libraries.

2023-06-21 Thread Romain GEISSLER via Elfutils-devel
> Le 21 juin 2023 à 18:24, Mark Wielaard  a écrit :
> 
> 
> 
> Hi Romain,
> 
> Seeing those performance results I understand why you were suspecting
> the linked list data structure used in elf_getdata_rawchunk.
> 
> Would you be able to test a rebuild libelf with the attached patch,
> which replaces that datastructure with a binary search tree?
> 
> It didn't really show much speedup locally (in the noise, maybe 0.01
> sec faster on ~0.25 sec run). But if there are more than 2000 calls to
> elf_getdata_rawchunk it should make things faster.

Hi Mark,

Actually I have spent some time today to reproduce the issue for real, using
directly systemd. The details can be found in
https://github.com/systemd/systemd/pull/28093 which was just merged. This
change in systemd is enough to fix the biggest part of the "slow" parsing of
cores. Yet even with my systemd patch, we will still call elf_getdata_rawchunk
2000 times if the crashing process had 2000 shared libraries, so indeed your
patch in elfutils might still be welcome. I will test it later when I go home, 
or
tomorrow.

Thanks,
Romain

Re: Performance issue with systemd-coredump and container process linking 2000 shared libraries.

2023-06-21 Thread Mark Wielaard
Hi Romain,

On Wed, Jun 21, 2023 at 06:14:27PM +, Romain GEISSLER wrote:
> > Le 21 juin 2023 à 18:24, Mark Wielaard  a écrit :
> > 
> > Seeing those performance results I understand why you were suspecting
> > the linked list data structure used in elf_getdata_rawchunk.
> > 
> > Would you be able to test a rebuild libelf with the attached patch,
> > which replaces that datastructure with a binary search tree?
> > 
> > It didn't really show much speedup locally (in the noise, maybe 0.01
> > sec faster on ~0.25 sec run). But if there are more than 2000 calls to
> > elf_getdata_rawchunk it should make things faster.
> 
> Actually I have spent some time today to reproduce the issue for
> real, using directly systemd. The details can be found in
> https://github.com/systemd/systemd/pull/28093 which was just
> merged. This change in systemd is enough to fix the biggest part of
> the "slow" parsing of cores. Yet even with my systemd patch, we will
> still call elf_getdata_rawchunk 2000 times if the crashing process
> had 2000 shared libraries, so indeed your patch in elfutils might
> still be welcome. I will test it later when I go home, or tomorrow.

That patch looks good. It should reduce the number of calls to
elf_getdata_rawchunk a lot. Making it less urgent that function is
fast. But if you could test it that would be appreciated. I'll also
test it a bit more and will probably merge it if no issues show up
since it does seem better than keep using the linear list.

Thanks,

Mark


Re: Performance issue with systemd-coredump and container process linking 2000 shared libraries.

2023-06-21 Thread Frank Ch. Eigler via Elfutils-devel
Hi -

> On our side Francois also told me this afternoon that he didn’t
> really reproduce the same thing with my reproducer posted here and
> the real systemd-coredump issue he witnessed live, and also noticed
> that with DEBUGINFOD_URLS unset/set to the empty string my
> reproducer has no problem anymore. [...]

Just doing the math from the debuginfod-client point of view (ignoring
other the later systemd side fix that moots this):

For an application that processes these elf/dwarf files sequentially,
queries for each synthetic solib are going to result in 2000 https-404
transactions, sans debuginfod caching.  If you're lucky (reusing a
dwfl object), elfutils may be able to reuse a long-lived https
connection to a server, otherwise a new https connection might have to
be spun up for each.  But even with reuse, we're talking about 2000
pingponging messages.  That will take a handful of minutes of elapsed
time just by itself.

If the calling code made these queries in parallel batches, it would
be much faster overall.

- FChE