Re: [PATCH] libdw: add thread-safety to dwarf_getabbrev()

2019-10-28 Thread Mark Wielaard
On Sat, 2019-10-26 at 19:56 -0500, Jonathon Anderson wrote:
> On Sun, Oct 27, 2019 at 00:50, Mark Wielaard  wrote:
> > 
> > I see that getconf PTHREAD_KEYS_MAX gives 1024 on my machine.
> > Is this tunable in any way?
> 
>  From what I can tell, no. A quick Google search indicates as much,
> and it's even #defined as 1024 on my machine.

I see, it is a hardcoded constant per architecture, but it seems every
architecture simply uses 1024. I am afraid that kind of rules out
having a pthread_key per Dwarf object. It is not that large a number.

Programs are sometimes linked against 50 to 100 shared libraries; if
they use dwz/alt-files, that already means several hundred potentially
open Dwarf objects. It wouldn't be that crazy to have all of them
open at the same time. That might not reach the limit yet, but I think
in practice you could come close to half of it very easily. And with
split-dwarf every CU basically turns into a Dwarf object, which can
easily go past 1024.
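
To make the limit concrete, here is a tiny standalone sketch (not libdw
code) showing how quickly a key-per-object scheme exhausts
pthread_key_create():

/* Standalone sketch, not libdw code: count how many pthread keys a
   process can create before hitting the limit.  */
#include <pthread.h>
#include <stdio.h>
#include <limits.h>

int
main (void)
{
  pthread_key_t key;
  size_t created = 0;

  /* pthread_key_create fails with EAGAIN once PTHREAD_KEYS_MAX keys
     exist in the whole process, no matter who created them.  */
  while (pthread_key_create (&key, NULL) == 0)
    created++;

  printf ("created %zu keys (PTHREAD_KEYS_MAX = %d)\n",
          created, PTHREAD_KEYS_MAX);
  return 0;
}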

> > >  There may be other ways to handle this, I'm high-level considering
> > >  potential alternatives (with more atomics, of course). The difficulty
> > >  is mostly in providing the same performance in the single-threaded case.
> > > 
> > >  > You already have a Dwarf *.  I would consider adding some sort of
> > >  > clone function which creates a shallow Dwarf * with its own embedded
> > >  > allocator or something like that.
> > > 
> > >  The downside with this is that it's an API addition, which we (the
> > >  Dyninst + HPCToolkit projects) would need to enforce. Which isn't a
> > >  huge deal for us, but I will need to make a case to those teams to
> > >  make the shift.
> > > 
> > >  On the upside, it does provide a very understandable semantic in the
> > >  case of parallelism. For an API without synchronization clauses, this
> > >  would put our work back into the realm of "reasonably correct" (from
> > >  "technically incorrect but works.")
> > 
> > Could someone give an example of this pattern?
> > I don't fully understand what is being proposed and how the interface
> > would look exactly.
> 
> An application would do something along these lines:
> 
> Dwarf* dbg = dwarf_begin(...);
> Dwarf* dbg2 = dwarf_clone(dbg, ...);
> pthread_create(worker, ...);
> // ...
> dwarf_get_units(dbg, ...);
> // ...
> pthread_join(worker);
> dwarf_end(dbg);
> 
> // worker:
> // ...
> dwarf_getabbrev(...);
> // ...
> dwarf_end(dbg2);
> 
> The idea being that dbg2 and dbg share most of the same internal state, 
> but concurrent access to said state is between Dwarfs (or 
> "Dwarf_Views", maybe?), and the state is cleaned up on the original's 
> dwarf_end. I suppose in database terms the Dwarfs are acting like 
> separate "cursors" for the internal DWARF data. For this particular 
> instance, the "top of stack" pointers would be in dbg and dbg2 (the 
> non-shared state), while the atomic mem_tail would be part of the 
> internal (shared) state.
> 
> I'm not sure how viable implementing this sort of thing would be, it 
> might end up overhauling a lot of internals, and I'm not familiar 
> enough with all the components of the API to know whether there would 
> be some quirks with this style.

So they would have separate lazy DWARF DIE/abbrev readers and separate
allocators? And any abbrevs read in the clone would just be thrown away
after a dwarf_end?

In general I think this style is not as nice as having a shared state
of the Dwarf object that is only manipulated in a thread-safe manner.
You did have an earlier implementation that didn't use pthread_keys.
Maybe we should just fall back to that one?

Thanks,

Mark


Re: [PATCH 3/3] lib + libdw: Add and use a concurrent version of the dynamic-size hash table.

2019-10-28 Thread Mark Wielaard
On Sun, 2019-10-27 at 12:49 -0500, Jonathon Anderson wrote:
> In theory (if the system isn't overloaded) the threads should finish 
> their individual work at around the same time, so the amount of waiting 
> any one thread would do (should be) very short. Also, this is only once 
> per resize, which (shouldn't) happen very often anyway.
> 
> The other busy loops (in insert_helper) are also read-only with a 
> single concurrent store, so they (should) spin in cache until the 
> invalidation. Again, (should) be a short wait.
> 
> If it starts to be an issue we can add some exponential backoff, but I 
> haven't noticed any issues in general (and, a nanosleep itself is a 
> spin for suitably short durations, or so I've been told).

Yeah, it might even be easier to use a somewhat larger initial (prime)
size for the hash to avoid the whole situation, if it does look like it
is a problem in practice. I don't really expect it to be. Just curious
whether there was a different model. But sleeping might not be good
either, because it might make it harder for the thread to start doing
real work again if it has to wake up the core it is running on again.
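
For reference, the kind of bounded spin-then-backoff described above
could look roughly like this; a sketch with made-up names, not code from
the actual patch:

/* Sketch only: wait for a concurrent resize to finish.  The flag name
   and thresholds are illustrative, not taken from the patch.  */
#include <stdatomic.h>
#include <time.h>

static void
wait_for_resize (atomic_int *resizing)
{
  unsigned spins = 0;
  long delay_ns = 1000;  /* Start backing off at 1us.  */

  /* The load is read-only, so each waiter spins on its own cached copy
     of the flag until the resizing thread's store invalidates it.  */
  while (atomic_load_explicit (resizing, memory_order_acquire) != 0)
    {
      if (++spins < 1024)
        continue;  /* Short waits: pure spin.  */

      /* Longer waits: exponential backoff, capped around a millisecond.
         A very short nanosleep is effectively still a spin.  */
      struct timespec ts = { 0, delay_ns };
      nanosleep (&ts, NULL);
      if (delay_ns < 1000000)
        delay_ns *= 2;
    }
}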


> Depends on your definition of significant, anything less than 10MB 
> would fall entirely below my radar, and the biggest thing I've tried 
> has 935 CUs. We also tend to close our Dwarfs regularly, I think the 
> maximum open we've used so far is one per thread.

Right, and that is a fairly large Dwarf, which would take just 100K
extra memory. So normally it is far less extra overhead.

> That said, I can also envision cases (say, a gdb on elfutils on a small 
> system) that might be prohibited (or at least annoyed) by this. I can 
> lower the overhead to 64 bytes easily (moving bits to the master 
> thread's stack), but that's the limit with this design.

Although I would like to see that if you have time, I am not that
concerned. And I think the pthread_key limit issue is more important to
resolve.

Thanks,

Mark


Re: [PATCH] libdw: add thread-safety to dwarf_getabbrev()

2019-10-28 Thread Jonathon Anderson
On Mon, Oct 28, 2019 at 14:26, Mark Wielaard  wrote:

> So they would have separate lazy DWARF DIE/abbrev readers and separate
> allocators? And any abbrevs read in the clone would just be thrown away
> after a dwarf_end?


Separate allocators but the same lazy DIE/abbrev readers, most likely. 
So a Dwarf would be split into the concurrent Dwarf_Shared and 
non-concurrent Dwarf "View", maybe something like:


struct Dwarf_Shared
{
 pthread_rwlock_t rwl; /* For all currently non-concurrent internals */
 Elf *elf;
 char *debugdir;
 Dwarf *alt_dwarf;
 Elf_Data *sectiondata[IDX_last];
 bool other_byte_order;
 bool free_elf;
 int alt_fd;
 struct pubnames_s
 {
   Dwarf_Off cu_offset;
   Dwarf_Off set_start;
   unsigned int cu_header_size;
   int address_len;
 } *pubnames_sets;
 size_t pubnames_nsets;
 void *cu_tree;
 /* Dwarf_Off next_cu_offset; // Moved to Dwarf View */
 void *tu_tree;
 /* Dwarf_Off next_tu_offset; // Moved to Dwarf View */
 Dwarf_Sig8_Hash sig8_hash;
 void *split_tree;
 void *macro_ops;
 void *files_lines;
 Dwarf_Aranges *aranges;
 struct Dwarf_CFI_s *cfi;
 struct Dwarf_CU *fake_loc_cu;
 struct Dwarf_CU *fake_loclists_cu;
 struct Dwarf_CU *fake_addr_cu;
 /* pthread_key_t mem_key; // Implemented differently */
 atomic_uintptr_t shared_mem_tail;
 size_t mem_default_size;
 /* Dwarf_OOM oom_handler; // Moved to Dwarf View, maybe */
};
struct Dwarf /*View*/ {
 bool free_shared; /* If true, we handle cleaning up on dwarf_end */
 struct Dwarf_Shared *shared; /* All cloned() Views share the same
                                 Shared in the back */
 Dwarf_Off next_cu_offset;
 Dwarf_Off next_tu_offset;
 struct libdw_memblock *mem_tail;
 Dwarf_OOM oom_handler;
};
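
And a hypothetical dwarf_clone() on top of that split would mostly just
be the following (a sketch using the field names above and <stdlib.h>;
dwarf_clone isn't an existing API):

/* Sketch, building on the struct above; dwarf_clone is hypothetical.  */
Dwarf *
dwarf_clone (Dwarf *dbg)
{
  Dwarf *view = calloc (1, sizeof (Dwarf));
  if (view == NULL)
    return NULL;

  /* Cursor-like, per-thread state starts out fresh...  */
  view->next_cu_offset = 0;
  view->next_tu_offset = 0;
  view->mem_tail = NULL;
  view->oom_handler = dbg->oom_handler;

  /* ...while the heavy state (Elf handle, section data, CU trees, sig8
     hash, ...) stays in the shared part and must remain thread-safe.  */
  view->shared = dbg->shared;
  view->free_shared = false;  /* Only the original dwarf_end frees it.  */
  return view;
}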

So most everything is in Dwarf_Shared, and the bits that really only
make sense when done in serial are part of the View. And then a

patch 0/2 debuginfod submission

2019-10-28 Thread Frank Ch. Eigler
Hi -

Aaron Merey and I would like to propose debuginfod for merge into
elfutils mainline, after a couple of months of work.  As a reminder,
this is an http server (written in C++11) for debuginfo-related
artifacts (elf/dwarf/source files), plus a C client library & command
line tool.  It comes with man pages and tests.  

I'll followup with two git format-patch emails, one for the client,
and one for the server+tests+etc.  The identical code is also in the
git://sourceware.org/git/elfutils.git debuginfod-submit branch.

- FChE



patch 1/2 debuginfod client

2019-10-28 Thread Frank Ch. Eigler
From 3e1f8d93569004d06902459d84baceb636e139d5 Mon Sep 17 00:00:00 2001
From: Aaron Merey 
Date: Mon, 28 Oct 2019 13:29:26 -0400
Subject: [PATCH 1/2] debuginfod 1/2: client side

Introduce the debuginfod/ subdirectory, containing the client for a
new debuginfo-over-http service, in shared-library and command-line
forms.  Two functions in libdwfl make calls into the client library to
fetch elf/dwarf files by buildid, as a fallback.  Instead of normal
dynamic linking (thus pulling in a variety of curl dependencies),
the libdwfl hooks use dlopen/dlsym.  Server & tests coming in patch 2.

Signed-off-by: Aaron Merey 
Signed-off-by: Frank Ch. Eigler 
---
 ChangeLog   |   6 +
 Makefile.am |   4 +
 config/Makefile.am  |   2 +-
 config/libdebuginfod.pc.in  |  12 +
 configure.ac|  23 +-
 debuginfod/ChangeLog|   9 +
 debuginfod/Makefile.am  | 116 +
 debuginfod/debuginfod-client.c  | 624 
 debuginfod/debuginfod-find.1| 131 +
 debuginfod/debuginfod-find.c| 108 
 debuginfod/debuginfod.h |  69 +++
 debuginfod/debuginfod_find_debuginfo.3  | 173 +++
 debuginfod/debuginfod_find_executable.3 |   1 +
 debuginfod/debuginfod_find_source.3 |   1 +
 debuginfod/libdebuginfod.map|   7 +
 libdw/ChangeLog |   4 +
 libdw/Makefile.am   |   2 +-
 libdwfl/ChangeLog   |   6 +
 libdwfl/Makefile.am |   3 +-
 libdwfl/dwfl_build_id_find_elf.c|  30 +-
 libdwfl/find-debuginfo.c|  31 +-
 m4/ChangeLog|   4 +
 m4/Makefile.am  |   2 +-
 m4/ax_check_compile_flag.m4 |  74 +++
 m4/ax_cxx_compile_stdcxx.m4 | 556 +
 25 files changed, 1989 insertions(+), 9 deletions(-)
 create mode 100644 config/libdebuginfod.pc.in
 create mode 100644 debuginfod/ChangeLog
 create mode 100644 debuginfod/Makefile.am
 create mode 100644 debuginfod/debuginfod-client.c
 create mode 100644 debuginfod/debuginfod-find.1
 create mode 100644 debuginfod/debuginfod-find.c
 create mode 100644 debuginfod/debuginfod.h
 create mode 100644 debuginfod/debuginfod_find_debuginfo.3
 create mode 100644 debuginfod/debuginfod_find_executable.3
 create mode 100644 debuginfod/debuginfod_find_source.3
 create mode 100644 debuginfod/libdebuginfod.map
 create mode 100644 m4/ax_check_compile_flag.m4
 create mode 100644 m4/ax_cxx_compile_stdcxx.m4
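
For orientation before the diff: the dlopen/dlsym hook mentioned in the
commit message amounts to something like the following inside libdwfl (a
sketch; the helper name, soname, and exact prototype below are
assumptions, only debuginfod_find_debuginfo itself is named by the
patch):

/* Rough sketch of the dlopen/dlsym fallback described above; not the
   actual find-debuginfo.c code.  */
#include <dlfcn.h>

typedef int (*find_debuginfo_fn) (const unsigned char *build_id,
                                  int build_id_len, char **path);

static int
try_debuginfod (const unsigned char *build_id, int build_id_len, char **path)
{
  /* Load the client lazily so libdwfl itself does not drag in libcurl
     and friends as hard link-time dependencies.  */
  void *h = dlopen ("libdebuginfod.so.1", RTLD_LAZY);
  if (h == NULL)
    return -1;

  find_debuginfo_fn fn
    = (find_debuginfo_fn) dlsym (h, "debuginfod_find_debuginfo");
  if (fn == NULL)
    {
      dlclose (h);
      return -1;
    }

  /* On success this hands back a file descriptor for the downloaded
     debuginfo (and optionally its path).  */
  return fn (build_id, build_id_len, path);
}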

diff --git a/ChangeLog b/ChangeLog
index 911cf35432c9..4f33657df976 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,9 @@
+2019-10-28  Aaron Merey  
+
+   * debuginfod/: New directory for debuginfod code.
+   * Makefile.am (SUBDIRS): Recurse there.
+   * configure.ac (--enable-debuginfod): New flag & checks.
+
 2019-07-05  Omar Sandoval  
 
* configure.ac: Get rid of --enable-libebl-subdir.
diff --git a/Makefile.am b/Makefile.am
index 52f64fc904f4..bd8926b52344 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -29,6 +29,10 @@ pkginclude_HEADERS = version.h
 SUBDIRS = config m4 lib libelf libcpu backends libebl libdwelf libdwfl libdw \
  libasm src po doc tests
 
+if DEBUGINFOD
+SUBDIRS += debuginfod
+endif
+
 EXTRA_DIST = elfutils.spec GPG-KEY NOTES CONTRIBUTING \
 COPYING COPYING-GPLV2 COPYING-LGPLV3
 
diff --git a/config/Makefile.am b/config/Makefile.am
index 9d292cee66c8..6425718efc54 100644
--- a/config/Makefile.am
+++ b/config/Makefile.am
@@ -32,7 +32,7 @@ EXTRA_DIST = elfutils.spec.in known-dwarf.awk 10-default-yama-scope.conf
 libelf.pc.in libdw.pc.in
 
 pkgconfigdir = $(libdir)/pkgconfig
-pkgconfig_DATA = libelf.pc libdw.pc
+pkgconfig_DATA = libelf.pc libdw.pc libdebuginfod.pc
 
 if MAINTAINER_MODE
 $(srcdir)/elfutils.spec.in: $(top_srcdir)/NEWS
diff --git a/config/libdebuginfod.pc.in b/config/libdebuginfod.pc.in
new file mode 100644
index ..46722a76b593
--- /dev/null
+++ b/config/libdebuginfod.pc.in
@@ -0,0 +1,12 @@
+prefix=@prefix@
+exec_prefix=@exec_prefix@
+libdir=@libdir@
+includedir=@includedir@
+
+Name: debuginfod
+Description: elfutils library to query debuginfo files from debuginfod servers
+Version: @VERSION@
+URL: http://elfutils.org/
+
+Libs: -L${libdir} -ldebuginfod
+Cflags: -I${includedir}
diff --git a/configure.ac b/configure.ac
index 9be34d124387..738a96f18bf5 100644
--- a/configure.ac
+++ b/configure.ac
@@ -60,6 +60,8 @@ AC_CONFIG_FILES([m4/Makefile])
 dnl The RPM spec file.  We substitute a few values in the file.
 AC_CONFIG_FILES([elfutils.spec:config/elfutils.spec.in])
 
+dnl debuginfo-server client & server parts.
+AC_CONFIG_FILES([debuginfod/Makefile])
 
 AC_CANONICAL_HOST
 
@@ -86,6 +88,8 @@ AS_IF([test "$use_locks" = yes],
 AH_TEMPLATE([USE_LOCKS], [Defined if libraries should be thread-safe.])
 
 AC_PROG_CC
+AC_PR

Re: [PATCH 3/3] lib + libdw: Add and use a concurrent version of the dynamic-size hash table.

2019-10-28 Thread Mark Wielaard
On Sun, 2019-10-27 at 17:13 +0100, Mark Wielaard wrote:
> On Fri, 2019-10-25 at 23:11 -0500, Jonathon Anderson wrote:
> > A quick grep -r shows that ITERATE and REVERSE are used for the 
> > asm_symbol_tab. If iteration and insertion are not concurrent we can 
> > easily add bidirectional iteration (using the same Treiber stack-like 
> > structure as used for the memory management). Also COMPARE is not 
> > defined to be (0) in this instance.
> 
> Yes. We would need some other solution for the libasm code. But I think
> we can use the concurrent table for everything in libdw.

And the only other hash used in libdw besides the abbrev hash is the
sig8 hash. And adopting the concurrent dynamic size hash for that is
almost trivial. See attached.

I only tested it lightly because I don't have any large projects built
with -fdebug-types-section. But it seems to work as intended.

Cheers,

Mark
From a4bf12a8b906911172897f14d62b3f43cc4066ad Mon Sep 17 00:00:00 2001
From: Mark Wielaard 
Date: Mon, 28 Oct 2019 20:54:35 +0100
Subject: [PATCH] libdw: Use dynamicsizehash_concurrent for Dwarf_Sig8_Hash.

Dwarf_Sig8_Hash is used as part of a Dwarf object to quickly find a type
unit based on the type signature. Use the concurrent variant of the dynamic
size hash to make it thread-safe to use.

Signed-off-by: Mark Wielaard 
---
 libdw/ChangeLog   | 8 
 libdw/dwarf_formref_die.c | 2 +-
 libdw/dwarf_sig8_hash.c   | 2 +-
 libdw/dwarf_sig8_hash.h   | 9 +++--
 4 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/libdw/ChangeLog b/libdw/ChangeLog
index 28563b4d..61231e5c 100644
--- a/libdw/ChangeLog
+++ b/libdw/ChangeLog
@@ -1,3 +1,11 @@
+2019-10-28  Mark Wielaard  
+
+	* dwarf_sig8_hash.h: Include libdw.h. Remove COMPARE. Include
+	dynamicsizehash_concurrent.h.
+	* dwarf_sig8_hash.c: Include dynamicsizehash_concurrent.c.
+	* dwarf_formref_die.c (dwarf_formref_die): Drop NULL argument to
+	Dwarf_Sig8_Hash_find.
+
 2019-08-26  Srđan Milaković  
 
 	* dwarf_abbrev_hash.{c,h}: Use the *_concurrent hash table.
diff --git a/libdw/dwarf_formref_die.c b/libdw/dwarf_formref_die.c
index f196331a..48ba8194 100644
--- a/libdw/dwarf_formref_die.c
+++ b/libdw/dwarf_formref_die.c
@@ -83,7 +83,7 @@ dwarf_formref_die (Dwarf_Attribute *attr, Dwarf_Die *result)
 	 have to match in the type unit headers.  */
 
   uint64_t sig = read_8ubyte_unaligned (cu->dbg, attr->valp);
-  cu = Dwarf_Sig8_Hash_find (&cu->dbg->sig8_hash, sig, NULL);
+  cu = Dwarf_Sig8_Hash_find (&cu->dbg->sig8_hash, sig);
   if (cu == NULL)
 	{
 	  /* Not seen before.  We have to scan through the type units.
diff --git a/libdw/dwarf_sig8_hash.c b/libdw/dwarf_sig8_hash.c
index 043cac78..777f9ebc 100644
--- a/libdw/dwarf_sig8_hash.c
+++ b/libdw/dwarf_sig8_hash.c
@@ -38,4 +38,4 @@
 #define next_prime __libdwarf_next_prime
 extern size_t next_prime (size_t) attribute_hidden;
 
-#include <dynamicsizehash.c>
+#include <dynamicsizehash_concurrent.c>
diff --git a/libdw/dwarf_sig8_hash.h b/libdw/dwarf_sig8_hash.h
index 705ffbcd..c399919a 100644
--- a/libdw/dwarf_sig8_hash.h
+++ b/libdw/dwarf_sig8_hash.h
@@ -29,10 +29,15 @@
 #ifndef _DWARF_SIG8_HASH_H
 #define _DWARF_SIG8_HASH_H	1
 
+#ifdef HAVE_CONFIG_H
+# include <config.h>
+#endif
+
+#include "libdw.h"
+
 #define NAME Dwarf_Sig8_Hash
 #define TYPE struct Dwarf_CU *
-#define COMPARE(a, b) (0)
 
-#include <dynamicsizehash.h>
+#include <dynamicsizehash_concurrent.h>
 
 #endif	/* dwarf_sig8_hash.h */
-- 
2.18.1