Re: [PATCH] Fix thread-safety for elfutils

2023-10-14 Thread Heather McIntyre
Hi Mark,

Thank you for taking the time to review the patch and for your thoughtful
feedback. I really appreciate your attention to detail and effort in
splitting the commit into smaller, more manageable parts. John and I went
over some of your comments and concerns yesterday, and I will be working on
addressing them ASAP.

I have a quick query regarding how you'd prefer to handle these changes.
Would you like me to implement some of the recommended modifications and
commit them (if possible), or would you prefer that I simply leave comments
so you can make the necessary adjustments? I noticed that the split-up
branch resides on what appears to be your personal space at
https://code.wildebeest.org/git/user/mjw/elfutils/log/?h=thread-safety. I'm
uncertain whether I have the necessary access for making commits there.

Best regards,
Heather McIntyre

On Sat, Oct 14, 2023 at 10:40 AM Mark Wielaard  wrote:

> Hi Heather,
>
> On Tue, 2023-10-10 at 15:40 +0200, Mark Wielaard wrote:
> > Very nice. That is a lot of work. And I must admit that I cannot hold
> > that much work in my little head at the same time. So I have split up
> > your commit into (what I hope are) logical independent parts. That will
> > make it easier to review. I might have split it into too many parts,
> > but that at least makes it easier to just add those parts that are
> > trivially correct.
> >
> > The only changes I made were:
> > 1. Move the ChangeLog entries into the commit message
> >(This is something we do now and makes cherry picking small
> > changes easier, but I see it isn't actually documented
> > in CONTRIBUTING, sorry. I'll try to update that.)
> > 2. Fixed up some Copyright notices as discussed off-list.
> > 3. Made some whitespace/indentation changes which made the
> >diffs slightly smaller (in most cases).
> >
> > I'll comment/review the individual commits. Which I'll post to the
> > list.
>
> So I commented on each one individually.
> Below is the summary. I have pushed 4 patches to git trunk already.
> I am suggesting to drop 2 changes, but please feel free to say we
> really need them. I have some questions about the rest, but some are
> minor issues. In general I really like the direction of this.
>
> Hope my splitting of your patch isn't too confusing. But it really
> helps me review separate smaller independent parts. And it makes it so
> we can push more of your changes early.
>
> > Heather McIntyre (16):
> >   lib: Add new once_define and once macros to eu-config.h
>
> Looks good. Added to main branch.
>
> >   libelf: Make elf_version thread-safe
>
> Looks good. Added to main branch.
> Question whether this could also have been done with atomics?
>
> >   libelf: Fix deadlock in __libelf_readall
>
> I think the locking here is subtly wrong in the case of
> elf->map_address != NULL
>
> Suggest to look into rewriting libelf_acquire_all and
> libelf_release_all to libelf_acquire_all_children and
> libelf_release_all_children (which would only be called
> with the elf->lock already acquired).
>
> >   libelf: Fix deadlock in elf_cntl
>
> Looks correct, but can probably be simplified by just relying on
> if (__libelf_readall (elf) == NULL)
>
> >   libelf: Fix elf_end deadlock
>
> Looks good. Added to main branch.
>
> >   libelf: Make elf32_getchdr and elf64_getchdr thread-safe
>
> Looks good. Just needs elf32_getchdr.h added to noinst_HEADERS.
> Pushed to the main branch with that change.
>
> >   lib: Add eu_tsearch and eu_tfind
>
> I think this would work. But I really like to discuss whether we can
> have more fine grained locking. The static global lock will likely
> cause too much contention. I added a similar change to noinst_HEADERS
> as above for eu-search.h.
>
> >   libcpu: Change calls for tsearch/tfind to eu_tsearch/eu_tfind.
>
> Suggest to drop this for now. It will need a twalk wrapper. And I am
> not sure it will be easy to make the bison parser concurrent as a
> whole. This is for libasm, not libdw, so can imho be handled separately
> if we really want to make libasm also completely thread-safe.
>
> >   src: Use eu-search in nm and findtextrel.
>
> Although this works, I don't think it is needed because eu-nm and eu-
> findtextrel are single threaded utilities.
>
> >   libdw: make dwarf_getalt thread-safe
>
> I think this should also include dwarf_setalt. Also lets discuss
> whether the locking is done correctly.
>
> >   libdw: Add locking around __libdw_dieabbrev for dwarf_hasattr
>
> I think this is also incomplete. dwarf_child, dwarf_

Re: [PATCH 02/16] libelf: Make elf_version thread-safe

2023-10-17 Thread Heather McIntyre
John and I discussed that atomic_compare_exchange_strong could have been
used here. I see that this has been pushed to the main branch, but I can
make the change to the atomic operation if you think that is a better
option.

Best,
Heather

On Tue, Oct 10, 2023 at 9:00 AM Mark Wielaard  wrote:

> Hi Heather,
>
> On Tue, 2023-10-10 at 15:42 +0200, Mark Wielaard wrote:
> > From: Heather McIntyre 
> >
> >   * elf_version.c (version_once): Define once.
> >   (initialize_version): New static function.
> >   (elf_version): Use initialize_version version_once.
> >
> > Signed-off-by: Heather S. McIntyre 
> > Signed-off-by: Mark Wielaard 
> > ---
> >  libelf/elf_version.c | 11 ++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/libelf/elf_version.c b/libelf/elf_version.c
> > index 6ec534ab..8296bb65 100644
> > --- a/libelf/elf_version.c
> > +++ b/libelf/elf_version.c
> > @@ -32,12 +32,21 @@
> >  #endif
> >
> >  #include 
> > +#include 
> >
> > +/* Multiple threads may initialize __libelf_version.
> > +   pthread_once() ensures that __libelf_version is initialized only
> once. */
> > +once_define(static, version_once);
> >
> >  /* Currently selected version.  Should be EV_CURRENT.
> > Will be EV_NONE if elf_version () has not been called yet.  */
> >  unsigned int __libelf_version = EV_NONE;
> >
> > +static void initialize_version(void)
> > +{
> > +  __libelf_version = EV_CURRENT;
> > +}
> > +
> >  unsigned int
> >  elf_version (unsigned int version)
> >  {
> > @@ -49,7 +58,7 @@ elf_version (unsigned int version)
> >/* Phew, we know this version.  */
> >
> >/* Signal that the version is now initialized.  */
> > -  __libelf_version = EV_CURRENT;
> > +  once(version_once, initialize_version);
> >
> >/* And return the last (or initial) version.  */
> >return EV_CURRENT;
>
> This is an odd function. The intention clearly was to support more "ELF
> versions" at some point. But (luckily) that never happened and I doubt
> there will ever be a different (incompatible) ELF version that we'll
> have to support. So in the end this will always be EV_CURRENT == 1. But
> the function has to be called to make the rest of the library work.
>
> I think this works and is fine. There will most likely never be real
> contention calling elf_version because normally a program just calls it
> once at the start.
>
> But have you thought about using some atomic operation here instead?
>
> Cheers,
>
> Mark
>


Re: [PATCH 03/16] libelf: Fix deadlock in __libelf_readall

2023-10-17 Thread Heather McIntyre
You are right that if elf->map_address != NULL then the acquired wrlock is
not unlocked. The rwlock_unlock that was there initially was removed due to
deadlocking when returning from inside the if statement, but this was not
correct. However, adding ‘else rwlock_unlock (elf->lock)’ at the end of the
if statement fixes this issue.

I rewrote libelf_acquire_all and libelf_release_all as per your suggestion.
Now, libelf_acquire_all_children does not acquire the lock again for the
current elf object, but it does acquire locks for all children. Similarly,
libelf_release_all_children releases the locks for all children under the
acquired elf->lock. In libelf_readall, the elf->lock for the current elf
object is released after the call to libelf_release_all_children before
returning from the function. All tests are still passing after I made these
changes.

I will push the changes after I am done testing other fixes since I want to
ensure that everything works together cohesively.

On Tue, Oct 10, 2023 at 10:06 AM Mark Wielaard  wrote:

> Hi Heather,
>
> On Tue, 2023-10-10 at 15:42 +0200, Mark Wielaard wrote:
> > From: Heather McIntyre 
> >
> >   * libelf/elf_readall.c (__libelf_readall): Move rwlock_unlock
> >   before libelf_acquire_all.
> >
> > Signed-off-by: Heather S. McIntyre 
> > Signed-off-by: Mark Wielaard 
> > ---
> >  libelf/elf_readall.c | 5 ++---
> >  1 file changed, 2 insertions(+), 3 deletions(-)
> >
> > diff --git a/libelf/elf_readall.c b/libelf/elf_readall.c
> > index d0f9a28c..2d62d447 100644
> > --- a/libelf/elf_readall.c
> > +++ b/libelf/elf_readall.c
> > @@ -84,6 +84,7 @@ __libelf_readall (Elf *elf)
> >
> >/* If this is an archive and we have derived descriptors get the
> >locks for all of them.  */
> > +  rwlock_unlock(elf->lock); // lock will be reacquired next line
> >libelf_acquire_all (elf);
> >
> >if (elf->maximum_size == ~((size_t) 0))
> > @@ -141,10 +142,8 @@ __libelf_readall (Elf *elf)
> >   __libelf_seterrno (ELF_E_NOMEM);
> >
> >/* Free the locks on the children.  */
> > -  libelf_release_all (elf);
> > +  libelf_release_all (elf); // lock is released
> >  }
> >
> > -  rwlock_unlock (elf->lock);
> > -
> >return (char *) elf->map_address;
> >  }
>
> I think this is wrong when this if statement, at the start of the
> block, fails:
>
>   /* If the file is not mmap'ed and not previously loaded, do it now.  */
>   if (elf->map_address == NULL)
>
> So if elf->map_address != NULL we now never call
> rwlock_unlock (elf->lock).
>
> One way to simplify this locking might be to rewrite libelf_acquire_all
> and libelf_release_all to libelf_acquire_all_children and
> libelf_release_all_children (which would only be called with the elf-
> >lock already acquired).
>
> __libelf_readall is the only caller of these functions.
>
> Cheers,
>
> Mark
>


Re: [PATCH 04/16] libelf: Fix deadlock in elf_cntl

2023-10-17 Thread Heather McIntyre
You are right. I changed the code to just rely on if (__libelf_readall
(elf) == NULL) and this seems to work just fine.

On Tue, Oct 10, 2023 at 10:23 AM Mark Wielaard  wrote:

> Hi Heather,
>
> On Tue, 2023-10-10 at 15:42 +0200, Mark Wielaard wrote:
> > From: Heather McIntyre 
> >
> >   * libelf/elf_cntl.c (elf_cntl): Move rwlock_wrlock, rwlock_unlock,
> >   inside case switch statements.
> >
> > Signed-off-by: Heather S. McIntyre 
> > Signed-off-by: Mark Wielaard 
> > ---
> >  libelf/elf_cntl.c | 11 +++
> >  1 file changed, 7 insertions(+), 4 deletions(-)
> >
> > diff --git a/libelf/elf_cntl.c b/libelf/elf_cntl.c
> > index 04aa9132..64087c7d 100644
> > --- a/libelf/elf_cntl.c
> > +++ b/libelf/elf_cntl.c
> > @@ -48,13 +48,16 @@ elf_cntl (Elf *elf, Elf_Cmd cmd)
> >return -1;
> >  }
> >
> > -  rwlock_wrlock (elf->lock);
> > +
> >
> >switch (cmd)
> >  {
> >  case ELF_C_FDREAD:
> > + rwlock_rdlock (elf->lock);
> > + int addr_isnull = elf->map_address == NULL;
> > + rwlock_unlock(elf->lock);
> >/* If not all of the file is in the memory read it now.  */
> > -  if (elf->map_address == NULL && __libelf_readall (elf) == NULL)
> > +  if (addr_isnull && __libelf_readall (elf) == NULL)
> >   {
> > /* We were not able to read everything.  */
> > result = -1;
>
> Can't we just rely on if (__libelf_readall (elf) == NULL)?
>
> __libelf_readall already does locking and will return non-NULL if elf-
> >map_address is already set. So it looks like the extra check (and
> locking) to check addr_isnull is redundant and just make the code more
> complex.
>
> > @@ -64,7 +67,9 @@ elf_cntl (Elf *elf, Elf_Cmd cmd)
> >
> >  case ELF_C_FDDONE:
> >/* Mark the file descriptor as not usable.  */
> > +  rwlock_wrlock (elf->lock);
> >elf->fildes = -1;
> > +  rwlock_unlock (elf->lock);
> >break;
> >
> >  default:
>
> This looks correct. All other accesses to elf->fildes seem to be done
> under the elf->lock too.
>
> > @@ -73,7 +78,5 @@ elf_cntl (Elf *elf, Elf_Cmd cmd)
> >break;
> >  }
> >
> > -  rwlock_unlock (elf->lock);
> > -
> >return result;
> >  }
>
>


Re: [PATCH 09/16] src: Use eu-search in nm and findtextrel.

2023-10-17 Thread Heather McIntyre
Thank you for pointing out that changes to these two files are not needed
due to them being single threaded utilities. I reverted the changes and ran
some tests. Works just fine with the original search.h, tsearch, and tfind.

On Tue, Oct 10, 2023 at 4:26 PM Mark Wielaard  wrote:

> Hi Heather,
>
> On Tue, Oct 10, 2023 at 03:42:53PM +0200, Mark Wielaard wrote:
> > From: Heather McIntyre 
> >
> >   * src/Makefile.am: Add USE_LOCKS condition for -pthread.
> >   * src/findtextrel.c: Add eu-search.h and remove search.h.
> >   Change calls of tsearch/tfind to eu_tsearch/eu_tfind.
> >   * src/nm.c: Likewise.
>
> This does look technically correct. But both nm and findtextrel are
> single threaded programs. It might be interesting to try to make them
> parallel. But currently I think this would just introduce unnecessary
> locking.
>
> Cheers,
>
> Mark
>
> > Signed-off-by: Heather S. McIntyre 
> > Signed-off-by: Mark Wielaard 
> > ---
> >  src/Makefile.am   |  3 +++
> >  src/findtextrel.c | 10 +-
> >  src/nm.c  | 10 +-
> >  3 files changed, 13 insertions(+), 10 deletions(-)
> >
> > diff --git a/src/Makefile.am b/src/Makefile.am
> > index 10d59a48..fea5d43e 100644
> > --- a/src/Makefile.am
> > +++ b/src/Makefile.am
> > @@ -22,6 +22,9 @@ DEFS += $(YYDEBUG) -DDEBUGPRED=@DEBUGPRED@ \
> >  AM_CPPFLAGS += -I$(srcdir)/../libelf -I$(srcdir)/../libebl \
> >   -I$(srcdir)/../libdw -I$(srcdir)/../libdwelf \
> >   -I$(srcdir)/../libdwfl -I$(srcdir)/../libasm
> > +if USE_LOCKS
> > +  AM_CFLAGS += -pthread
> > +endif
> >
> >  AM_LDFLAGS = -Wl,-rpath-link,../libelf:../libdw $(STACK_USAGE_NO_ERROR)
> >
> > diff --git a/src/findtextrel.c b/src/findtextrel.c
> > index d3021a3a..5ac4c29a 100644
> > --- a/src/findtextrel.c
> > +++ b/src/findtextrel.c
> > @@ -27,7 +27,7 @@
> >  #include 
> >  #include 
> >  #include 
> > -#include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -501,10 +501,10 @@ check_rel (size_t nsegments, struct segments
> segments[nsegments],
> >   /* There can be more than one relocation against one file.
> >  Try to avoid multiple messages.  And yes, the code uses
> >  pointer comparison.  */
> > - if (tfind (src, knownsrcs, ptrcompare) == NULL)
> > + if (eu_tfind (src, knownsrcs, ptrcompare) == NULL)
> > {
> >   printf (_("%s not compiled with -fpic/-fPIC\n"), src);
> > - tsearch (src, knownsrcs, ptrcompare);
> > + eu_tsearch (src, knownsrcs, ptrcompare);
> > }
> >   return;
> > }
> > @@ -555,12 +555,12 @@ check_rel (size_t nsegments, struct segments
> segments[nsegments],
> >   if (sym->st_value + sym->st_size > addr)
> > {
> >   /* It is this function.  */
> > - if (tfind (lowstr, knownsrcs, ptrcompare) == NULL)
> > + if (eu_tfind (lowstr, knownsrcs, ptrcompare) ==
> NULL)
> > {
> >   printf (_("\
> >  the file containing the function '%s' is not compiled with
> -fpic/-fPIC\n"),
> >   lowstr);
> > - tsearch (lowstr, knownsrcs, ptrcompare);
> > + eu_tsearch (lowstr, knownsrcs, ptrcompare);
> > }
> > }
> >   else if (highidx == -1)
> > diff --git a/src/nm.c b/src/nm.c
> > index fbdee8e1..44c20fb2 100644
> > --- a/src/nm.c
> > +++ b/src/nm.c
> > @@ -32,7 +32,7 @@
> >  #include 
> >  #include 
> >  #include 
> > -#include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -537,7 +537,7 @@ static int
> >  get_global (Dwarf *dbg __attribute__ ((unused)), Dwarf_Global *global,
> >   void *arg __attribute__ ((unused)))
> >  {
> > -  tsearch (memcpy (xmalloc (sizeof (Dwarf_Global)), global,
> > +  eu_tsearch (memcpy (xmalloc (sizeof (Dwarf_Global)), global,
> >  sizeof (Dwarf_Global)),
> >  &global_root, global_compare);
> >
> > @@ -696,7 +696,7 @@ get_local_names (Dwarf *dbg)
> >  /* Check whether a similar local_name is already in the
> > cache.  That should not happen.  But if it does, we
> >

Re: [PATCH 10/16] libdw: make dwarf_getalt thread-safe

2023-10-17 Thread Heather McIntyre
Since it wasn't too complicated to do, I implemented a dwarf object lock
(per struct dwarf lock) instead of having the static global lock. As per
your suggestion, I placed the lock over the whole dwarf_getalt function, so
now find_debug_altlink is called with the lock already acquired. In
addition, I placed locking around dwarf_setalt. I am currently in the
process of testing these changes along with others to make sure everything
is ironed out.

On Tue, Oct 10, 2023 at 5:04 PM Mark Wielaard  wrote:

> Hi Heather,
>
> On Tue, Oct 10, 2023 at 03:42:54PM +0200, Mark Wielaard wrote:
> >   * libdw/dwarf_getalt.c: Add lock for
> >   dbg->alt_dwarf/main->alt_dwarf.
>
> This takes care of dwarf_getalt. Shouldn't dwarf_setalt also use
> locking?
>
> > Signed-off-by: Heather S. McIntyre 
> > Signed-off-by: Mark Wielaard 
> > ---
> >  libdw/dwarf_getalt.c | 27 ++-
> >  1 file changed, 22 insertions(+), 5 deletions(-)
> >
> > diff --git a/libdw/dwarf_getalt.c b/libdw/dwarf_getalt.c
> > index 0a12dfae..e3894c8c 100644
> > --- a/libdw/dwarf_getalt.c
> > +++ b/libdw/dwarf_getalt.c
> > @@ -44,6 +44,10 @@
> >  #include 
> >  #include 
> >
> > +/* find_debug_altlink() modifies "dbg->alt_dwarf".
> > +   dwarf_getalt() reads "main->alt_dwarf".
> > +   Mutual exclusion is enforced to prevent a race. */
> > +rwlock_define(static, alt_dwarf_lock);
>
> Probably overkill, but should we consider a Dwarf lock object instead
> of having a static global lock?
>
> >  char *
> >  internal_function
> > @@ -152,7 +156,9 @@ find_debug_altlink (Dwarf *dbg)
> >Dwarf *alt = dwarf_begin (fd, O_RDONLY);
> >if (alt != NULL)
> >   {
> > +   rwlock_wrlock(alt_dwarf_lock);
> > dbg->alt_dwarf = alt;
> > +   rwlock_unlock(alt_dwarf_lock);
> > dbg->alt_fd = fd;
> >   }
> >else
>
> Is this lock wide enough? See also below. It looks like multiple
> threads could arrive at this point at the same time. so alt_dwarf and
> alt_fd can be (re)set multiple times, causing leaks of the Dwarf and
> fd.
>
> > @@ -163,22 +169,33 @@ find_debug_altlink (Dwarf *dbg)
> >  Dwarf *
> >  dwarf_getalt (Dwarf *main)
> >  {
> > +  rwlock_rdlock(alt_dwarf_lock);
> > +  Dwarf *alt_dwarf_local = main->alt_dwarf;
> > +  rwlock_unlock(alt_dwarf_lock);
> > +
> >/* Only try once.  */
> > -  if (main == NULL || main->alt_dwarf == (void *) -1)
> > +  if (main == NULL || alt_dwarf_local == (void *) -1)
> >  return NULL;
> >
> > -  if (main->alt_dwarf != NULL)
> > -return main->alt_dwarf;
> > +  if (alt_dwarf_local != NULL)
> > +return alt_dwarf_local;
> >
>
> So at this point it looks like we can have multiple threads running
> (because the lock has been dropped above) all with alt_dwarf_local
> (and main->alt_dwarf) being NULL.
>
> >find_debug_altlink (main);
> >
> > +  rwlock_rdlock(alt_dwarf_lock);
> > +  alt_dwarf_local = main->alt_dwarf;
> > +  rwlock_unlock(alt_dwarf_lock);
> > +
> >/* If we found nothing, make sure we don't try again.  */
> > -  if (main->alt_dwarf == NULL)
> > +  if (alt_dwarf_local == NULL)
> >  {
> > +  rwlock_wrlock(alt_dwarf_lock);
> >main->alt_dwarf = (void *) -1;
> > +  rwlock_unlock(alt_dwarf_lock);
> > +
> >return NULL;
> >  }
> >
> > -  return main->alt_dwarf;
> > +  return alt_dwarf_local;
> >  }
> >  INTDEF (dwarf_getalt)
> > --
> > 2.41.0
> >
>
> It might be better to take the lock over the whole function (and only
> call find_debug_altlink with the lock held).
>
> Cheers,
>
> Mark
>


Re: [PATCH 11/16] libdw: Add locking around __libdw_dieabbrev for dwarf_hasattr

2023-10-17 Thread Heather McIntyre
Hi Mark,

I see now that this is incomplete considering the other places that also
call this function. I do agree that global locking may be heavy if 1)
implemented in all of these locations, or 2) implemented directly in
__libdw_dieabbrev. We could use atomics here directly in __libdw_dieabbrev.
I have given this a try and it is currently passing all tests, including
the new ones I added for data race detection.

I know you mentioned that taking any pthread lock at all might be a big
overhead, but since I implemented a per dwarf struct lock, would using that
be a possibility? Assuming multiple calls to __libdw_dieabbrev will be
working on different dwarf objects.

On Tue, Oct 10, 2023 at 8:44 AM Mark Wielaard  wrote:

> Hi Heather,
>
> On Tue, 2023-10-10 at 15:42 +0200, Mark Wielaard wrote:
>  From: Heather McIntyre 
>
>  * libdw/dwarf_hasattr.c (dwarf_hasattr): Use die_abbrev_lock
>  around __libdw_dieabbrev call.
>
>  Signed-off-by: Heather S. McIntyre 
>  Signed-off-by: Mark Wielaard 
>  ---
>   libdw/dwarf_hasattr.c | 9 +
>   1 file changed, 9 insertions(+)
>
>  diff --git a/libdw/dwarf_hasattr.c b/libdw/dwarf_hasattr.c
>  index eca08394..92f8de68 100644
>  --- a/libdw/dwarf_hasattr.c
>  +++ b/libdw/dwarf_hasattr.c
>  @@ -34,6 +34,10 @@
>   #include 
>   #include "libdwP.h"
>
>  +/* dwarf_hasattr() calls __libdw_dieabbrev() in libdwP.h.
>  +   __libdw_dieabbrev() reads/writes "die->abbrev".
>  +   Mutual exclusion is enforced around the call to __libdw_dieabbrev to
> prevent a race. */
>  +rwlock_define(static, die_abbrev_lock);
>
> dwarf_child, dwarf_getattrs, dwarf_haschildren and dwarf_tag also use
> __libdw_dieabbrev to get the Dwarf_Abbrev pointer for the given
> Dwarf_DIE. Shouldn't they also use such locking? Or have the locking
> inside __libdw_dieabbrev itself?
>
> Also there are many Dwarf_Dies which all start out "lazy" without
> abbrev set. So taking a global static lock, or even taking any pthread
> lock at all might be a big overhead. Is there some way we can do this
> with atomics instead?
>
>
>   int
>   dwarf_hasattr (Dwarf_Die *die, unsigned int search_name)
>  @@ -41,8 +45,13 @@ dwarf_hasattr (Dwarf_Die *die, unsigned int
> search_name)
> if (die == NULL)
>   return 0;
>
>  +  rwlock_wrlock(die_abbrev_lock);
>  +
> /* Find the abbreviation entry.  */
> Dwarf_Abbrev *abbrevp = __libdw_dieabbrev (die, NULL);
>  +
>  +  rwlock_unlock(die_abbrev_lock);
>  +
> if (unlikely (abbrevp == DWARF_END_ABBREV))
>   {
> __libdw_seterrno (DWARF_E_INVALID_DWARF);
>


Re: [PATCH 12/16] libdw: Make libdw_find_split_unit thread-safe

2023-10-17 Thread Heather McIntyre
Hi Mark,

As per your suggestion, I have placed the lock around the entire
__libdw_find_split_unit function call.

On Wed, Oct 11, 2023 at 12:17 PM Mark Wielaard  wrote:

> Hi Heather,
>
> On Tue, Oct 10, 2023 at 03:42:56PM +0200, Mark Wielaard wrote:
> > From: Heather McIntyre 
> >
> >   * (try_split_file): Use eu_tsearch.
> >   Add lock for cu->split.
> >
> > Signed-off-by: Heather S. McIntyre 
> > Signed-off-by: Mark Wielaard 
> > ---
> >  libdw/libdw_find_split_unit.c | 43 +--
> >  1 file changed, 36 insertions(+), 7 deletions(-)
> >
> > diff --git a/libdw/libdw_find_split_unit.c
> b/libdw/libdw_find_split_unit.c
> > index 533f8072..a288e9bd 100644
> > --- a/libdw/libdw_find_split_unit.c
> > +++ b/libdw/libdw_find_split_unit.c
> > @@ -34,13 +34,19 @@
> >  #include "libelfP.h"
> >
> >  #include 
> > -#include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> >  #include 
> >  #include 
> >
> > +/* __libdw_link_skel_split() modifies "skel->split = split"
> > +   and "split->split = skel".
> > +   "cu->split" is read at multiple locations and conditionally updated.
> > +   Mutual exclusion is enforced to prevent a race. */
> > +rwlock_define(static, cu_split_lock);
>
> __libdw_link_skel_split is defined in libdw/libdwP.h and is called in
> try_split_file. (It is also called in src/readelf.c but that is a
> terrible hack, so ignore that for now.)
>
> A Dwarf_CU is created (see libdw/libdw_findcu.c) with split set to
> (Dwarf_CU *) -1 to indicate the split dwarf is not yet set. If
> cu->split is NULL it means the split dwarf was searched for, but not
> found.
>
> >  static void
> >  try_split_file (Dwarf_CU *cu, const char *dwo_path)
> >  {
> > @@ -57,7 +63,7 @@ try_split_file (Dwarf_CU *cu, const char *dwo_path)
> > if (split->unit_type == DW_UT_split_compile
> > && cu->unit_id8 == split->unit_id8)
> >   {
> > -   if (tsearch (split->dbg, &cu->dbg->split_tree,
> > +   if (eu_tsearch (split->dbg, &cu->dbg->split_tree,
> >  __libdw_finddbg_cb) == NULL)
> >   {
> > /* Something went wrong.  Don't link.  */
> > @@ -65,9 +71,13 @@ try_split_file (Dwarf_CU *cu, const char *dwo_path)
> > break;
> >   }
> >
> > +   rwlock_wrlock(cu_split_lock);
> > +
> > /* Link skeleton and split compile units.  */
> > __libdw_link_skel_split (cu, split);
> >
> > +   rwlock_unlock(cu_split_lock);
> > +
>
> Is this locking wide enough? It seems like multiple threads can race
> to this code and set different Dwarfs (which means they'll leak). See
> below.
>
> > /* We have everything we need from this ELF
> >file.  And we are going to close the fd to
> >not run out of file descriptors.  */
> > @@ -75,8 +85,13 @@ try_split_file (Dwarf_CU *cu, const char *dwo_path)
> > break;
> >   }
> >   }
> > +
> > +   rwlock_rdlock(cu_split_lock);
> > +
> > if (cu->split == (Dwarf_CU *) -1)
> >   dwarf_end (split_dwarf);
> > +
> > +   rwlock_unlock(cu_split_lock);
>
> This means __libdw_link_skel_split wasn't called above (so the
> split_dwarf created by dwarf_begin didn't contain the expected CU).
>
> >   }
> >/* Always close, because we don't want to run out of file
> >descriptors.  See also the elf_fcntl ELF_C_FDDONE call
> > @@ -89,9 +104,13 @@ Dwarf_CU *
> >  internal_function
> >  __libdw_find_split_unit (Dwarf_CU *cu)
> >  {
> > +  rwlock_rdlock(cu_split_lock);
> > +  Dwarf_CU *cu_split_local = cu->split;
> > +  rwlock_unlock(cu_split_lock);
> > +
> >/* Only try once.  */
> > -  if (cu->split != (Dwarf_CU *) -1)
> > -return cu->split;
> > +  if (cu_split_local != (Dwarf_CU *) -1)
> > +return cu_split_local;
> >
> >/* We need a skeleton unit with a comp_dir and [GNU_]dwo_name
> attributes.
> >   The split unit will be the first in the dwo file and should have
> the
> > @@ -116,7 +135,11 @@ __libdw_find_split_unit (Dwarf_CU *cu)
> > free (dwo_path);
> >   }

Re: [PATCH 13/16] libdw: Make libdw_findcu thread-safe

2023-10-17 Thread Heather McIntyre
Hi Mark,

For the call to __libdw_intern_next_unit in __libdw_findcu, I have updated
the locking to the per struct drawf lock. Although I have not encountered a
data race report regarding the call to __libdw_intern_next_unit in
dwarf_formref_die, I have also placed the updated locking there as well.

On Thu, Oct 12, 2023 at 5:03 PM Mark Wielaard  wrote:

> Hi Heather,
>
> On Tue, Oct 10, 2023 at 03:42:57PM +0200, Mark Wielaard wrote:
> > From: Heather McIntyre 
> >
> >   * libdw/libdw_findcu.c (findcu_cb): Use eu_tsearch.
> >   (__libdw_findcu): Use eu_tfind and next_tucu_offset_lock.
> >   (__libdw_findcu_addr): Use eu_tfind.
> >   (__libdw_find_split_dbg_addr): Likewise.
> >
> > Signed-off-by: Heather S. McIntyre 
> > Signed-off-by: Mark Wielaard 
> > ---
> >  libdw/libdw_findcu.c | 54 
> >  1 file changed, 35 insertions(+), 19 deletions(-)
> >
> > diff --git a/libdw/libdw_findcu.c b/libdw/libdw_findcu.c
> > index ed744231..e546fb0f 100644
> > --- a/libdw/libdw_findcu.c
> > +++ b/libdw/libdw_findcu.c
> > @@ -32,9 +32,13 @@
> >  #endif
> >
> >  #include 
> > -#include 
> > +#include 
> >  #include "libdwP.h"
> >
> > +/* __libdw_findcu modifies "&dbg->next_tu_offset :
> &dbg->next_cu_offset".
> > +   May read or write, so mutual exclusion is enforced to prevent a
> race. */
> > +rwlock_define(static, next_tucu_offset_lock);
> > +
>
> Could this be a per struct Dwarf lock?
>
> >  static int
> >  findcu_cb (const void *arg1, const void *arg2)
> >  {
> > @@ -213,7 +217,7 @@ __libdw_intern_next_unit (Dwarf *dbg, bool
> debug_types)
> >
> >  Dwarf_Sig8_Hash_insert (&dbg->sig8_hash, unit_id8, newp);
> >
> >/* Add the new entry to the search tree.  */
> > -  if (tsearch (newp, tree, findcu_cb) == NULL)
> > +  if (eu_tsearch (newp, tree, findcu_cb) == NULL)
> >  {
> >/* Something went wrong.  Undo the operation.  */
> >*offsetp = oldoff;
>
> This is OK since when __libdw_intern_next_unit is called from
> __libdw_findcu the next_tucu_offset_lock is held.
>
> But there is also a call to __libdw_intern_next_unit from
> dwarf_formref_die. So there should also be a lock there?
>
> > @@ -234,28 +238,40 @@ __libdw_findcu (Dwarf *dbg, Dwarf_Off start, bool
> v4_debug_types)
> >
> >/* Maybe we already know that CU.  */
> >struct Dwarf_CU fake = { .start = start, .end = 0 };
> > -  struct Dwarf_CU **found = tfind (&fake, tree, findcu_cb);
> > +  struct Dwarf_CU **found = eu_tfind (&fake, tree, findcu_cb);
> > +  struct Dwarf_CU *result = NULL;
> > +
> >if (found != NULL)
> >  return *found;
> >
> > -  if (start < *next_offset)
> > -{
> > -  __libdw_seterrno (DWARF_E_INVALID_DWARF);
> > -  return NULL;
> > -}
> > +  rwlock_wrlock(next_tucu_offset_lock);
> >
> > -  /* No.  Then read more CUs.  */
> > -  while (1)
> > +  if (start < *next_offset)
> > +__libdw_seterrno (DWARF_E_INVALID_DWARF);
> > +  else
> >  {
> > -  struct Dwarf_CU *newp = __libdw_intern_next_unit (dbg,
> v4_debug_types);
> > -  if (newp == NULL)
> > - return NULL;
> > +  /* No.  Then read more CUs.  */
> > +  while (1)
> > + {
> > +   struct Dwarf_CU *newp = __libdw_intern_next_unit (dbg,
> > +
>  v4_debug_types);
> > +   if (newp == NULL)
> > + {
> > +   result = NULL;
> > +   break;
> > + }
> >
> > -  /* Is this the one we are looking for?  */
> > -  if (start < *next_offset || start == newp->start)
> > - return newp;
> > +   /* Is this the one we are looking for?  */
> > +   if (start < *next_offset || start == newp->start)
> > + {
> > +   result = newp;
> > +   break;
> > + }
> > + }
> >  }
> > -  /* NOTREACHED */
> > +
> > +  rwlock_unlock(next_tucu_offset_lock);
> > +  return result;
> >  }
>
> This look OK.
>
> >  struct Dwarf_CU *
> > @@ -283,7 +299,7 @@ __libdw_findcu_addr (Dwarf *dbg, void *addr)
> >  return NULL;
> >
> >struct Dwarf_CU fake = { .start = start, .end = 0 };
> > -  struct Dwarf_CU **found = tfind (&fake, tree, findcu_cb);
> > +  struct Dwarf_CU **found = eu_tfind (&fake, tree, findcu_cb);
> >
> >if (found != NULL)
> >  return *found;
> > @@ -298,7 +314,7 @@ __libdw_find_split_dbg_addr (Dwarf *dbg, void *addr)
> >/* XXX Assumes split DWARF only has CUs in main IDX_debug_info.  */
> >Elf_Data fake_data = { .d_buf = addr, .d_size = 0 };
> >Dwarf fake = { .sectiondata[IDX_debug_info] = &fake_data };
> > -  Dwarf **found = tfind (&fake, &dbg->split_tree, __libdw_finddbg_cb);
> > +  Dwarf **found = eu_tfind (&fake, &dbg->split_tree,
> __libdw_finddbg_cb);
> >
> >if (found != NULL)
> >  return *found;
>
> OK.
>
> Thanks,
>
> Mark
>


Re: [PATCH 07/16] lib: Add eu_tsearch and eu_tfind

2023-10-17 Thread Heather McIntyre
Hi Mark,

I think having unique per-root locks might be a good idea. From a brief
test, I saw around 70 roots created when parsing the test file I have been
using. I have an initial idea for this that I don't think will be too
complicated. I will go over my idea with John to get his input and then get
back to you.

Best,
Heather

On Tue, Oct 10, 2023 at 11:51 AM Mark Wielaard  wrote:

> Hi Heather,
>
> On Tue, 2023-10-10 at 15:42 +0200, Mark Wielaard wrote:
> > From: Heather McIntyre 
> >
> >   * lib/eu-search.h: New file.
> >   Declarations for read/write locked eu_tsearch/eu_tfind.
>
> Like in the previous case, don't forget to add such a new header to
> noinst_HEADERS so make dist adds them.
>
> diff --git a/lib/Makefile.am b/lib/Makefile.am
> index ce8f3e1b..9df0a8c6 100644
> --- a/lib/Makefile.am
> +++ b/lib/Makefile.am
> @@ -39,5 +39,6 @@ libeu_a_SOURCES = xasprintf.c xstrdup.c xstrndup.c
> xmalloc.c next_prime.c \
>
>  noinst_HEADERS = fixedsizehash.h libeu.h system.h dynamicsizehash.h
> list.h \
>  eu-config.h color.h printversion.h bpf.h \
> -atomics.h stdatomic-fbsd.h dynamicsizehash_concurrent.h
> +atomics.h stdatomic-fbsd.h dynamicsizehash_concurrent.h \
> +eu-search.h
>  EXTRA_DIST = dynamicsizehash.c dynamicsizehash_concurrent.c
>
> >   * lib/eu-search.c: New file.
> >   Definitions for read/write locked eu_tsearch/eu_tfind.
> >   * Makefile.am (libeu_a_SOURCES): Add eu-search.c.
> >
> > Signed-off-by: Heather S. McIntyre 
> > Signed-off-by: Mark Wielaard 
> > ---
> >  lib/Makefile.am |  2 +-
> >  lib/eu-search.c | 60 +
> >  lib/eu-search.h | 39 
> >  3 files changed, 100 insertions(+), 1 deletion(-)
> >  create mode 100644 lib/eu-search.c
> >  create mode 100644 lib/eu-search.h
> >
> > diff --git a/lib/Makefile.am b/lib/Makefile.am
> > index b3bb929f..ce8f3e1b 100644
> > --- a/lib/Makefile.am
> > +++ b/lib/Makefile.am
> > @@ -34,7 +34,7 @@ AM_CPPFLAGS += -I$(srcdir)/../libelf
> >  noinst_LIBRARIES = libeu.a
> >
> >  libeu_a_SOURCES = xasprintf.c xstrdup.c xstrndup.c xmalloc.c
> next_prime.c \
> > -   crc32.c crc32_file.c \
> > +   crc32.c crc32_file.c eu-search.c \
> > color.c error.c printversion.c
>
> OK.
>
> >  noinst_HEADERS = fixedsizehash.h libeu.h system.h dynamicsizehash.h
> list.h \
> > diff --git a/lib/eu-search.c b/lib/eu-search.c
> > new file mode 100644
> > index ..a6b04f4f
> > --- /dev/null
> > +++ b/lib/eu-search.c
> > @@ -0,0 +1,60 @@
> > +/* Definitions for thread-safe tsearch/tfind
> > +   Copyright (C) 2023 Rice University
> > +   This file is part of elfutils.
> > +
> > +   This file is free software; you can redistribute it and/or modify
> > +   it under the terms of either
> > +
> > + * the GNU Lesser General Public License as published by the Free
> > +   Software Foundation; either version 3 of the License, or (at
> > +   your option) any later version
> > +
> > +   or
> > +
> > + * the GNU General Public License as published by the Free
> > +   Software Foundation; either version 2 of the License, or (at
> > +   your option) any later version
> > +
> > +   or both in parallel, as here.
> > +
> > +   elfutils is distributed in the hope that it will be useful, but
> > +   WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   General Public License for more details.
> > +
> > +   You should have received copies of the GNU General Public License and
> > +   the GNU Lesser General Public License along with this program.  If
> > +   not, see <
> https://urldefense.com/v3/__http://www.gnu.org/licenses/__;!!BuQPrrmRaQ!j_izX_1hkv80LnClL5l2GD-1nTSH3dTTKjXsHtkYvr7TyudmXWLEZ7zOOkpQZFgo_0QKiWAkkvj7SBE54w$
> >.  */
> > +
> > +#ifdef HAVE_CONFIG_H
> > +#include 
> > +#endif
> > +
> > +#include 
> > +#include "eu-search.h"
> > +
> > +rwlock_define(static, search_find_lock);
> > +
> > +void *eu_tsearch(const void *key, void **rootp,
> > +  int (*compar)(const void *, const void *))
> > +{
> > +  void *ret = NULL;
> > +  rwlock_wrlock(search_find_lock);
> > +
> > +  ret = tsearch(key, rootp, compar);
> > +
> > +  rwlock_unlock(search_find_lock);
> > +  r

Deadlock from --enable-thread-safety

2023-08-04 Thread Heather McIntyre via Elfutils-devel
I've been making --enable-thread-safety (USE_LOCKS) more viable by fixing
deadlocks throughout the libelf library. This has required minimal code
changes so far. However, I've hit a snag where "__elf64_updatenull_wrlock"
calls "elf64_getchdr," which leads to "elf64_getshdr." This sequence
attempts to acquire a read lock under a write lock and fails. Similarly,
"elf64_getchdr" calls "elf_getdata," which also tries and fails to
implement a read lock.

Unfortunately, fixing this particular deadlock requires more than minimal
code changes. From what I understand, I may have a few options on how to
fix this:

1) Create copies of the getchdr and elf_getdata functions, renaming them
getchdr_wrlock and elf_getdata_wrlock. However, multiple copies of these
functions could bloat the code, which may not be ideal.
2) Some type of write lock flag to indicate when a write lock is active.
Either within the locking macro in eu-config.h or passed as an argument in
the functions.

Ultimately, I thought others may want to look into this to see if there
might be options for a better solution. Here is the backtrace:

Program received signal SIGABRT, Aborted.
0x77837aff in raise () from /lib64/libc.so.6
#0  0x77837aff in raise () from /lib64/libc.so.6
#1  0x7780aea5 in abort () from /lib64/libc.so.6
#2  0x7780ad79 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#3  0x778304c9 in __assert_perror_fail () from /lib64/libc.so.6
#4  0x77fda554 in elf64_getshdr (scn=0x40fc20) at
../../libelf/elf32_getshdr.c:292
#5  0x77fe9590 in elf64_getchdr (scn=0x40fc20) at
../../libelf/elf32_getchdr.c:45
#6  0x77fdf690 in __elf64_updatenull_wrlock (elf=0x40d880,
change_bop=0x7fffac60, shnum=30) at ../../libelf/elf32_updatenull.c:407
#7  0x77fdd83d in elf_update (elf=0x40d880, cmd=ELF_C_WRITE) at
../../libelf/elf_update.c:211
#8  0x00405d9e in process_file (fname=0x7fffbb96
"testfile-s390x-hash-both") at ../../src/elfcompress.c:1336
#9  0x00406232 in main (argc=7, argv=0x7fffb768) at
../../src/elfcompress.c:1458


[PATCH] Fix thread-safety for elfutils

2023-08-08 Thread Heather McIntyre via Elfutils-devel
Hello all,

This patch was created to address thread-safety issues reported in bug 26921
<https://sourceware.org/bugzilla/show_bug.cgi?id=26921> and bug 26930
<https://sourceware.org/bugzilla/show_bug.cgi?id=26930>.
Additionally, other thread-safety fixes were applied during the process.

Brief Description:
Locking was implemented for tsearch and tfind.
Changes to multiple files within the libelf library were made such that
USE_LOCKS no longer causes tests to fail when enabled.
New tests were added to confirm thread-safety.

Please review the attached patch file and feel free to provide feedback. I
am available for any clarifications or modifications needed.

Best regards,
Heather McIntyre


0001-Fix-thread-safety-for-elfutils.patch
Description: Binary data