[Bug libelf/25069] AddressSanitizer: heap-buffer-overflow at libdwelf/dwelf_strtab.c:284

2019-10-26 Thread leftcopy.chx at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=25069

--- Comment #2 from leftcopy.chx at gmail dot com ---
(In reply to Mark Wielaard from comment #1)
> I am unable to replicate this. Are you able to replicate with current git
> trunk (with the recent fixes for eu-unstrip)?

I cannot replicate with git trunk either.


[Bug libelf/25069] AddressSanitizer: heap-buffer-overflow at libdwelf/dwelf_strtab.c:284

2019-10-26 Thread leftcopy.chx at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=25069

--- Comment #3 from leftcopy.chx at gmail dot com ---
Created attachment 12053
  --> https://sourceware.org/bugzilla/attachment.cgi?id=12053&action=edit
poc that triggers the crash against git 99dc63b1

Found another poc that triggers this crash. See attachment.


Re: [PATCH] libdw: add thread-safety to dwarf_getabbrev()

2019-10-26 Thread Florian Weimer
* Jonathon Anderson:

> Just finished some modifications to the patch series, git request-pull 
> output below. This rebases onto the latest master and does a little 
> diff cleaning, the major change is that I swapped out the memory 
> management to use the pthread_key_* alternative mentioned before.

This use of pthread_key_* is rather unusual.  In particular, there are
only 1024 keys supported for the entire process, so it limits the
number of Dwarf * objects that can be created by one process, even if
it does not use any threads.
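
To make the limit concrete, here is a minimal standalone sketch (not
elfutils code) that creates keys until pthread_key_create fails; on glibc
it typically stops at PTHREAD_KEYS_MAX (1024) and returns EAGAIN:

#include <errno.h>
#include <pthread.h>
#include <stdio.h>

int
main (void)
{
  /* Each key here stands in for one Dwarf object in the proposed scheme.  */
  unsigned count = 0;
  pthread_key_t key;
  int err;
  while ((err = pthread_key_create (&key, NULL)) == 0)
    count++;
  printf ("created %u keys before failure (err=%d, EAGAIN=%d)\n",
          count, err, EAGAIN);
  return 0;
}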


Re: [PATCH] libdw: add thread-safety to dwarf_getabbrev()

2019-10-26 Thread Mark Wielaard
On Sat, 2019-10-26 at 12:54 +0200, Florian Weimer wrote:
> * Jonathon Anderson:
> 
> > Just finished some modifications to the patch series, git request-pull 
> > output below. This rebases onto the latest master and does a little 
> > diff cleaning, the major change is that I swapped out the memory 
> > management to use the pthread_key_* alternative mentioned before.
> 
> This use of pthread_key_* is rather unusual.  In particular, there are
> only 1024 keys supported for the entire process, so it limits the
> number of Dwarf * objects that can be created by one process, even if
> it does not use any threads.

O, I didn't know that there was such a low limit on pthread_keys. That
is indeed a bit of a problem, especially with split-dwarf where
basically every CU could be its own Dwarf object (from a .dwo file).
That would easily exceed such a low limit.

I'll see if I can create a case where that is a problem. Then we can
see how to adjust things to use less pthread_keys. Is there a different
pattern we can use?

Thanks,

Mark


Re: Buildbot failure in Wildebeest Builder on whole buildset

2019-10-26 Thread Mark Wielaard
On Sat, 2019-10-26 at 02:47 +, build...@builder.wildebeest.org
wrote:
> The Buildbot has detected a failed build on builder whole buildset
> while building elfutils.
> Full details are available at:
> https://builder.wildebeest.org/buildbot/#builders/10/builds/376
> 

It looks like the s390x buildbot worker was just overloaded and timed
out during a compile job. The next build did succeed and everything
looks green.

Cheers,

Mark


Re: [PATCH] libdw: add thread-safety to dwarf_getabbrev()

2019-10-26 Thread Florian Weimer
* Mark Wielaard:

> I'll see if I can create a case where that is a problem. Then we can
> see how to adjust things to use less pthread_keys. Is there a different
> pattern we can use?

It's unclear what purpose thread-local storage serves in this context.
You already have a Dwarf *.  I would consider adding some sort of
clone function which creates a shallow Dwarf * with its own embedded
allocator or something like that.  This assumes that memory allocation
is actually a performance problem, otherwise you could let malloc
handle the details.


Re: [PATCH] libdw: add thread-safety to dwarf_getabbrev()

2019-10-26 Thread Jonathon Anderson

Hello Florian Weimer,

I'm the original author of this patch, so I'll try to answer what I can.

For some overall perspective, this patch replaces the original libdw 
allocator with a thread-safe variant. The original acts both as a 
suballocator (to keep from paying the malloc tax on frequent small 
allocations) and a garbage collection list (to free internal structures 
on dwarf_end). The patch attempts to replicate the same overall 
behavior in the more volatile parallel case.


On Sat, Oct 26, 2019 at 18:14, Florian Weimer wrote:
> * Mark Wielaard:
> 
> > I'll see if I can create a case where that is a problem. Then we can
> > see how to adjust things to use less pthread_keys. Is there a
> > different pattern we can use?
> 
> It's unclear what purpose thread-local storage serves in this context.

The thread-local storage provides the suballocator side: for each 
Dwarf, each thread has its own "top block" to perform allocations from. 
To make this simple, each Dwarf has a key to give threads local storage 
specific to that Dwarf. Or at least that was the intent, I didn't think 
to consider the limit, we didn't run into it in our use cases.
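
Roughly, the pattern is the following (a simplified sketch with made-up
names, not the actual patch; the key would be created once per Dwarf in
dwarf_begin):

#include <pthread.h>
#include <stdlib.h>

#define BLOCK_SIZE 4096   /* arbitrary block size for the sketch */

struct mem_block
{
  struct mem_block *prev;   /* this thread's earlier block */
  size_t size;
  size_t used;
  char mem[];
};

struct sketch_dwarf
{
  pthread_key_t mem_key;    /* one pthread key per Dwarf object */
  /* ... the rest of the Dwarf state ... */
};

/* Hand out n bytes (n <= BLOCK_SIZE) from the calling thread's top block
   for this Dwarf, starting a new block when the current one is full.  */
static void *
sketch_alloc (struct sketch_dwarf *dbg, size_t n)
{
  struct mem_block *top = pthread_getspecific (dbg->mem_key);
  if (top == NULL || top->size - top->used < n)
    {
      struct mem_block *blk = malloc (sizeof *blk + BLOCK_SIZE);
      if (blk == NULL)
        return NULL;
      blk->prev = top;      /* in the real patch blocks are also linked into
                               a shared list so dwarf_end can free them all */
      blk->size = BLOCK_SIZE;
      blk->used = 0;
      pthread_setspecific (dbg->mem_key, blk);
      top = blk;
    }
  void *res = top->mem + top->used;
  top->used += n;
  return res;
}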


There may be other ways to handle this, I'm high-level considering 
potential alternatives (with more atomics, of course). The difficulty 
is mostly in providing the same performance in the single-threaded case.



> You already have a Dwarf *.  I would consider adding some sort of
> clone function which creates a shallow Dwarf * with its own embedded
> allocator or something like that.


The downside with this is that it's an API addition, which we (the 
Dyninst + HPCToolkit projects) would need to enforce. Which isn't a 
huge deal for us, but I will need to make a case to those teams to make 
the shift.


On the upside, it does provide a very understandable semantic in the 
case of parallelism. For an API without synchronization clauses, this 
would put our work back into the realm of "reasonably correct" (from 
"technically incorrect but works.")



> This assumes that memory allocation
> is actually a performance problem, otherwise you could let malloc
> handle the details.


In our (Dyninst + HPCToolkit) work, we have found that malloc tends to 
be slow in the multithreaded case, in particular with many small 
allocations. The glibc implementation (which most of our clients use) 
uses a full mutex to provide thread-safety. While we could do a lot 
better in our own projects with regards to memory management, the fact 
remains that malloc alone is a notable facet to the performance of 
libdw.


Hopefully this helps shed a little light on the issue.

-Jonathon


Re: [PATCH] libdw: add thread-safety to dwarf_getabbrev()

2019-10-26 Thread Florian Weimer
* Jonathon Anderson:

>> This assumes that memory allocation
>> is actually a performance problem, otherwise you could let malloc
>> handle the details.
>
> In our (Dyninst + HPCToolkit) work, we have found that malloc tends to 
> be slow in the multithreaded case, in particular with many small 
> allocations. The glibc implementation (which most of our clients use) 
> uses a full mutex to provide thread-safety. While we could do a lot 
> better in our own projects with regards to memory management, the fact 
> remains that malloc alone is a notable facet to the performance of 
> libdw.

Current glibc versions have a thread-local fast path, which should
address some of these concerns.  It's still not a bump-pointer
allocator, but at least there are no atomics on that path.

I'm not sure if it is prudent to code around imperfections in older
allocators.


[Bug libelf/25069] AddressSanitizer: heap-buffer-overflow at libdwelf/dwelf_strtab.c:284

2019-10-26 Thread mark at klomp dot org
https://sourceware.org/bugzilla/show_bug.cgi?id=25069

Mark Wielaard  changed:

             What|Removed     |Added

           Status|UNCONFIRMED |NEW
 Last reconfirmed|            |2019-10-26
   Ever confirmed|0           |1

--- Comment #4 from Mark Wielaard  ---
Thanks, replicated with that reproducer (and the stripped file from the first)
with valgrind:

mark@librem:~/build/elfutils-obj$ LD_LIBRARY_PATH=backends:libelf:libdw:libasm
valgrind -q src/unstrip poc2/hbo__dwelf_strtab.c\:284_2 poc2/stripped -o
/dev/null
==22429== Invalid read of size 1
==22429==at 0x4838C74: strlen (vg_replace_strmem.c:460)
==22429==by 0x49B3EE3: dwelf_strtab_add (dwelf_strtab.c:284)
==22429==by 0x11A8BE: copy_elided_sections (unstrip.c:1882)
==22429==by 0x11CDA2: handle_file (unstrip.c:2203)
==22429==by 0x11D10D: handle_explicit_files (unstrip.c:2268)
==22429==by 0x1121CF: main (unstrip.c:2603)
==22429==  Address 0x5d8c3b5 is 0 bytes after a block of size 3,829 alloc'd
==22429==at 0x483577F: malloc (vg_replace_malloc.c:299)
==22429==by 0x4878E2D: __libelf_set_rawdata_wrlock (elf_getdata.c:332)
==22429==by 0x4879B5C: __elf_getdata_rdlock (elf_getdata.c:535)
==22429==by 0x4879B5C: __elf_getdata_rdlock (elf_getdata.c:458)
==22429==by 0x4879C73: elf_getdata (elf_getdata.c:562)
==22429==by 0x112B29: collect_symbols (unstrip.c:843)
==22429==by 0x119B7D: copy_elided_sections (unstrip.c:1820)
==22429==by 0x11CDA2: handle_file (unstrip.c:2203)
==22429==by 0x11D10D: handle_explicit_files (unstrip.c:2268)
==22429==by 0x1121CF: main (unstrip.c:2603)
==22429== 
==22429== Invalid read of size 1
==22429==at 0x483D4E0: mempcpy (vg_replace_strmem.c:1536)
==22429==by 0x49B3BD2: mempcpy (string_fortified.h:48)
==22429==by 0x49B3BD2: copystrings (dwelf_strtab.c:301)
==22429==by 0x49B400C: dwelf_strtab_finalize (dwelf_strtab.c:342)
==22429==by 0x11AC4C: copy_elided_sections (unstrip.c:1929)
==22429==by 0x11CDA2: handle_file (unstrip.c:2203)
==22429==by 0x11D10D: handle_explicit_files (unstrip.c:2268)
==22429==by 0x1121CF: main (unstrip.c:2603)
==22429==  Address 0x5d8c3b5 is 0 bytes after a block of size 3,829 alloc'd
==22429==at 0x483577F: malloc (vg_replace_malloc.c:299)
==22429==by 0x4878E2D: __libelf_set_rawdata_wrlock (elf_getdata.c:332)
==22429==by 0x4879B5C: __elf_getdata_rdlock (elf_getdata.c:535)
==22429==by 0x4879B5C: __elf_getdata_rdlock (elf_getdata.c:458)
==22429==by 0x4879C73: elf_getdata (elf_getdata.c:562)
==22429==by 0x112B29: collect_symbols (unstrip.c:843)
==22429==by 0x119B7D: copy_elided_sections (unstrip.c:1820)
==22429==by 0x11CDA2: handle_file (unstrip.c:2203)
==22429==by 0x11D10D: handle_explicit_files (unstrip.c:2268)
==22429==by 0x1121CF: main (unstrip.c:2603)
==22429== 
src/unstrip: cannot write output file: section `sh_size' too small for data


[Bug tools/25069] AddressSanitizer: heap-buffer-overflow at libdwelf/dwelf_strtab.c:284

2019-10-26 Thread mark at klomp dot org
https://sourceware.org/bugzilla/show_bug.cgi?id=25069

Mark Wielaard  changed:

             What|Removed                          |Added

           Status|NEW                              |ASSIGNED
        Component|libelf                           |tools
         Assignee|unassigned at sourceware dot org |mark at klomp dot org

--- Comment #5 from Mark Wielaard  ---
The problem is that the symbol table string data (.strtab) is corrupt. The last
string doesn't have a zero terminator. This can be fixed by checking that the
symbol name is a valid (zero-terminated) string:

diff --git a/src/unstrip.c b/src/unstrip.c
index f4314d5d..9b8c09a1 100644
--- a/src/unstrip.c
+++ b/src/unstrip.c
@@ -854,7 +854,9 @@ collect_symbols (Elf *outelf, bool rel, Elf_Scn *symscn, Elf_Scn *strscn,
       if (sym->st_shndx != SHN_XINDEX)
         shndx = sym->st_shndx;
 
-      if (sym->st_name >= strdata->d_size)
+      if (sym->st_name >= strdata->d_size
+          || memrchr (strdata->d_buf + sym->st_name, '\0',
+                      strdata->d_size - sym->st_name) == NULL)
         error (EXIT_FAILURE, 0,
                _("invalid string offset in symbol [%zu]"), i);


Re: [PATCH] libdw: add thread-safety to dwarf_getabbrev()

2019-10-26 Thread Mark Wielaard
Hi,

On Sat, 2019-10-26 at 11:45 -0500, Jonathon Anderson wrote:
> For some overall perspective, this patch replaces the original libdw 
> allocator with a thread-safe variant. The original acts both as a 
> suballocator (to keep from paying the malloc tax on frequent small 
> allocations) and a garbage collection list (to free internal structures 
> on dwarf_end). The patch attempts to replicate the same overall 
> behavior in the more volatile parallel case.

That is a nice description. Basically it is a little obstack
implementation. There are a lot of small allocations which we want to
store together and free together when the Dwarf object is destroyed. 

The allocations (and parsing of DWARF structures) are done lazily. So
you only pay when you are actually using the data. E.g. if you skip a
DIE (subtree) or CU, no parsing or allocations are done.

For example, when parsing all of the Linux kernel debug data we are
talking about ~535000 allocations; a bit less than half (~233000) are
of the same small size, 24 bytes.

> On Sat, Oct 26, 2019 at 18:14, Florian Weimer  wrote:
> > * Mark Wielaard:
> > 
> > >  I'll see if I can create a case where that is a problem. Then we can
> > >  see how to adjust things to use less pthread_keys. Is there a 
> > > different
> > >  pattern we can use?
> > 
> > It's unclear what purpose thread-local storage serves in this context.
> 
> The thread-local storage provides the suballocator side: for each 
> Dwarf, each thread has its own "top block" to perform allocations from. 
> To make this simple, each Dwarf has a key to give threads local storage 
> specific to that Dwarf. Or at least that was the intent, I didn't think 
> to consider the limit, we didn't run into it in our use cases.

I see that getconf PTHREAD_KEYS_MAX gives 1024 on my machine.
Is this tunable in any way?

> There may be other ways to handle this, I'm high-level considering 
> potential alternatives (with more atomics, of course). The difficulty 
> is mostly in providing the same performance in the single-threaded case.
> 
> > You already have a Dwarf *.  I would consider adding some sort of
> > clone function which creates a shallow Dwarf * with its own embedded
> > allocator or something like that.
> 
> The downside with this is that its an API addition, which we (the 
> Dyninst + HPCToolkit projects) would need to enforce. Which isn't a 
> huge deal for us, but I will need to make a case to those teams to make 
> the shift.
> 
> On the upside, it does provide a very understandable semantic in the 
> case of parallelism. For an API without synchronization clauses, this 
> would put our work back into the realm of "reasonably correct" (from 
> "technically incorrect but works.")

Could someone give an example of this pattern?
I don't fully understand what is being proposed and how the interface
would look exactly.

Thanks,

Mark


Re: [PATCH] libdw: add thread-safety to dwarf_getabbrev()

2019-10-26 Thread Mark Wielaard
Hi,

On Sat, 2019-10-26 at 18:50 +0200, Florian Weimer wrote:
> * Jonathon Anderson:
> 
> > > This assumes that memory allocation
> > > is actually a performance problem, otherwise you could let malloc
> > > handle the details.
> > 
> > In our (Dyninst + HPCToolkit) work, we have found that malloc tends to 
> > be slow in the multithreaded case, in particular with many small 
> > allocations. The glibc implementation (which most of our clients use) 
> > uses a full mutex to provide thread-safety. While we could do a lot 
> > better in our own projects with regards to memory management, the fact 
> > remains that malloc alone is a notable facet to the performance of 
> > libdw.
> 
> Current glibc versions have a thread-local fast path, which should
> address some of these concerns.  It's still not a bump-pointer
> allocator, but at least there are no atomics on that path.

Since which version of glibc is there a thread-local fast path?

We don't need a bump-pointer allocator, but we do need something to
store lots of small objects allocated over time together, so we can
easily dispose of them when done. The storage size only has to grow.
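
Something like this minimal sketch (illustrative only, not the actual
libdw allocator) captures that requirement: small objects are carved out
of blocks that are only ever added, and everything is released in one
sweep at the end:

#include <stdlib.h>

struct pool_block
{
  struct pool_block *next;
  size_t size;
  size_t used;
  char mem[];
};

/* Carve n bytes out of the newest block, adding a block when needed.  */
static void *
pool_alloc (struct pool_block **head, size_t n)
{
  struct pool_block *b = *head;
  if (b == NULL || b->size - b->used < n)
    {
      size_t sz = n > 4096 ? n : 4096;
      b = malloc (sizeof *b + sz);
      if (b == NULL)
        return NULL;
      b->size = sz;
      b->used = 0;
      b->next = *head;      /* blocks are only ever prepended */
      *head = b;
    }
  void *res = b->mem + b->used;
  b->used += n;
  return res;
}

/* The dwarf_end analogue: dispose of every small object at once.  */
static void
pool_free_all (struct pool_block **head)
{
  struct pool_block *b = *head;
  while (b != NULL)
    {
      struct pool_block *next = b->next;
      free (b);
      b = next;
    }
  *head = NULL;
}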

Thanks,

Mark


Re: [PATCH] libdw: add thread-safety to dwarf_getabbrev()

2019-10-26 Thread Jonathon Anderson

On Sun, Oct 27, 2019 at 00:50, Mark Wielaard wrote:
> Hi,
> 
> On Sat, 2019-10-26 at 11:45 -0500, Jonathon Anderson wrote:
> > For some overall perspective, this patch replaces the original libdw
> > allocator with a thread-safe variant. The original acts both as a
> > suballocator (to keep from paying the malloc tax on frequent small
> > allocations) and a garbage collection list (to free internal structures
> > on dwarf_end). The patch attempts to replicate the same overall
> > behavior in the more volatile parallel case.
> 
> That is a nice description. Basically it is a little obstack
> implementation. There are a lot of small allocations which we want to
> store together and free together when the Dwarf object is destroyed.
> 
> The allocations (and parsing of DWARF structures) are done lazily. So
> you only pay when you are actually using the data. E.g. if you skip a
> DIE (subtree) or CU, no parsing or allocations are done.
> 
> For example, when parsing all of the Linux kernel debug data we are
> talking about ~535000 allocations; a bit less than half (~233000) are
> of the same small size, 24 bytes.
> 
> > On Sat, Oct 26, 2019 at 18:14, Florian Weimer wrote:
> > > * Mark Wielaard:
> > > 
> > > > I'll see if I can create a case where that is a problem. Then we can
> > > > see how to adjust things to use less pthread_keys. Is there a
> > > > different pattern we can use?
> > > 
> > > It's unclear what purpose thread-local storage serves in this context.
> > 
> > The thread-local storage provides the suballocator side: for each
> > Dwarf, each thread has its own "top block" to perform allocations from.
> > To make this simple, each Dwarf has a key to give threads local storage
> > specific to that Dwarf. Or at least that was the intent, I didn't think
> > to consider the limit, we didn't run into it in our use cases.
> 
> I see that getconf PTHREAD_KEYS_MAX gives 1024 on my machine.
> Is this tunable in any way?

From what I can tell, no. A quick Google search indicates as much, and
it's even #defined as 1024 on my machine.

> > There may be other ways to handle this, I'm high-level considering
> > potential alternatives (with more atomics, of course). The difficulty
> > is mostly in providing the same performance in the single-threaded case.
> 
> > > You already have a Dwarf *.  I would consider adding some sort of
> > > clone function which creates a shallow Dwarf * with its own embedded
> > > allocator or something like that.
> 
> > The downside with this is that it's an API addition, which we (the
> > Dyninst + HPCToolkit projects) would need to enforce. Which isn't a
> > huge deal for us, but I will need to make a case to those teams to make
> > the shift.
> 
> > On the upside, it does provide a very understandable semantic in the
> > case of parallelism. For an API without synchronization clauses, this
> > would put our work back into the realm of "reasonably correct" (from
> > "technically incorrect but works.")
> 
> Could someone give an example of this pattern?
> I don't fully understand what is being proposed and how the interface
> would look exactly.


An application would do something along these lines:

Dwarf* dbg = dwarf_begin(...);
Dwarf* dbg2 = dwarf_clone(dbg, ...);
pthread_create(worker, ...);
// ...
dwarf_get_units(dbg, ...);
// ...
pthread_join(worker);
dwarf_end(dbg);

// worker:
// ...
dwarf_getabbrev(...);
// ...
dwarf_end(dbg2);

The idea being that dbg2 and dbg share most of the same internal state, 
but concurrent access to said state is between Dwarfs (or 
"Dwarf_Views", maybe?), and the state is cleaned up on the original's 
dwarf_end. I suppose in database terms the Dwarfs are acting like 
separate "cursors" for the internal DWARF data. For this particular 
instance, the "top of stack" pointers would be in dbg and dbg2 (the 
non-shared state), while the atomic mem_tail would be part of the 
internal (shared) state.
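
For illustration, the split might look roughly like this (hypothetical
types and field names, not a concrete proposal):

#include <stdatomic.h>
#include <stdbool.h>

struct mem_block;               /* allocation blocks as in the patch */

/* State shared by the original Dwarf and all of its clones/views.  */
struct shared_state
{
  /* ... ELF handle, parsed CUs, abbrev tables, ... */
  _Atomic (struct mem_block *) mem_tail;  /* every block ever allocated,
                                             freed once by the original's
                                             dwarf_end */
};

/* One handle ("cursor") per thread of use; cheap to create and destroy.  */
struct dwarf_view
{
  struct shared_state *shared;  /* common state, owned by the original */
  struct mem_block *mem_top;    /* this view's private top-of-stack block */
  bool is_clone;                /* a clone's dwarf_end only detaches */
};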


I'm not sure how viable implementing this sort of thing would be; it
might end up overhauling a lot of internals, and I'm not familiar
enough with all the components of the API to know whether there would
be some quirks with this style. But at least then the implicit blanket
clause "Dwarfs must be externally synchronized (all operations issued
in serial)" doesn't limit the parallelism at the API level. And those
of us who don't follow that rule wouldn't have to walk on eggshells to
avoid segfaulting.

> Thanks,
> 
> Mark