Re: patch 0/2 debuginfod submission
Hi Frank, Hi Aaron, On Mon, 2019-10-28 at 15:04 -0400, Frank Ch. Eigler wrote: > Aaron Merey and I would like to propose debuginfod for merge into > elfutils mainline, after a couple of months of work. As a reminder, > this is an http server (written in C++11) for debuginfo-related > artifacts (elf/dwarf/source files), plus a C client library & command > line tool. It comes with man pages and tests. > > I'll followup with two git format-patch emails, one for the client, > and one for the server+tests+etc. The identical code is also in the > git://sourceware.org/git/elfutils.git debuginfod-submit branch. Thanks, very interesting. For those that don't know, there was also a GNU Tools Cauldron presentation about this work: https://gcc.gnu.org/wiki/cauldron2019#cauldron2019talks.The_elfutils_debuginfo_server I only browsed through the code quickly, but I like what I see. For now just a comment on the libdwfl integration. It is guarded by ENABLE_DEBUGINFOD, which is off by default. I think the support should always be enabled in libdwfl whether or not the debuginfod server is build or not, or that it should be guarded by something like ENABLE_DEBUGINFOD_CLIENT_SUPPORT (which would default to on by default). It already "works" whether or not the debuginfod library and/or server are there. So lets default to enable support in libdwfl, so people can easily build elfutils/libdw separately from elfutils/debuginfod and combine the result into something that just works. Also I think you should cache a negative result for fp_debuginfod_find_debuginfo (say assign it (void *) -1) so you don't keep trying to find the library/symbol each and every time. Having parallel code on my mind I am worried now how this works when called concurrently from two threads. There is a lot of code in libdwfl that isn't concurrent-safe at the moment. But if possible lets not introduce more. Not a big concern, but nice if you could give it a thought. Similarly, our error reporting is already pretty poor, so you aren't making things worse. But have you thought about a way for the libdwfl user to provide some way to indicate why something couldn't be resolved/found? Again, not really a big concern since the current code already has very limited/poor error reporting, but maybe you have thoughts about it? Have you thought about just calling debuginfod-find from the libdwfl code? Or is execing from a library really just a no-no? Thanks, Mark
Re: [PATCH 3/5] libdwfl: add interface for attaching to/detaching from threads
Hi Omar, On Mon, 2019-10-07 at 02:05 -0700, Omar Sandoval wrote: > libdwfl has implementations of attaching to/detaching from threads > and > unwinding stack traces. However, that functionality is only available > through the dwfl_thread_getframes interface, which isn't very flexible. > This adds two new functions, dwfl_attach_thread and dwfl_detach_thread, > which separate the thread stopping functionality out of > dwfl_thread_getframes. Additionally, it makes dwfl_thread_getframes > cache the stack trace for threads stopped this way. This makes it > possible to use the frames after dwfl_thread_getframes returns. The advantage of the dwfl_thread_getframes interface is that you cannot forget to "detach", so no Dwfl_Frames get dangling and (if the process is life) you don't disrupt it more than necessary. This new interface seems very simple to get wrong causing leaks and locking up processes/threads. Also, if we would adopt dwfl_attach_thread then I think it should take a Dwfl_Thread object not a pid/tid as argument. Could you provide some examples where this new interface/style is beneficial? Thanks, Mark
Re: [PATCH 4/5] libdwfl: cache Dwfl_Module and Dwarf_Frame for Dwfl_Frame
Hi Omar, On Mon, 2019-10-07 at 02:05 -0700, Omar Sandoval wrote: > The next change will need to have the Dwarf_Frame readily available, so > rather than finding it again every time, let's cache it for reuse. The > CFI frame can also be useful to clients of libdwfl, so add > dwfl_frame_dwarf_frame to get it. Similarly, the Dwfl_Module is also > frequently needed in conjunction with the frame, so cache it and add > dwfl_frame_module. I can see how this is useful. But it seems the Dwarf_Frame is only cached when __libdwfl_frame_unwind is called. Which I believe isn't done for the initial frame. Also it isn't clear how to propagate any errors when NULL is returned. Maybe dwfl_frame_dwarf_frame () should check first to see if frame is NULL and then call lookup the CFI and call dwarf_cfi_addrframe if not? Thanks, Mark
Re: [PATCH 5/5] libdwfl: add interface for evaluating DWARF expressions in a frame
Hi Omar, On Mon, 2019-10-07 at 02:05 -0700, Omar Sandoval wrote: > libdwfl can evaluate DWARF expressions in order to unwind the stack, > but this functionality isn't exposed to clients of the library. Now that > the pieces are in place, add dwfl_frame_eval_expr to provide this feature. I think this is useful. But same issue as the previous patch that I am not sure the error handling is correct for state->frame == NULL. Also this could really use some examples and maybe a small testcase. That would show how to handle DWARF expressions to are simple location descriptions (which calculate a location where a value can be found) versus implicit location descriptions (e.g. DW_OP_value) versus composite location descriptions (e.g. DW_OP_piece). It might be that we don't care, because all we care about is whether or not we can get a value, but it would validate the interface. Having some examples/testcases would also show how/where to get the DWARF expressions to use with this new function. Thanks, Mark
Re: patch 0/2 debuginfod submission
Hi, Mark - > I only browsed through the code quickly, but I like what I see. > For now just a comment on the libdwfl integration. Righto. > It is guarded by ENABLE_DEBUGINFOD, which is off by default. > I think the support should always be enabled in libdwfl whether or not > the debuginfod server is build or not, or that it should be guarded by > something like ENABLE_DEBUGINFOD_CLIENT_SUPPORT (which would default to > on by default). [...] OK, sure, IMO don't even bother with a guard. It's just a dlopen/dlsym, which is portable. Will update the first patch on the branch with that. > Also I think you should cache a negative result for > fp_debuginfod_find_debuginfo (say assign it (void *) -1) so you don't > keep trying to find the library/symbol each and every time. Will look into that later on. With the debuginfod server, a negative hit comes back instantly (no file searching, just an indexed database lookup), so this may not be a big deal. > Having parallel code on my mind I am worried now how this works when > called concurrently from two threads. There is a lot of code in libdwfl > that isn't concurrent-safe at the moment. But if possible lets not > introduce more. Not a big concern, but nice if you could give it a > thought. The new client-side code doesn't call into elfutils though, so does not amplify hypothetical problems there. In fact I'm not sure it's multithreaded at all, or just nonblocking I/O. The server is multithreaded, but that's relatively unavoidable, and turns out to work without any valgrind complaints (last time I tried). > Similarly, our error reporting is already pretty poor, so you aren't > making things worse. But have you thought about a way for the libdwfl > user to provide some way to indicate why something couldn't be > resolved/found? Again, not really a big concern since the current code > already has very limited/poor error reporting, but maybe you have > thoughts about it? We document returning standard errnos generally, and a few particular ones for network unreachability. > Have you thought about just calling debuginfod-find from the libdwfl > code? Or is execing from a library really just a no-no? It's not great as a practice. (We do that from debuginfod to popen rpm2cpio but reluctantly, and carefully with signals.) Another reason for calling directly is avoidance of a race condition, wherein two debuginfod-client processes run at the same time, and one cleans the cache right under the foot of the other. From the C API, you're guaranteed to get a usable fd, even if the underlying file was unlinked while you were looking. An fd is all elfutils needs (which I love). - FChE
Re: patch 0/2 debuginfod submission
Hi Frank, On Wed, 2019-10-30 at 09:40 -0400, Frank Ch. Eigler wrote: > OK, sure, IMO don't even bother with a guard. It's just a dlopen/dlsym, > which is portable. Will update the first patch on the branch with that. Thanks. > > Also I think you should cache a negative result for > > fp_debuginfod_find_debuginfo (say assign it (void *) -1) so you > > don't > > keep trying to find the library/symbol each and every time. > > Will look into that later on. With the debuginfod server, a negative > hit comes back instantly (no file searching, just an indexed database > lookup), so this may not be a big deal. Ah, I was not thinking that far yet. I was just worried about the dlopen/dlsym dance being done on every call. Which does its own file search to find the library. So simply setting debuginfod_so = (void *) -1; on first failure to make sure dlopen/dlsym it is never called again. > > Having parallel code on my mind I am worried now how this works when > > called concurrently from two threads. There is a lot of code in libdwfl > > that isn't concurrent-safe at the moment. But if possible lets not > > introduce more. Not a big concern, but nice if you could give it a > > thought. > > The new client-side code doesn't call into elfutils though, so does > not amplify hypothetical problems there. In fact I'm not sure it's > multithreaded at all, or just nonblocking I/O. > > The server is multithreaded, but that's relatively unavoidable, and > turns out to work without any valgrind complaints (last time I tried). Again, I wasn't even thinking that far. I was just concerned about the library/function call setup being not concurrent-safe code. So maybe simply move it to dwfl_begin? > > Similarly, our error reporting is already pretty poor, so you aren't > > making things worse. But have you thought about a way for the libdwfl > > user to provide some way to indicate why something couldn't be > > resolved/found? Again, not really a big concern since the current code > > already has very limited/poor error reporting, but maybe you have > > thoughts about it? > > We document returning standard errnos generally, and a few particular > ones for network unreachability. But those aren't propagated to users for libdwfl find_elf or find_debuginfo. I was really just wondering if you thought about that in the scope of the libdwfl calls. Currently there is no real good way to do any error reporting there. So it isn't a big concern. But if you have any ideas for improvement that would be nice. > > Have you thought about just calling debuginfod-find from the libdwfl > > code? Or is execing from a library really just a no-no? > > It's not great as a practice. (We do that from debuginfod to popen > rpm2cpio but reluctantly, and carefully with signals.) Another reason > for calling directly is avoidance of a race condition, wherein two > debuginfod-client processes run at the same time, and one cleans the > cache right under the foot of the other. From the C API, you're > guaranteed to get a usable fd, even if the underlying file was > unlinked while you were looking. An fd is all elfutils needs (which I > love). All good points. Ignore that I even mentioned calling exec directly please :) Thanks, Mark
Re: patch 0/2 debuginfod submission
Hi - > Ah, I was not thinking that far yet. I was just worried about the > dlopen/dlsym dance being done on every call. Which does its own file > search to find the library. So simply setting debuginfod_so = (void *) > -1; on first failure to make sure dlopen/dlsym it is never called > again. Revised code to look like this (libdwfl/find-debuginfo.c): 00404 /* NB: this is slightly thread-unsafe */ 00405 static __typeof__ (debuginfod_find_debuginfo) *fp_debuginfod_find_debuginfo; 00406 00407 if (fp_debuginfod_find_debuginfo == NULL) 00408 { 00409 void *debuginfod_so = dlopen("libdebuginfod-" VERSION ".so", RTLD_LAZY); 00410 if (debuginfod_so == NULL) 00411 debuginfod_so = dlopen("libdebuginfod.so", RTLD_LAZY); 00412 if (debuginfod_so != NULL) 00413 fp_debuginfod_find_debuginfo = dlsym (debuginfod_so, "debuginfod_find_debuginfo"); 00414 if (fp_debuginfod_find_debuginfo == NULL) 00415 fp_debuginfod_find_debuginfo = (void *) -1; /* never try again */ 00416 } 00417 00418 if (fp_debuginfod_find_debuginfo != NULL && fp_debuginfod_find_debuginfo != (void *) -1) 00419 { 00420 /* If all else fails and a build-id is available, query the 00421debuginfo-server if enabled. */ 00422 if (fd < 0 && bits_len > 0) 00423 fd = (*fp_debuginfod_find_debuginfo) (bits, bits_len, NULL); 00424 } > > We document returning standard errnos generally, and a few particular > > ones for network unreachability. > > But those aren't propagated to users for libdwfl find_elf or > find_debuginfo. I was really just wondering if you thought about that > in the scope of the libdwfl calls. Currently there is no real good way > to do any error reporting there. So it isn't a big concern. But if you > have any ideas for improvement that would be nice. Will think about it. - FChE
Re: [PATCH 3/5] libdwfl: add interface for attaching to/detaching from threads
On Wed, Oct 30, 2019 at 01:47:52PM +0100, Mark Wielaard wrote: > Hi Omar, > > On Mon, 2019-10-07 at 02:05 -0700, Omar Sandoval wrote: > > libdwfl has implementations of attaching to/detaching from threads > > and > > unwinding stack traces. However, that functionality is only available > > through the dwfl_thread_getframes interface, which isn't very flexible. > > This adds two new functions, dwfl_attach_thread and dwfl_detach_thread, > > which separate the thread stopping functionality out of > > dwfl_thread_getframes. Additionally, it makes dwfl_thread_getframes > > cache the stack trace for threads stopped this way. This makes it > > possible to use the frames after dwfl_thread_getframes returns. > > The advantage of the dwfl_thread_getframes interface is that you cannot > forget to "detach", so no Dwfl_Frames get dangling and (if the process > is life) you don't disrupt it more than necessary. This new interface > seems very simple to get wrong causing leaks and locking up > processes/threads. That is true, although I imagine that use cases which don't need the additional flexibility would continue to use the simpler API. > Also, if we would adopt dwfl_attach_thread then I think it should take > a Dwfl_Thread object not a pid/tid as argument. In that case, we'd probably want to expose the internal getthread function with something like: /* Find the thread with the given thread ID. Returns zero if the thread was processed, -1 on error (including when no thread with the given thread ID exists), or the return value of the callback when not DWARF_CB_OK. */ int dwfl_getthread (Dwfl *dwfl, pid_t tid, int (*callback) (Dwfl_Thread *thread, void *arg), void *arg) __nonnull_attribute__ (1, 3); Then dwfl_attach_thread could be used with either dwfl_getthread or dwfl_getthreads, which is nice. However, a crucial part of this feature is being able to access the Dwfl_Thread outside of the callback. Since the Dwfl_Thread is currently on the stack, I see a couple of options: 1. We keep Dwfl_Thread on the stack and make dwfl_attach_thread() return a copy (like it does in this patch). 2. We always allocate the Dwfl_Thread on the heap and free it before returning from dwfl_getthread(s) _unless_ dwfl_attach_thread() was called. If it was, it will be freed by dwfl_detach_thread() instead. Option 2 sounds better to me. What do you think? > Could you provide some examples where this new interface/style is > beneficial? dwfl_attach_thread could be used to implement something like `gdb -p $pid`: attach to a thread, stop it, and poke at it further. dwfl_detach_thread would be kind of like GDB's `continue` command. I applied these patches to the version of elfutils that I bundle with drgn; you can see how I use the interface here [1]. Basically, drgn has an API to get the stack trace for a thread [2] and get the value of registers (and in the future, local variables) at each stack frame [3]. When I get the stack trace, I use dwfl_attach_thread and dwfl_thread_getframes. The user can then access the frames to their heart's content. When they're done with it, I free everything with dwfl_detach_thread. (The attach/detach is currently tied to the lifetime of the stack trace because I don't yet support actually pausing threads, but I plan to support that, hopefully using the same libdwfl attach/detach API.) Thanks, and sorry for the wall of text :) 1: https://github.com/osandov/drgn/blob/master/libdrgn/stack_trace.c 2: https://drgn.readthedocs.io/en/latest/api_reference.html#drgn.Program.stack_trace 3: https://drgn.readthedocs.io/en/latest/api_reference.html#stack-traces
Re: [PATCH 4/5] libdwfl: cache Dwfl_Module and Dwarf_Frame for Dwfl_Frame
On Wed, Oct 30, 2019 at 02:04:42PM +0100, Mark Wielaard wrote: > Hi Omar, > > On Mon, 2019-10-07 at 02:05 -0700, Omar Sandoval wrote: > > The next change will need to have the Dwarf_Frame readily available, so > > rather than finding it again every time, let's cache it for reuse. The > > CFI frame can also be useful to clients of libdwfl, so add > > dwfl_frame_dwarf_frame to get it. Similarly, the Dwfl_Module is also > > frequently needed in conjunction with the frame, so cache it and add > > dwfl_frame_module. > > I can see how this is useful. But it seems the Dwarf_Frame is only > cached when __libdwfl_frame_unwind is called. Which I believe isn't > done for the initial frame. Also it isn't clear how to propagate any > errors when NULL is returned. Maybe dwfl_frame_dwarf_frame () should > check first to see if frame is NULL and then call lookup the CFI and > call dwarf_cfi_addrframe if not? Yes, that makes sense. Rather than doing the lookups in __libdwfl_frame_unwind and handle_cfi, I can move the lookups to dwfl_frame_module and dwfl_frame_dwarf_frame, have those functions cache it internally, and make __libdwfl_frame_unwind and handle_cfi call those functions. Thanks, Omar
Re: [PATCH 5/5] libdwfl: add interface for evaluating DWARF expressions in a frame
On Wed, Oct 30, 2019 at 02:23:06PM +0100, Mark Wielaard wrote: > Hi Omar, > > On Mon, 2019-10-07 at 02:05 -0700, Omar Sandoval wrote: > > libdwfl can evaluate DWARF expressions in order to unwind the stack, > > but this functionality isn't exposed to clients of the library. Now that > > the pieces are in place, add dwfl_frame_eval_expr to provide this feature. > > I think this is useful. But same issue as the previous patch that I am > not sure the error handling is correct for state->frame == NULL. The change I described for the previous patch should fix that. > Also this could really use some examples and maybe a small testcase. > > That would show how to handle DWARF expressions to are simple location > descriptions (which calculate a location where a value can be found) > versus implicit location descriptions (e.g. DW_OP_value) versus > composite location descriptions (e.g. DW_OP_piece). > > It might be that we don't care, because all we care about is whether or > not we can get a value, but it would validate the interface. > > Having some examples/testcases would also show how/where to get the > DWARF expressions to use with this new function. Sounds good, I'll put some examples/test cases together. FWIW, I'm using it in drgn to get register values like so: LIBDRGN_PUBLIC struct drgn_error * drgn_stack_frame_register(struct drgn_stack_frame frame, enum drgn_register_number regno, uint64_t *ret) { const Dwarf_Op op = { .atom = DW_OP_regx, .number = regno, }; Dwarf_Addr value; if (!dwfl_frame_eval_expr(frame.trace->frames[frame.i], &op, 1, &value)) return drgn_error_libdwfl(); *ret = value; return NULL; } Later, I plan to use this for location expressions for local variables that I get out of DWARF. Thanks, Omar
[PATCH] libdw: Don't free uninitialized Dwarf_Abbrev_Hash's of "fake" CUs.
fake_{loc,loclists,addr}_cu are Dwarf_CUs that are created separate from all the others, so their contents are minimal and mostly initialized by a calloc. On dwarf_end however, they are freed through the same code path as all the others, so they call DAH_free like all the others. This changes that so that these three are exempt from DAH and split-DWARF matters, and swaps the calloc for a malloc so Memcheck will catch any others. --- Sorry to keep adding patches, but ran across this while tinkering about. Its not a current issue, but when the concurrent hash table is integrated this could cause some issues (on some system somewhere, maybe). In that case there are enough moving internals that a calloc may not initialize it properly. Its also an error waiting to happen, so this is this just cleans it up with some semblance of logical correctness. (In other news, I finally figured out why git send-email wasn't working for me! Thanks for everyone's patience, especially Wielaard's :) ) libdw/dwarf_begin_elf.c | 15 --- libdw/dwarf_end.c | 22 +- 2 files changed, 25 insertions(+), 12 deletions(-) diff --git a/libdw/dwarf_begin_elf.c b/libdw/dwarf_begin_elf.c index 8d137414..1b2c5a45 100644 --- a/libdw/dwarf_begin_elf.c +++ b/libdw/dwarf_begin_elf.c @@ -223,7 +223,7 @@ valid_p (Dwarf *result) inside the .debug_loc or .debug_loclists section. */ if (result != NULL && result->sectiondata[IDX_debug_loc] != NULL) { - result->fake_loc_cu = (Dwarf_CU *) calloc (1, sizeof (Dwarf_CU)); + result->fake_loc_cu = (Dwarf_CU *) malloc (sizeof (Dwarf_CU)); if (unlikely (result->fake_loc_cu == NULL)) { Dwarf_Sig8_Hash_free (&result->sig8_hash); @@ -240,12 +240,15 @@ valid_p (Dwarf *result) result->fake_loc_cu->endp = (result->sectiondata[IDX_debug_loc]->d_buf + result->sectiondata[IDX_debug_loc]->d_size); + result->fake_loc_cu->locs = NULL; + result->fake_loc_cu->address_size = 0; + result->fake_loc_cu->version = 0; } } if (result != NULL && result->sectiondata[IDX_debug_loclists] != NULL) { - result->fake_loclists_cu = (Dwarf_CU *) calloc (1, sizeof (Dwarf_CU)); + result->fake_loclists_cu = (Dwarf_CU *) malloc (sizeof (Dwarf_CU)); if (unlikely (result->fake_loclists_cu == NULL)) { Dwarf_Sig8_Hash_free (&result->sig8_hash); @@ -263,6 +266,9 @@ valid_p (Dwarf *result) result->fake_loclists_cu->endp = (result->sectiondata[IDX_debug_loclists]->d_buf + result->sectiondata[IDX_debug_loclists]->d_size); + result->fake_loclists_cu->locs = NULL; + result->fake_loclists_cu->address_size = 0; + result->fake_loclists_cu->version = 0; } } @@ -272,7 +278,7 @@ valid_p (Dwarf *result) inside the .debug_addr section, if it exists. */ if (result != NULL && result->sectiondata[IDX_debug_addr] != NULL) { - result->fake_addr_cu = (Dwarf_CU *) calloc (1, sizeof (Dwarf_CU)); + result->fake_addr_cu = (Dwarf_CU *) malloc (sizeof (Dwarf_CU)); if (unlikely (result->fake_addr_cu == NULL)) { Dwarf_Sig8_Hash_free (&result->sig8_hash); @@ -291,6 +297,9 @@ valid_p (Dwarf *result) result->fake_addr_cu->endp = (result->sectiondata[IDX_debug_addr]->d_buf + result->sectiondata[IDX_debug_addr]->d_size); + result->fake_addr_cu->locs = NULL; + result->fake_addr_cu->address_size = 0; + result->fake_addr_cu->version = 0; } } diff --git a/libdw/dwarf_end.c b/libdw/dwarf_end.c index a2e94436..a32a149e 100644 --- a/libdw/dwarf_end.c +++ b/libdw/dwarf_end.c @@ -52,18 +52,22 @@ cu_free (void *arg) { struct Dwarf_CU *p = (struct Dwarf_CU *) arg; - Dwarf_Abbrev_Hash_free (&p->abbrev_hash); - tdestroy (p->locs, noop_free); - /* Free split dwarf one way (from skeleton to split). */ - if (p->unit_type == DW_UT_skeleton - && p->split != NULL && p->split != (void *)-1) + if(p != p->dbg->fake_loc_cu && p != p->dbg->fake_loclists_cu + && p != p->dbg->fake_addr_cu) { - /* The fake_addr_cu might be shared, only release one. */ - if (p->dbg->fake_addr_cu == p->split->dbg->fake_addr_cu) - p->split->dbg->fake_addr_cu = NULL; - INTUSE(dwarf_end) (p->split->dbg); + Dwarf_Abbrev_Hash_free (&p->abbrev_hash); + + /* Free split dwarf one way (from skeleton to split). */ + if (p->unit_type == DW_UT_skeleton + && p->split != NULL && p->split != (void *)-1) + { + /* The fake_addr_cu might be shared, only release one. */ + if (p->dbg->fake_addr_cu == p->split->dbg->fake_addr_cu) + p->split->dbg->fake_addr_cu = NULL; + INTUSE(dwarf_end) (p->split->dbg); + } } } -- 2.24.0.rc1