Re: Deadlock from --enable-thread-safety

2023-08-04 Thread John Mellor-Crummey via Elfutils-devel
Mark,

A third option that Heather and I discussed for the routines that needed 
replication (because sometimes they are called holding a write lock and 
sometimes not) was to put each routine in an include file where the write lock 
status is expected to be defined as a macro. Then, we can define the macro one 
way (write lock held), include the file, then define the macro the other way 
(write lock not held) and include the function again. The write lock status 
would be incorporated into the function name. 

This avoids two copies of the code to maintain and calls the correct helper 
routines that either 
- use the lock already possessed or 
- acquire their own lock if the caller doesn’t have it already.

Would that strategy be acceptable to you. None of the code does anything like 
that as far as I know. 

John Mellor-Crummey

(sent from my phone)

> On Aug 4, 2023, at 8:21 AM, Heather McIntyre via Elfutils-devel 
>  wrote:
> 
> I've been making --enable-thread-safety (USE_LOCKS) more viable by fixing
> deadlocks throughout the libelf library. This has required minimal code
> changes so far. However, I've hit a snag where "__elf64_updatenull_wrlock"
> calls "elf64_getchdr," which leads to "elf64_getshdr." This sequence
> attempts to acquire a read lock under a write lock and fails. Similarly,
> "elf64_getchdr" calls "elf_getdata," which also tries and fails to
> implement a read lock.
> 
> Unfortunately, fixing this particular deadlock requires more than minimal
> code changes. From what I understand, I may have a few options on how to
> fix this:
> 
> 1) Create copies of the getchdr and elf_getdata functions, renaming them
> getchdr_wrlock and elf_getdata_wrlock. However, multiple copies of these
> functions could bloat the code, which may not be ideal.
> 2) Some type of write lock flag to indicate when a write lock is active.
> Either within the locking macro in eu-config.h or passed as an argument in
> the functions.
> 
> Ultimately, I thought others may want to look into this to see if there
> might be options for a better solution. Here is the backtrace:
> 
> Program received signal SIGABRT, Aborted.
> 0x77837aff in raise () from /lib64/libc.so.6
> #0  0x77837aff in raise () from /lib64/libc.so.6
> #1  0x7780aea5 in abort () from /lib64/libc.so.6
> #2  0x7780ad79 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
> #3  0x778304c9 in __assert_perror_fail () from /lib64/libc.so.6
> #4  0x77fda554 in elf64_getshdr (scn=0x40fc20) at
> ../../libelf/elf32_getshdr.c:292
> #5  0x77fe9590 in elf64_getchdr (scn=0x40fc20) at
> ../../libelf/elf32_getchdr.c:45
> #6  0x77fdf690 in __elf64_updatenull_wrlock (elf=0x40d880,
> change_bop=0x7fffac60, shnum=30) at ../../libelf/elf32_updatenull.c:407
> #7  0x77fdd83d in elf_update (elf=0x40d880, cmd=ELF_C_WRITE) at
> ../../libelf/elf_update.c:211
> #8  0x00405d9e in process_file (fname=0x7fffbb96
> "testfile-s390x-hash-both") at ../../src/elfcompress.c:1336
> #9  0x00406232 in main (argc=7, argv=0x7fffb768) at
> ../../src/elfcompress.c:1458


Re: [PATCH] Fix thread-safety for elfutils

2023-08-21 Thread John Mellor-Crummey via Elfutils-devel
Any thoughts about the patch from my student, Heather McIntyre?
--
John Mellor-Crummey Professor
Dept of Computer ScienceRice University
email: joh...@rice.edu  phone: 713-348-5179

> On Aug 8, 2023, at 12:07 PM, Heather McIntyre  wrote:
> 
> Hello all, 
> 
> This patch was created to address thread-safety issues reported in bug 26921 
>  and bug 26930 
> .  
> Additionally, other thread-safety fixes were applied during the process.
> 
> Brief Description:
> Locking was implemented for tsearch and tfind.
> Changes to multiple files within the libelf library were made such that 
> USE_LOCKS no longer causes tests to fail when enabled.
> New tests were added to confirm thread-safety. 
> 
> Please review the attached patch file and feel free to provide feedback. I am 
> available for any clarifications or modifications needed.
> 
> Best regards,
> Heather McIntyre
> <0001-Fix-thread-safety-for-elfutils.patch>



Extension: read inlining info in an NVIDIA extended line map

2021-09-05 Thread John Mellor-Crummey via Elfutils-devel
As of CUDA 11.2, NVIDIA added extensions to the line map section
of CUDA binaries to represent inlined functions. These extensions
include

 - two new fields in a line table row to represent inline 
   information: context, and functionname,

 - two new DWARF extended opcodes: DW_LNE_inlined_call, 
   DW_LNE_set_function_name,

 - an additional word in the line table header that indicates 
   the offset in the .debug_str function where the function 
   names for this line table begin, and

 - two new functions in the libdw API: dwarf_linecontext and 
   dwarf_linefunctionname, which return the new line table fields.

A line table row for an inlined function contains a non-zero
"context" value. The “context” field indicates the index of the
line table row that serves as the call site for an inlined
context.

The "functionname" field in a line table row is only meaningful
if the "context" field of the row is non-zero. A meaningful
"functionname" field contains an index into the .debug_str
section relative to the base offset established in the line table
header; the position in the .debug_str section indicates the name
of the inlined function.

These extensions resemble the proposed DWARF extensions
(http://dwarfstd.org/ShowIssue.php?issue=140906.1) by Cary
Coutant, but are not identical.

This patch adds integrates support for handling NVIDIA's extended
line maps into elfutil's libdw library and the readelf command
line utility.

Since this support is a non-standard extension to DWARF, all code
that implements the extensions is implemented between markers  
/* Begin NVIDIA_LINEMAP_INLINING_EXTENSIONS */ and 
/* End NVIDIA_LINEMAP_INLINING_EXTENSIONS */.

The definition below

 #define NVIDIA_LINEMAP_INLINING_EXTENSIONS 1

is added to elfutils/version.h, which enables a client of elfutils 
to test whether the NVIDIA line map extensions are present. 

Note: The support for NVIDIA extended line maps adds two integer
fields (context and functionname) to struct Dwarf_Line_s, which
makes the structure about 30% larger.

The patch includes a binary testfile_nvidia_linemap.bz2 that
contains an NVIDIA extended linemap along with two tests that
read the line map.

 - A test script run-nvidia-extended-linemap-readelf.sh 
   checks the output of readelf on the new test binary to 
   validate its dump of the line op codes containing context 
   and functionname entries.

 - A test program tests/nvidia_extended_linemap_libdw.c reads 
   the extended line map with dwarf_next_lines and dumps the 
   context and functionname fields of the line map where they 
   are relevant, i.e. the value of context is non-zero. A test 
   script run-nvidia-extended-linemap-libdw.sh runs this test 
   and validates its output.

A patch with the new functionality described above is attached.
--
John Mellor-Crummey Professor
Dept of Computer ScienceRice University
email: joh...@rice.edu  phone: 713-348-5179



0001-Read-inlining-info-in-NVIDIA-extended-line-map.patch
Description: Binary data





[PATCH] read inlining info in an NVIDIA extended line map (was: Extension ...)

2021-09-10 Thread John Mellor-Crummey via Elfutils-devel
My previous patch submission seems to have been overlooked as buildbot issues 
consumed several days this week. However, discussion in the mailing list now 
seems to have moved on beyond my submission and I would like my patch 
considered. Here, I echo my previous submission, except I improved my 
submission by including the prefix [PATCH] in the subject and I included the 
patch in the message body rather than as an attachment.

Regarding the DWARF proposal by Cary Coutant for two-level linemaps: I now 
believe that NVIDIA’s implementation is consistent with that proposal although 
NVIDIA uses a DWARF extended opcode for inlined calls whereas Cary’s proposal 
uses DW_LNS_inlined_call (a standard opcode), NVIDIA’s implementation uses 
DW_LNE_inlined_call (an extended opcode).

A note about the source code for the test case reading the extended linemap 
entries using libdw: this code was copied from another test that used 
dwarf_next_lines and extended with code that reads the new context and 
functionname fields of a line table entry.
--
John Mellor-Crummey Professor
Dept of Computer ScienceRice University
email: joh...@rice.edu  phone: 713-348-5179

> On Sep 5, 2021, at 7:07 PM, John Mellor-Crummey  wrote:
> 
> As of CUDA 11.2, NVIDIA added extensions to the line map section
> of CUDA binaries to represent inlined functions. These extensions
> include
> 
> - two new fields in a line table row to represent inline 
>   information: context, and functionname,
> 
> - two new DWARF extended opcodes: DW_LNE_inlined_call, 
>   DW_LNE_set_function_name,
> 
> - an additional word in the line table header that indicates 
>   the offset in the .debug_str function where the function 
>   names for this line table begin, and
> 
> - two new functions in the libdw API: dwarf_linecontext and 
>   dwarf_linefunctionname, which return the new line table fields.
> 
> A line table row for an inlined function contains a non-zero
> "context" value. The “context” field indicates the index of the
> line table row that serves as the call site for an inlined
> context.
> 
> The "functionname" field in a line table row is only meaningful
> if the "context" field of the row is non-zero. A meaningful
> "functionname" field contains an index into the .debug_str
> section relative to the base offset established in the line table
> header; the position in the .debug_str section indicates the name
> of the inlined function.
> 
> These extensions resemble the proposed DWARF extensions
> (http://dwarfstd.org/ShowIssue.php?issue=140906.1) by Cary
> Coutant, but are not identical.
> 
> This patch adds integrates support for handling NVIDIA's extended
> line maps into elfutil's libdw library and the readelf command
> line utility.
> 
> Since this support is a non-standard extension to DWARF, all code
> that implements the extensions is implemented between markers  
> /* Begin NVIDIA_LINEMAP_INLINING_EXTENSIONS */ and 
> /* End NVIDIA_LINEMAP_INLINING_EXTENSIONS */.
> 
> The definition below
> 
> #define NVIDIA_LINEMAP_INLINING_EXTENSIONS 1
> 
> is added to elfutils/version.h, which enables a client of elfutils 
> to test whether the NVIDIA line map extensions are present. 
> 
> Note: The support for NVIDIA extended line maps adds two integer
> fields (context and functionname) to struct Dwarf_Line_s, which
> makes the structure about 30% larger.
> 
> The patch includes a binary testfile_nvidia_linemap.bz2 that
> contains an NVIDIA extended linemap along with two tests that
> read the line map.
> 
> - A test script run-nvidia-extended-linemap-readelf.sh 
>   checks the output of readelf on the new test binary to 
>   validate its dump of the line op codes containing context 
>   and functionname entries.
> 
> - A test program tests/nvidia_extended_linemap_libdw.c reads 
>   the extended line map with dwarf_next_lines and dumps the 
>   context and functionname fields of the line map where they 
>   are relevant, i.e. the value of context is non-zero. A test 
>   script run-nvidia-extended-linemap-libdw.sh runs this test 
>   and validates its output.
> 
> A patch with the new functionality described above is attached.
> --
> John Mellor-Crummey   Professor
> Dept of Computer Science  Rice University
> email: joh...@rice.eduphone: 713-348-5179

From 32e6b3530677bcbb595ba9010d49feef03314bd4 Mon Sep 17 00:00:00 2001
From: John M Mellor-Crummey 
Date: Fri, 3 Sep 2021 16:13:40 -0500
Subject: [PATCH] Read inlining info in NVIDIA extended line map

Signed-off-by: John M Mellor-Crummey 
---
 ChangeLog|   4 +
 config/version.h.in  |   4 +
 libdw/Makefile.am|   1 +
 libdw/dwarf.h|   4 +
 libdw/dwarf_getsrclines.c|  52 ++-
 libdw/dwarf_linecontext.c|  47 ++
 libdw/dwarf_linefunctionname.c   |  47 ++
 libdw/libdw.h 

[PATCH] (revised) read inlining info in an NVIDIA extended line map (was: Extension ...)

2021-09-15 Thread John Mellor-Crummey via Elfutils-devel
Mark,

Thanks for your feedback. I am working on a new version of the patch that 
changes
the interface for dwarf_linecontext and dwarf_linefunctionname to return a line 
pointer and a character pointer so it will be future proof.

See my other comments below, e.g. about an idea for reworking the Dwarf_line_s 
data structure.
--
John Mellor-Crummey Professor
Dept of Computer ScienceRice University
email: joh...@rice.edu  phone: 713-348-5179

> On Sep 10, 2021, at 12:11 PM, Mark Wielaard  wrote:
> 
> Hi John,
> 
> On Fri, 2021-09-10 at 10:49 -0500, John Mellor-Crummey via Elfutils-
> devel wrote:
>> My previous patch submission seems to have been overlooked as
>> buildbot issues consumed several days this week. However, discussion
>> in the mailing list now seems to have moved on beyond my submission
>> and I would like my patch considered. Here, I echo my previous
>> submission, except I improved my submission by including the prefix
>> [PATCH] in the subject and I included the patch in the message body
>> rather than as an attachment.
> 
> Sorry for the buildbot noise, that was indeed a little painful. But a
> buildbot that randomly fails is not much fun, so it took highest
> priority to get it back to green.
> 
> Your submission is really nice, having extensive documentation, all
> features implemented, a testcase. Well done.
> 
> There are however some concerns outside your control. It is somewhat
> disappointing NVIDIA didn't document this themselves, or tried to
> standardize this. You seems to have a good grasp of what was intended
> though. We have to be careful not to extend the api in a way that makes
> a better standard/implementation impossible. And the way we implemented
> Dwarf_Lines isn't ideal, so this extension shows we should maybe change
> it to be more efficient/compact. But maybe we can do that after adding
> the extension, we should however have a plan.
> 
>> Regarding the DWARF proposal by Cary Coutant for two-level linemaps:
>> I now believe that NVIDIA’s implementation is consistent with that
>> proposal although NVIDIA uses a DWARF extended opcode for inlined
>> calls whereas Cary’s proposal uses DW_LNS_inlined_call (a standard
>> opcode), NVIDIA’s implementation uses DW_LNE_inlined_call (an
>> extended opcode).
> 
> The naming is one of the concerns. It would be better to use a name
> like DW_LNE_NVIDIA_inlined_call and DW_LNE_NVIDIA_set_function_name to
> show they are vendor extensions and don't clash with possible future
> standard opcode names.

I renamed the two new DWARF extended opcodes as you suggested.

> That it mimics the two-level linemaps proposal is a good thing. But
> lets make sure that the new accessor functions, dwarf_linecontext and
> dwarf_linefunctionname, are generic enough that they can hopefully be
> reused when two-level linemaps or a similar proposal becomes a
> standard.


>> A note about the source code for the test case reading the extended
>> linemap entries using libdw: this code was copied from another test
>> that used dwarf_next_lines and extended with code that reads the new
>> context and functionname fields of a line table entry.
> 
> Thanks for the test case, it makes clear how the new functionality can
> be used. How was the test binary, testfile_nvidia_linemap, generated?
> That should probably be documented inside the testcase.

I documented how the NVIDIA binary used in the two test cases was created
by adding comments to the two test cases.

> I won't be able to review all the code right now, but here are some
> high-level comments, so you know what I am thinking.
> 
> On Sep 5, 2021, at 7:07 PM, John Mellor-Crummey 
>>> wrote:
>>> 
>>> As of CUDA 11.2, NVIDIA added extensions to the line map section
>>> of CUDA binaries to represent inlined functions. These extensions
>>> include
>>> 
>>> - two new fields in a line table row to represent inline 
>>>  information: context, and functionname,
> 
> We didn't design the Dwarf_Line_s very well/compact. We already have
> the op_index, isa and discriminator fields which are almost never used.
> This adds two more. I wonder if we can split the struct up so that the
> extra fields are only used when actually used.



> Maybe this can be done with a flag in Dwarf_Lines_s indicating whether
> the array contains only the basic line attributes or also some extended
> values. Of course this makes dwarf_getsrclines even more complex
> because it then has to expand the line state struct whenever it first
> sees an extended attribute. But I really like to see if we can use less
> memory here. If we agree on some way to make t

[PATCH] (v2) read inlining info in an NVIDIA extended line map

2021-11-04 Thread John Mellor-Crummey via Elfutils-devel
scussion about the first version of the patch:

> On Sep 15, 2021, at 1:25 PM, John Mellor-Crummey  wrote:
> 
> Mark,
> 
> Thanks for your feedback. I am working on a new version of the patch that 
> changes
> the interface for dwarf_linecontext and dwarf_linefunctionname to return a 
> line pointer and a character pointer so it will be future proof.
> 
> See my other comments below, e.g. about an idea for reworking the 
> Dwarf_line_s data structure.
> --
> John Mellor-Crummey   Professor
> Dept of Computer Science  Rice University
> email: joh...@rice.edu <mailto:joh...@rice.edu>   phone: 
> 713-348-5179
> 
>> On Sep 10, 2021, at 12:11 PM, Mark Wielaard > <mailto:m...@klomp.org>> wrote:
>> 
>> Hi John,
>> 
>> On Fri, 2021-09-10 at 10:49 -0500, John Mellor-Crummey via Elfutils-
>> devel wrote:
>>> My previous patch submission seems to have been overlooked as
>>> buildbot issues consumed several days this week. However, discussion
>>> in the mailing list now seems to have moved on beyond my submission
>>> and I would like my patch considered. Here, I echo my previous
>>> submission, except I improved my submission by including the prefix
>>> [PATCH] in the subject and I included the patch in the message body
>>> rather than as an attachment.
>> 
>> Sorry for the buildbot noise, that was indeed a little painful. But a
>> buildbot that randomly fails is not much fun, so it took highest
>> priority to get it back to green.
>> 
>> Your submission is really nice, having extensive documentation, all
>> features implemented, a testcase. Well done.
>> 
>> There are however some concerns outside your control. It is somewhat
>> disappointing NVIDIA didn't document this themselves, or tried to
>> standardize this. You seems to have a good grasp of what was intended
>> though. We have to be careful not to extend the api in a way that makes
>> a better standard/implementation impossible. And the way we implemented
>> Dwarf_Lines isn't ideal, so this extension shows we should maybe change
>> it to be more efficient/compact. But maybe we can do that after adding
>> the extension, we should however have a plan.
>> 
>>> Regarding the DWARF proposal by Cary Coutant for two-level linemaps:
>>> I now believe that NVIDIA’s implementation is consistent with that
>>> proposal although NVIDIA uses a DWARF extended opcode for inlined
>>> calls whereas Cary’s proposal uses DW_LNS_inlined_call (a standard
>>> opcode), NVIDIA’s implementation uses DW_LNE_inlined_call (an
>>> extended opcode).
>> 
>> The naming is one of the concerns. It would be better to use a name
>> like DW_LNE_NVIDIA_inlined_call and DW_LNE_NVIDIA_set_function_name to
>> show they are vendor extensions and don't clash with possible future
>> standard opcode names.
> 
> I renamed the two new DWARF extended opcodes as you suggested.
> 
>> That it mimics the two-level linemaps proposal is a good thing. But
>> lets make sure that the new accessor functions, dwarf_linecontext and
>> dwarf_linefunctionname, are generic enough that they can hopefully be
>> reused when two-level linemaps or a similar proposal becomes a
>> standard.
> 
> 
>>> A note about the source code for the test case reading the extended
>>> linemap entries using libdw: this code was copied from another test
>>> that used dwarf_next_lines and extended with code that reads the new
>>> context and functionname fields of a line table entry.
>> 
>> Thanks for the test case, it makes clear how the new functionality can
>> be used. How was the test binary, testfile_nvidia_linemap, generated?
>> That should probably be documented inside the testcase.
> 
> I documented how the NVIDIA binary used in the two test cases was created
> by adding comments to the two test cases.
> 
>> I won't be able to review all the code right now, but here are some
>> high-level comments, so you know what I am thinking.
>> 
>> On Sep 5, 2021, at 7:07 PM, John Mellor-Crummey > <mailto:joh...@rice.edu>>
>>>> wrote:
>>>> 
>>>> As of CUDA 11.2, NVIDIA added extensions to the line map section
>>>> of CUDA binaries to represent inlined functions. These extensions
>>>> include
>>>> 
>>>> - two new fields in a line table row to represent inline 
>>>>  information: context, and functionname,
>> 
>> We didn't design the Dwarf_Line_s very well/compact. We already have
>> the op_index

[PATCH] version 2 read extended nvidia linemap

2021-11-05 Thread John Mellor-Crummey via Elfutils-devel

the attachment was on the original message. i am resending the attachment.




0001-Read-inlining-info-in-NVIDIA-extended-line-map.patch
Description: Binary data



(sent from my phone)

Re: [PATCH] (v2) read inlining info in an NVIDIA extended line map

2021-11-10 Thread John Mellor-Crummey via Elfutils-devel
Mark,

Your tweaks are fine. Many thanks for accepting our patch before 186!
--
John Mellor-Crummey Professor
Dept of Computer ScienceRice University
email: joh...@rice.edu  phone: 713-348-5179

> On Nov 10, 2021, at 4:16 AM, Mark Wielaard  wrote:
> 
> Hi John,
> 
> On Thu, Nov 04, 2021 at 04:41:58PM -0500, John Mellor-Crummey via 
> Elfutils-devel wrote:
>> Here I describe just the improvements to that patch that address Mark’s 
>> concerns:
>> 
>> (1) all of the code for handling NVIDIA DWARF extensions is always
>> available; there is no special configuration switch needed.
>> (2) all changes are bracketed by comments that mark them NVIDIA
>> extensions
>> (3) the DWARF extended opcodes have been renamed with names that
>> include NVIDIA in them
>> (4) the two new API functions to surface the new information have
>> been improved to separate the interface result from the internal
>> representation (at Mark’s request)
>> (4a) the API for extracting the name of an inlined function in a
>>DWARF line now returns a const char * instead of a string
>>table index
>> (4b) the API for extracting an inline “context” now returns a
>>pointer to a DWARF line where the code is inlined rather
>>than returning an unsigned int (an index into the line table
>>that one could use to compute the pointer)
>> (5) there are test cases for readelf and libdw that use a binary
>> generated by NVIDIA’s compiler. the test cases include information
>> about how the binary was generated
> 
> This is really nice. I did make a few tweaks:
> 
> - Added your original overview as commit message because it contains
>  all the relevant context and pointers to more information.
> 
> - Added ChangeLog and NEWS entries, mainly for my own review.
> 
> - I removed the bracketed comments, I think they cluttered the code
>  and made it seem like we wanted to remove it or disable at some
>  point. I think it should just be considered part of the normal code
>  now.
> 
> - I removed the NVIDIA_LINEMAP_INLINING_EXTENSIONS define from
>  version.h. If people want they can have a configure check for the
>  new dwarf_linecontext or dwarf_linefunctionname functions or the
>  DW_LNE_NVIDIA_inlined_call or DW_LNE_NVIDIA_set_function_name
>  constants.
> 
> - I made dwarf_linefunctionname always return NULL on error (not
>  the magic string "???", which is still used in readelf).
> 
> - Changed the header check to be exactly 4 bytes, so we are sure to be
>  able to read the str offset completely (if it is smaller or larger
>  we cannot handle it).
> 
> - The new dwarf_linecontext and dwarf_linefunctionname get their own
>  new ELFUTILS_0.186 section in libdw.map because they are introduced
>  with verion 0.186.
> 
> - The new run-nvidia-extended-linemap-libdw.sh and
>  run-nvidia-extended-linemap-readelf.sh sripts and
>  testfile_nvidia_linemap.bz2 testfile were added to EXTRA_DIST so
>  they show up in a dist tarball.
> 
> Patch as committed attached. Hope you don't mind the cleanups.
> 
> We still want to reduce the size of the Dwarf_Line_s and struct
> line_state (independent of these extensions). I opened a new bug for
> that: https://sourceware.org/bugzilla/show_bug.cgi?id=28574
> 
> Thanks,
> 
> Mark<0001-libdw-readelf-Read-inlining-info-in-NVIDIA-extended-.patch>