Re: [PATCH] readelf: Print DIE offset in attribute reading error messages.
> "Mark" == Mark Wielaard writes: Mark> This has been really useful to debug some DWARF format issues, but it Mark> could be even more useful. Since we know the attribute name and form Mark> (or we would have generated an error earlier) add those to the error Mark> message too. Thanks for doing this. Does dwarflint also report the error that prompted this? I tried dwarflint on my file, but there was a lot of output and I never did track down the precise problem, so I wasn't sure... Tom
Re: [PATCH] readelf: Decode DW_AT_discr_list block attributes.
> "Mark" == Mark Wielaard writes: Mark> Decode DW_AT_descr_list blocks using the DW_DSC values. Mark> This requires knowing the signedness of the discriminant. Mark> Which means the attr_callback function needs access to the Mark> parent DIE. Pass the whole DIE path, plus the current level. Mark> That way the type of the discriminant can be looked up in Mark> the variant_part (parent) DIE of the variant DIE (which has Mark> the discr_list attribute). Mark> Add a testcase using both signed and unsigned discriminants. Mark> https://sourceware.org/bugzilla/show_bug.cgi?id=24509 Thank you for doing this. Tom
[PATCH] Fix bug in read_3ubyte_unaligned_inc
The read_3ubyte_unaligned_inc macro calls read_2ubyte_unaligned, but it should call read_3ubyte_unaligned. Signed-off-by: Tom Tromey --- libdw/ChangeLog | 5 + libdw/memory-access.h | 2 +- 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/libdw/ChangeLog b/libdw/ChangeLog index 8b0b583a..1ac4e8a9 100644 --- a/libdw/ChangeLog +++ b/libdw/ChangeLog @@ -1,3 +1,8 @@ +2020-10-23 Tom Tromey + + * memory-access.h (read_3ubyte_unaligned_inc): Call + read_3ubyte_unaligned. + 2020-09-03 Mark Wielaard * dwarf.h: Add DW_CFA_AARCH64_negate_ra_state. diff --git a/libdw/memory-access.h b/libdw/memory-access.h index a39ad6d2..14436a71 100644 --- a/libdw/memory-access.h +++ b/libdw/memory-access.h @@ -363,7 +363,7 @@ read_3ubyte_unaligned (Dwarf *dbg, const unsigned char *p) #define read_3ubyte_unaligned_inc(Dbg, Addr) \ - ({ uint32_t t_ = read_2ubyte_unaligned (Dbg, Addr);\ + ({ uint32_t t_ = read_3ubyte_unaligned (Dbg, Addr);\ Addr = (__typeof (Addr)) (((uintptr_t) (Addr)) + 3);\ t_; }) -- 2.17.2
[PATCH] Fix leb128 reading
PR 26773 points out that some sleb128 values are decoded incorrectly. Looking into this, I found some other unusual cases as well. In this patch, I chose to try to handle weird leb128 encodings by preserving their values when possible; or returning the maximum value on overflow. It isn't clear to me that this is necessarily the best choice. --- .gitignore| 1 + ChangeLog | 4 + libdw/ChangeLog | 11 +++ libdw/dwarf_getlocation.c | 2 +- libdw/memory-access.h | 153 +++- tests/ChangeLog | 7 ++ tests/Makefile.am | 8 +- tests/leb128.c| 161 ++ 8 files changed, 272 insertions(+), 75 deletions(-) create mode 100644 tests/leb128.c diff --git a/.gitignore b/.gitignore index c9790941..d737b14d 100644 --- a/.gitignore +++ b/.gitignore @@ -151,6 +151,7 @@ Makefile.in /tests/get-units-invalid /tests/get-units-split /tests/hash +/tests/leb128 /tests/line2addr /tests/low_high_pc /tests/msg_tst diff --git a/ChangeLog b/ChangeLog index 72e8397c..0b50578d 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,7 @@ +2020-10-23 Tom Tromey + + * .gitignore: Add /tests/leb128. + 2020-10-01 Frank Ch. Eigler PR25461 diff --git a/libdw/ChangeLog b/libdw/ChangeLog index 8b0b583a..23081c12 100644 --- a/libdw/ChangeLog +++ b/libdw/ChangeLog @@ -1,3 +1,14 @@ +2020-10-23 Tom Tromey + + PR26773 + * dwarf_getlocation.c (store_implicit_value): Use + __libdw_get_uleb128_unchecked. + * memory-access.h (__libdw_max_len_leb128) + (__libdw_max_len_uleb128, __libdw_max_len_sleb128): Remove. + (__libdw_get_uleb128, __libdw_get_uleb128_unchecked): Rewrite. + (get_sleb128_step): Remove. + (__libdw_get_sleb128, __libdw_get_sleb128_unchecked): Rewrite. + 2020-09-03 Mark Wielaard * dwarf.h: Add DW_CFA_AARCH64_negate_ra_state. diff --git a/libdw/dwarf_getlocation.c b/libdw/dwarf_getlocation.c index 4617f9e9..0ce2e08a 100644 --- a/libdw/dwarf_getlocation.c +++ b/libdw/dwarf_getlocation.c @@ -130,7 +130,7 @@ store_implicit_value (Dwarf *dbg, void **cache, Dwarf_Op *op) struct loc_block_s *block = libdw_alloc (dbg, struct loc_block_s, sizeof (struct loc_block_s), 1); const unsigned char *data = (const unsigned char *) (uintptr_t) op->number2; - uint64_t len = __libdw_get_uleb128 (&data, data + len_leb128 (Dwarf_Word)); + uint64_t len = __libdw_get_uleb128_unchecked (&data); if (unlikely (len != op->number)) return -1; block->addr = op; diff --git a/libdw/memory-access.h b/libdw/memory-access.h index a39ad6d2..70317d81 100644 --- a/libdw/memory-access.h +++ b/libdw/memory-access.h @@ -39,29 +39,6 @@ #define len_leb128(var) ((8 * sizeof (var) + 6) / 7) -static inline size_t -__libdw_max_len_leb128 (const size_t type_len, - const unsigned char *addr, const unsigned char *end) -{ - const size_t pointer_len = likely (addr < end) ? end - addr : 0; - return likely (type_len <= pointer_len) ? type_len : pointer_len; -} - -static inline size_t -__libdw_max_len_uleb128 (const unsigned char *addr, const unsigned char *end) -{ - const size_t type_len = len_leb128 (uint64_t); - return __libdw_max_len_leb128 (type_len, addr, end); -} - -static inline size_t -__libdw_max_len_sleb128 (const unsigned char *addr, const unsigned char *end) -{ - /* Subtract one step, so we don't shift into sign bit. */ - const size_t type_len = len_leb128 (int64_t) - 1; - return __libdw_max_len_leb128 (type_len, addr, end); -} - #define get_uleb128_step(var, addr, nth) \ do { \ unsigned char __b = *(addr)++; \ @@ -79,12 +56,39 @@ __libdw_get_uleb128 (const unsigned char **addrp, const unsigned char *end) for the common single-byte case. */ get_uleb128_step (acc, *addrp, 0); - const size_t max = __libdw_max_len_uleb128 (*addrp - 1, end); - for (size_t i = 1; i < max; ++i) + const size_t max = len_leb128 (acc) - 1; + for (size_t i = 1; i < max && *addrp < end; ++i) get_uleb128_step (acc, *addrp, i); - /* Other implementations set VALUE to UINT_MAX in this - case. So we better do this as well. */ - return UINT64_MAX; + + /* A single step to check the final byte, which could cause + overflow. */ + if (*addrp < end) +{ + const unsigned mask = (1u << (8 * sizeof (acc) - 7 * max)) - 1; + if (unlikely ((**addrp & mask) != 0)) + { + /* Overflow. */ + ++*addrp; + return UINT64_MAX; + } + get_uleb128_step (acc, *addrp, max); +} + + /* Now we can read any number of bytes, as long as the payload is 0. + Anything else would overflow. */ +
Re: [PATCH] Fix bug in read_3ubyte_unaligned_inc
Mark> Clearly DW_FORM_strx3 and DW_FORM_addrx3 aren't used much Mark> (given that it is an index value between larger than 65536). I don't even really see the need for a 3-byte form. Anyway if it helps, gdb gets this wrong too -- gdb always uses little-endian, which seems wrong, but I didn't know how to write a real-life test case. Tom
Re: [PATCH] Fix leb128 reading
>> In this patch, I chose to try to handle weird leb128 encodings by >> preserving their values when possible; or returning the maximum value >> on overflow. It isn't clear to me that this is necessarily the best >> choice. Mark> I think it is a good thing to accept "padded" leb128 numbers, but Mark> probably up to a point. The padding will probably be either mod-4 or Mark> mod-8 bytes. So lets at most read up to 16 bytes total. Now I think all this is just overkill. Actually-existing leb128 relocations (see commit 7d81bc93 in binutils) seem to shrink the representation. There was some talk in the Rust leb128 decoder issues about this topic, but that was a wish-list thing possibly for wasm -- so not exactly a need. >> + const unsigned mask = (1u << (8 * sizeof (acc) - 7 * max)) - 1; Mark> Shouldn't this be a const unsigned char? The C "usual arithmetic conversions" will promote for the "&" anyway. Mark> Is the really just (1u << (8 * 8 - 7 * 9)) - 1 = (1u << 1) - 1 = 1 Mark> So we are really only interested in the lowest bit? Yeah. We could hard-code that if you prefer, but I wrote it this way to pretend that maybe the types in the code would be changed (widened) some day. >> + if (unlikely (value == UINT64_MAX)) >> +return INT64_MAX; Mark> Here you need to help me out a bit, wouldn't UINT64_MAX also be the Mark> representation of -1? Yes. Normally in sleb128 I guess -1 would be written {0x7f}. However one could write it with any number of 0xff prefix bytes first. This weirdness is one reason I changed my mind on handling over-long leb128 sequences. I guess the only solution here is to write 2 copies of each function. Mark> But what I don't fully understand is how this works with a "padded" Mark> leb128, won't we possibly forget to negate in that case? It isn't clear to me what to do when a sleb128 specifies more than 64 bits. Another reason perhaps to avoid this area. Tom
[PATCH v2] Fix leb128 reading
PR 26773 points out that some sleb128 values are decoded incorrectly. This version of the fix only examines the sleb128 conversion. Overlong encodings are not handled, and the uleb128 decoders are not touched. The approach taken here is to do the work in an unsigned type, and then rely on an implementation-defined cast to convert to signed. Signed-off-by: Tom Tromey --- .gitignore| 1 + ChangeLog | 4 ++ libdw/ChangeLog | 10 +++ libdw/dwarf_getlocation.c | 5 +- libdw/memory-access.h | 42 ++-- tests/ChangeLog | 7 ++ tests/Makefile.am | 6 +- tests/leb128.c| 140 ++ 8 files changed, 205 insertions(+), 10 deletions(-) create mode 100644 tests/leb128.c diff --git a/.gitignore b/.gitignore index c9790941..d737b14d 100644 --- a/.gitignore +++ b/.gitignore @@ -151,6 +151,7 @@ Makefile.in /tests/get-units-invalid /tests/get-units-split /tests/hash +/tests/leb128 /tests/line2addr /tests/low_high_pc /tests/msg_tst diff --git a/ChangeLog b/ChangeLog index 72e8397c..4c8699f8 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,7 @@ +2020-10-28 Tom Tromey + + * .gitignore: Add /tests/leb128. + 2020-10-01 Frank Ch. Eigler PR25461 diff --git a/libdw/ChangeLog b/libdw/ChangeLog index 731c7e79..289bb4c9 100644 --- a/libdw/ChangeLog +++ b/libdw/ChangeLog @@ -1,3 +1,13 @@ +2020-10-28 Tom Tromey + + PR26773 + * dwarf_getlocation.c (store_implicit_value): Use + __libdw_get_uleb128_unchecked. + * memory-access.h (get_sleb128_step): Assume unsigned type for + 'var'. + (__libdw_get_sleb128, __libdw_get_sleb128_unchecked): Work in + unsigned type. Handle final byte. + 2020-10-19 Mark Wielaard * dwarf_frame_register.c (dwarf_frame_register): Declare ops_mem diff --git a/libdw/dwarf_getlocation.c b/libdw/dwarf_getlocation.c index 4617f9e9..f2bad5a9 100644 --- a/libdw/dwarf_getlocation.c +++ b/libdw/dwarf_getlocation.c @@ -130,9 +130,8 @@ store_implicit_value (Dwarf *dbg, void **cache, Dwarf_Op *op) struct loc_block_s *block = libdw_alloc (dbg, struct loc_block_s, sizeof (struct loc_block_s), 1); const unsigned char *data = (const unsigned char *) (uintptr_t) op->number2; - uint64_t len = __libdw_get_uleb128 (&data, data + len_leb128 (Dwarf_Word)); - if (unlikely (len != op->number)) -return -1; + /* Skip the block length. */ + __libdw_get_uleb128_unchecked (&data); block->addr = op; block->data = (unsigned char *) data; block->length = op->number; diff --git a/libdw/memory-access.h b/libdw/memory-access.h index 14436a71..8b2386ee 100644 --- a/libdw/memory-access.h +++ b/libdw/memory-access.h @@ -113,19 +113,22 @@ __libdw_get_uleb128_unchecked (const unsigned char **addrp) #define get_sleb128_step(var, addr, nth) \ do { \ unsigned char __b = *(addr)++; \ +(var) |= (typeof (var)) (__b & 0x7f) << ((nth) * 7); \ if (likely ((__b & 0x80) == 0)) \ { \ - struct { signed int i:7; } __s = { .i = __b };\ - (var) |= (typeof (var)) __s.i * ((typeof (var)) 1 << ((nth) * 7));\ + if ((__b & 0x40) != 0)\ + (var) |= - ((typeof (var)) 1 << (((nth) + 1) * 7)); \ return (var); \ } \ -(var) |= (typeof (var)) (__b & 0x7f) << ((nth) * 7); \ } while (0) static inline int64_t __libdw_get_sleb128 (const unsigned char **addrp, const unsigned char *end) { - int64_t acc = 0; + /* Do the work in an unsigned type, but use implementation-defined + behavior to cast to signed on return. This avoids some undefined + behavior when shifting. */ + uint64_t acc = 0; /* Unroll the first step to help the compiler optimize for the common single-byte case. */ @@ -134,6 +137,20 @@ __libdw_get_sleb128 (const unsigned char **addrp, const unsigned char *end) const size_t max = __libdw_max_len_sleb128 (*addrp - 1, end); for (size_t i = 1; i < max; ++i) get_sleb128_step (acc, *addrp, i); + if (*addrp == end) +return INT64_MAX; + + /* There might be one extra byte. */ + unsigned char b = **addrp; + ++*addrp; + if (likely ((b & 0x80) == 0)) +{ + /* We only need the low bit of the final byte, and as it is the +sign bit, we don't need to d
Re: [PATCH] Fix bug in read_3ubyte_unaligned_inc
>>>>> "Mark" == Mark Wielaard writes: Mark> On Sat, Oct 24, 2020 at 03:05:55PM -0600, Tom Tromey wrote: Mark> Clearly DW_FORM_strx3 and DW_FORM_addrx3 aren't used much Mark> (given that it is an index value between larger than 65536). >> >> I don't even really see the need for a 3-byte form. >> Anyway if it helps, gdb gets this wrong too -- gdb always uses >> little-endian, which seems wrong, but I didn't know how to write a >> real-life test case. Mark> How about the attached? It isn't very fancy, but it would have caught Mark> the bug you found. And might be nice in case we ever refactor the Mark> read_unaligned code. Looks reasonable to me. Thanks for doing that. Tom
Re: [PATCH v2] Fix leb128 reading
> "Mark" == Mark Wielaard writes: Mark> Looks good. While reviewing I did try out some extra corner cases that Mark> I added (see attached). It also shows that some values can have Mark> multiple representations. I added them to your commit before I pushed Mark> it. Hope you don't mind. Seems fine, thanks for doing that. Tom
Re: build-ids, .debug_sup and other IDs
> "Mark" == Mark Wielaard writes: >> This patch adds support for this to gdb. It is largely >> straightforward, I think, though one oddity is that I chose not to >> have this code search the system build-id directories for the >> supplementary file. My feeling was that, while it makes sense for a >> distro to unify the build-id concept with the hash stored in the >> .debug_sup section, there's no intrinsic need to do so. Mark> Any opinions on whether we should split these concepts out and introduce Mark> separate request mechanisms per ID-kind, or simply assume a globally Mark> unique id is globally unique and we just clarify what it means to use Mark> a build-id, sup_checksum or dwo_id together with a request for an Mark> executable, debuginfo or source/file? FWIW I looked a little at unifying these. For example, bfdopncls.c:bfd_get_alt_debug_link_info could look at both the build-id and .debug_sup. But, this seemed a bit weird. What if both appear and they are different? Then a single API isn't so great -- you want to check the ID corresponding to whatever was in the original file. Probably I should have stuck some of the new code into BFD though. It's not too late to do that at least. I suppose a distro can ensure that the IDs are always equal. Maybe debuginfod could also just require this. Tom
Re: build-ids, .debug_sup and other IDs
>> But, this seemed a bit weird. What if both appear and they are >> different? Then a single API isn't so great -- you want to check the ID >> corresponding to whatever was in the original file. Frank> If both appear and are different, can we characterize the elf file as Frank> malformed? Not really, nothing specifies that these must be the same. Frank> Or debuginfod could export the content under -both- IDs, if there were Frank> two valid candidates, and just go with the flow. Let the clients Frank> choose which ID they prefer to look up by. There's a namespace problem here. You could, in theory, have executable A with build id , and also executable B with debug_sup id also . This could be fixed with some kind of query parameter. It would be easy on the gdb side to supply this information. Tom
Re: build-ids, .debug_sup and other IDs
Frank> (Does dwz'd dwarf5 even work on gdb Frank> etc. now?) It doesn't, this thread started because I sent a patch to change gdb to read .debug_sup. This hasn't landed yet. Tom