Re: [PATCH] readelf: Print DIE offset in attribute reading error messages.

2017-11-21 Thread Tom Tromey
> "Mark" == Mark Wielaard  writes:

Mark> This has been really useful to debug some DWARF format issues, but it
Mark> could be even more useful. Since we know the attribute name and form
Mark> (or we would have generated an error earlier) add those to the error
Mark> message too.

Thanks for doing this.

Does dwarflint also report the error that prompted this?
I tried dwarflint on my file, but there was a lot of output and I never
did track down the precise problem, so I wasn't sure...

Tom


Re: [PATCH] readelf: Decode DW_AT_discr_list block attributes.

2019-05-10 Thread Tom Tromey
> "Mark" == Mark Wielaard  writes:

Mark> Decode DW_AT_descr_list blocks using the DW_DSC values.
Mark> This requires knowing the signedness of the discriminant.
Mark> Which means the attr_callback function needs access to the
Mark> parent DIE. Pass the whole DIE path, plus the current level.
Mark> That way the type of the discriminant can be looked up in
Mark> the variant_part (parent) DIE of the variant DIE (which has
Mark> the discr_list attribute).

Mark> Add a testcase using both signed and unsigned discriminants.

Mark> https://sourceware.org/bugzilla/show_bug.cgi?id=24509

Thank you for doing this.

Tom


[PATCH] Fix bug in read_3ubyte_unaligned_inc

2020-10-23 Thread Tom Tromey
The read_3ubyte_unaligned_inc macro calls read_2ubyte_unaligned, but
it should call read_3ubyte_unaligned.

Signed-off-by: Tom Tromey 
---
 libdw/ChangeLog   | 5 +
 libdw/memory-access.h | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/libdw/ChangeLog b/libdw/ChangeLog
index 8b0b583a..1ac4e8a9 100644
--- a/libdw/ChangeLog
+++ b/libdw/ChangeLog
@@ -1,3 +1,8 @@
+2020-10-23  Tom Tromey  
+
+   * memory-access.h (read_3ubyte_unaligned_inc): Call
+   read_3ubyte_unaligned.
+
 2020-09-03  Mark Wielaard  
 
* dwarf.h: Add DW_CFA_AARCH64_negate_ra_state.
diff --git a/libdw/memory-access.h b/libdw/memory-access.h
index a39ad6d2..14436a71 100644
--- a/libdw/memory-access.h
+++ b/libdw/memory-access.h
@@ -363,7 +363,7 @@ read_3ubyte_unaligned (Dwarf *dbg, const unsigned char *p)
 
 
 #define read_3ubyte_unaligned_inc(Dbg, Addr) \
-  ({ uint32_t t_ = read_2ubyte_unaligned (Dbg, Addr);\
+  ({ uint32_t t_ = read_3ubyte_unaligned (Dbg, Addr);\
  Addr = (__typeof (Addr)) (((uintptr_t) (Addr)) + 3);\
  t_; })
 
-- 
2.17.2



[PATCH] Fix leb128 reading

2020-10-23 Thread Tom Tromey
PR 26773 points out that some sleb128 values are decoded incorrectly.
Looking into this, I found some other unusual cases as well.

In this patch, I chose to try to handle weird leb128 encodings by
preserving their values when possible; or returning the maximum value
on overflow.  It isn't clear to me that this is necessarily the best
choice.
---
 .gitignore|   1 +
 ChangeLog |   4 +
 libdw/ChangeLog   |  11 +++
 libdw/dwarf_getlocation.c |   2 +-
 libdw/memory-access.h | 153 +++-
 tests/ChangeLog   |   7 ++
 tests/Makefile.am |   8 +-
 tests/leb128.c| 161 ++
 8 files changed, 272 insertions(+), 75 deletions(-)
 create mode 100644 tests/leb128.c

diff --git a/.gitignore b/.gitignore
index c9790941..d737b14d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -151,6 +151,7 @@ Makefile.in
 /tests/get-units-invalid
 /tests/get-units-split
 /tests/hash
+/tests/leb128
 /tests/line2addr
 /tests/low_high_pc
 /tests/msg_tst
diff --git a/ChangeLog b/ChangeLog
index 72e8397c..0b50578d 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,7 @@
+2020-10-23  Tom Tromey  
+
+   * .gitignore: Add /tests/leb128.
+
 2020-10-01  Frank Ch. Eigler  
 
PR25461
diff --git a/libdw/ChangeLog b/libdw/ChangeLog
index 8b0b583a..23081c12 100644
--- a/libdw/ChangeLog
+++ b/libdw/ChangeLog
@@ -1,3 +1,14 @@
+2020-10-23  Tom Tromey  
+
+   PR26773
+   * dwarf_getlocation.c (store_implicit_value): Use
+   __libdw_get_uleb128_unchecked.
+   * memory-access.h (__libdw_max_len_leb128)
+   (__libdw_max_len_uleb128, __libdw_max_len_sleb128): Remove.
+   (__libdw_get_uleb128, __libdw_get_uleb128_unchecked): Rewrite.
+   (get_sleb128_step): Remove.
+   (__libdw_get_sleb128, __libdw_get_sleb128_unchecked): Rewrite.
+
 2020-09-03  Mark Wielaard  
 
* dwarf.h: Add DW_CFA_AARCH64_negate_ra_state.
diff --git a/libdw/dwarf_getlocation.c b/libdw/dwarf_getlocation.c
index 4617f9e9..0ce2e08a 100644
--- a/libdw/dwarf_getlocation.c
+++ b/libdw/dwarf_getlocation.c
@@ -130,7 +130,7 @@ store_implicit_value (Dwarf *dbg, void **cache, Dwarf_Op 
*op)
   struct loc_block_s *block = libdw_alloc (dbg, struct loc_block_s,
   sizeof (struct loc_block_s), 1);
   const unsigned char *data = (const unsigned char *) (uintptr_t) op->number2;
-  uint64_t len = __libdw_get_uleb128 (&data, data + len_leb128 (Dwarf_Word));
+  uint64_t len = __libdw_get_uleb128_unchecked (&data);
   if (unlikely (len != op->number))
 return -1;
   block->addr = op;
diff --git a/libdw/memory-access.h b/libdw/memory-access.h
index a39ad6d2..70317d81 100644
--- a/libdw/memory-access.h
+++ b/libdw/memory-access.h
@@ -39,29 +39,6 @@
 
 #define len_leb128(var) ((8 * sizeof (var) + 6) / 7)
 
-static inline size_t
-__libdw_max_len_leb128 (const size_t type_len,
-   const unsigned char *addr, const unsigned char *end)
-{
-  const size_t pointer_len = likely (addr < end) ? end - addr : 0;
-  return likely (type_len <= pointer_len) ? type_len : pointer_len;
-}
-
-static inline size_t
-__libdw_max_len_uleb128 (const unsigned char *addr, const unsigned char *end)
-{
-  const size_t type_len = len_leb128 (uint64_t);
-  return __libdw_max_len_leb128 (type_len, addr, end);
-}
-
-static inline size_t
-__libdw_max_len_sleb128 (const unsigned char *addr, const unsigned char *end)
-{
-  /* Subtract one step, so we don't shift into sign bit.  */
-  const size_t type_len = len_leb128 (int64_t) - 1;
-  return __libdw_max_len_leb128 (type_len, addr, end);
-}
-
 #define get_uleb128_step(var, addr, nth) \
   do {   \
 unsigned char __b = *(addr)++;   \
@@ -79,12 +56,39 @@ __libdw_get_uleb128 (const unsigned char **addrp, const 
unsigned char *end)
  for the common single-byte case.  */
   get_uleb128_step (acc, *addrp, 0);
 
-  const size_t max = __libdw_max_len_uleb128 (*addrp - 1, end);
-  for (size_t i = 1; i < max; ++i)
+  const size_t max = len_leb128 (acc) - 1;
+  for (size_t i = 1; i < max && *addrp < end; ++i)
 get_uleb128_step (acc, *addrp, i);
-  /* Other implementations set VALUE to UINT_MAX in this
- case.  So we better do this as well.  */
-  return UINT64_MAX;
+
+  /* A single step to check the final byte, which could cause
+ overflow.  */
+  if (*addrp < end)
+{
+  const unsigned mask = (1u << (8 * sizeof (acc) - 7 * max)) - 1;
+  if (unlikely ((**addrp & mask) != 0))
+   {
+ /* Overflow.  */
+ ++*addrp;
+ return UINT64_MAX;
+   }
+  get_uleb128_step (acc, *addrp, max);
+}
+
+  /* Now we can read any number of bytes, as long as the payload is 0.
+ Anything else would overflow.  */
+

Re: [PATCH] Fix bug in read_3ubyte_unaligned_inc

2020-10-24 Thread Tom Tromey
Mark> Clearly DW_FORM_strx3 and DW_FORM_addrx3 aren't used much
Mark> (given that it is an index value between larger than 65536).

I don't even really see the need for a 3-byte form.
Anyway if it helps, gdb gets this wrong too -- gdb always uses
little-endian, which seems wrong, but I didn't know how to write a
real-life test case.

Tom


Re: [PATCH] Fix leb128 reading

2020-10-26 Thread Tom Tromey
>> In this patch, I chose to try to handle weird leb128 encodings by
>> preserving their values when possible; or returning the maximum value
>> on overflow.  It isn't clear to me that this is necessarily the best
>> choice.

Mark> I think it is a good thing to accept "padded" leb128 numbers, but
Mark> probably up to a point. The padding will probably be either mod-4 or
Mark> mod-8 bytes. So lets at most read up to 16 bytes total.

Now I think all this is just overkill.  Actually-existing leb128
relocations (see commit 7d81bc93 in binutils) seem to shrink the
representation.  There was some talk in the Rust leb128 decoder issues
about this topic, but that was a wish-list thing possibly for wasm -- so
not exactly a need.

>> +  const unsigned mask = (1u << (8 * sizeof (acc) - 7 * max)) - 1;

Mark> Shouldn't this be a const unsigned char?

The C "usual arithmetic conversions" will promote for the "&" anyway.

Mark> Is the really just (1u << (8 * 8 - 7 * 9)) - 1 = (1u << 1) - 1 = 1
Mark> So we are really only interested in the lowest bit?

Yeah.  We could hard-code that if you prefer, but I wrote it this way to
pretend that maybe the types in the code would be changed (widened) some
day.

>> +  if (unlikely (value == UINT64_MAX))
>> +return INT64_MAX;

Mark> Here you need to help me out a bit, wouldn't UINT64_MAX also be the
Mark> representation of -1?

Yes.  Normally in sleb128 I guess -1 would be written {0x7f}.  However
one could write it with any number of 0xff prefix bytes first.  This
weirdness is one reason I changed my mind on handling over-long leb128
sequences.

I guess the only solution here is to write 2 copies of each function.

Mark> But what I don't fully understand is how this works with a "padded"
Mark> leb128, won't we possibly forget to negate in that case?

It isn't clear to me what to do when a sleb128 specifies more than 64
bits.  Another reason perhaps to avoid this area.

Tom


[PATCH v2] Fix leb128 reading

2020-10-28 Thread Tom Tromey
PR 26773 points out that some sleb128 values are decoded incorrectly.

This version of the fix only examines the sleb128 conversion.
Overlong encodings are not handled, and the uleb128 decoders are not
touched.  The approach taken here is to do the work in an unsigned
type, and then rely on an implementation-defined cast to convert to
signed.

Signed-off-by: Tom Tromey 
---
 .gitignore|   1 +
 ChangeLog |   4 ++
 libdw/ChangeLog   |  10 +++
 libdw/dwarf_getlocation.c |   5 +-
 libdw/memory-access.h |  42 ++--
 tests/ChangeLog   |   7 ++
 tests/Makefile.am |   6 +-
 tests/leb128.c| 140 ++
 8 files changed, 205 insertions(+), 10 deletions(-)
 create mode 100644 tests/leb128.c

diff --git a/.gitignore b/.gitignore
index c9790941..d737b14d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -151,6 +151,7 @@ Makefile.in
 /tests/get-units-invalid
 /tests/get-units-split
 /tests/hash
+/tests/leb128
 /tests/line2addr
 /tests/low_high_pc
 /tests/msg_tst
diff --git a/ChangeLog b/ChangeLog
index 72e8397c..4c8699f8 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,7 @@
+2020-10-28  Tom Tromey  
+
+   * .gitignore: Add /tests/leb128.
+
 2020-10-01  Frank Ch. Eigler  
 
PR25461
diff --git a/libdw/ChangeLog b/libdw/ChangeLog
index 731c7e79..289bb4c9 100644
--- a/libdw/ChangeLog
+++ b/libdw/ChangeLog
@@ -1,3 +1,13 @@
+2020-10-28  Tom Tromey  
+
+   PR26773
+   * dwarf_getlocation.c (store_implicit_value): Use
+   __libdw_get_uleb128_unchecked.
+   * memory-access.h (get_sleb128_step): Assume unsigned type for
+   'var'.
+   (__libdw_get_sleb128, __libdw_get_sleb128_unchecked): Work in
+   unsigned type.  Handle final byte.
+
 2020-10-19  Mark Wielaard  
 
* dwarf_frame_register.c (dwarf_frame_register): Declare ops_mem
diff --git a/libdw/dwarf_getlocation.c b/libdw/dwarf_getlocation.c
index 4617f9e9..f2bad5a9 100644
--- a/libdw/dwarf_getlocation.c
+++ b/libdw/dwarf_getlocation.c
@@ -130,9 +130,8 @@ store_implicit_value (Dwarf *dbg, void **cache, Dwarf_Op 
*op)
   struct loc_block_s *block = libdw_alloc (dbg, struct loc_block_s,
   sizeof (struct loc_block_s), 1);
   const unsigned char *data = (const unsigned char *) (uintptr_t) op->number2;
-  uint64_t len = __libdw_get_uleb128 (&data, data + len_leb128 (Dwarf_Word));
-  if (unlikely (len != op->number))
-return -1;
+  /* Skip the block length.  */
+  __libdw_get_uleb128_unchecked (&data);
   block->addr = op;
   block->data = (unsigned char *) data;
   block->length = op->number;
diff --git a/libdw/memory-access.h b/libdw/memory-access.h
index 14436a71..8b2386ee 100644
--- a/libdw/memory-access.h
+++ b/libdw/memory-access.h
@@ -113,19 +113,22 @@ __libdw_get_uleb128_unchecked (const unsigned char 
**addrp)
 #define get_sleb128_step(var, addr, nth) \
   do {   \
 unsigned char __b = *(addr)++;   \
+(var) |= (typeof (var)) (__b & 0x7f) << ((nth) * 7); \
 if (likely ((__b & 0x80) == 0))  \
   {
  \
-   struct { signed int i:7; } __s = { .i = __b };\
-   (var) |= (typeof (var)) __s.i * ((typeof (var)) 1 << ((nth) * 7));\
+   if ((__b & 0x40) != 0)\
+ (var) |= - ((typeof (var)) 1 << (((nth) + 1) * 7)); \
return (var); \
   }
  \
-(var) |= (typeof (var)) (__b & 0x7f) << ((nth) * 7); \
   } while (0)
 
 static inline int64_t
 __libdw_get_sleb128 (const unsigned char **addrp, const unsigned char *end)
 {
-  int64_t acc = 0;
+  /* Do the work in an unsigned type, but use implementation-defined
+ behavior to cast to signed on return.  This avoids some undefined
+ behavior when shifting.  */
+  uint64_t acc = 0;
 
   /* Unroll the first step to help the compiler optimize
  for the common single-byte case.  */
@@ -134,6 +137,20 @@ __libdw_get_sleb128 (const unsigned char **addrp, const 
unsigned char *end)
   const size_t max = __libdw_max_len_sleb128 (*addrp - 1, end);
   for (size_t i = 1; i < max; ++i)
 get_sleb128_step (acc, *addrp, i);
+  if (*addrp == end)
+return INT64_MAX;
+
+  /* There might be one extra byte.  */
+  unsigned char b = **addrp;
+  ++*addrp;
+  if (likely ((b & 0x80) == 0))
+{
+  /* We only need the low bit of the final byte, and as it is the
+sign bit, we don't need to d

Re: [PATCH] Fix bug in read_3ubyte_unaligned_inc

2020-10-29 Thread Tom Tromey
>>>>> "Mark" == Mark Wielaard  writes:

Mark> On Sat, Oct 24, 2020 at 03:05:55PM -0600, Tom Tromey wrote:
Mark> Clearly DW_FORM_strx3 and DW_FORM_addrx3 aren't used much
Mark> (given that it is an index value between larger than 65536).
>> 
>> I don't even really see the need for a 3-byte form.
>> Anyway if it helps, gdb gets this wrong too -- gdb always uses
>> little-endian, which seems wrong, but I didn't know how to write a
>> real-life test case.

Mark> How about the attached? It isn't very fancy, but it would have caught
Mark> the bug you found. And might be nice in case we ever refactor the
Mark> read_unaligned code.

Looks reasonable to me.  Thanks for doing that.

Tom


Re: [PATCH v2] Fix leb128 reading

2020-10-30 Thread Tom Tromey
> "Mark" == Mark Wielaard  writes:

Mark> Looks good. While reviewing I did try out some extra corner cases that
Mark> I added (see attached). It also shows that some values can have
Mark> multiple representations. I added them to your commit before I pushed
Mark> it. Hope you don't mind.

Seems fine, thanks for doing that.

Tom


Re: build-ids, .debug_sup and other IDs

2021-02-24 Thread Tom Tromey
> "Mark" == Mark Wielaard  writes:

>> This patch adds support for this to gdb.  It is largely
>> straightforward, I think, though one oddity is that I chose not to
>> have this code search the system build-id directories for the
>> supplementary file.  My feeling was that, while it makes sense for a
>> distro to unify the build-id concept with the hash stored in the
>> .debug_sup section, there's no intrinsic need to do so.

Mark> Any opinions on whether we should split these concepts out and introduce
Mark> separate request mechanisms per ID-kind, or simply assume a globally
Mark> unique id is globally unique and we just clarify what it means to use
Mark> a build-id, sup_checksum or dwo_id together with a request for an
Mark> executable, debuginfo or source/file?

FWIW I looked a little at unifying these.  For example,
bfdopncls.c:bfd_get_alt_debug_link_info could look at both the build-id
and .debug_sup.

But, this seemed a bit weird.  What if both appear and they are
different?  Then a single API isn't so great -- you want to check the ID
corresponding to whatever was in the original file.

Probably I should have stuck some of the new code into BFD though.
It's not too late to do that at least.

I suppose a distro can ensure that the IDs are always equal.  Maybe
debuginfod could also just require this.

Tom


Re: build-ids, .debug_sup and other IDs

2021-03-02 Thread Tom Tromey
>> But, this seemed a bit weird.  What if both appear and they are
>> different?  Then a single API isn't so great -- you want to check the ID
>> corresponding to whatever was in the original file.

Frank> If both appear and are different, can we characterize the elf file as
Frank> malformed?

Not really, nothing specifies that these must be the same.

Frank> Or debuginfod could export the content under -both- IDs, if there were
Frank> two valid candidates, and just go with the flow.  Let the clients
Frank> choose which ID they prefer to look up by.

There's a namespace problem here.  You could, in theory, have executable
A with build id , and also executable B with debug_sup id also .
This could be fixed with some kind of query parameter.  It would be easy
on the gdb side to supply this information.

Tom


Re: build-ids, .debug_sup and other IDs

2021-03-02 Thread Tom Tromey
Frank> (Does dwz'd dwarf5 even work on gdb
Frank> etc. now?)

It doesn't, this thread started because I sent a patch to change gdb to
read .debug_sup.  This hasn't landed yet.

Tom