[PATCH 1/2] libdw: New get_uleb128_unchecked to use with already checked Dwarf_Abbrev.

2017-12-26 Thread Mark Wielaard
When creating a Dwarf_Abbrev in dwarf_getabbrev (__libdw_getabbrev) we
already check it is fully readable from the .debug_abbrev section. So
whenever we reread it later using the attrp pointer we don't have to
check it again. Introduce get_uleb128_unchecked to use for ulebs we
know are safe to read directly.

Signed-off-by: Mark Wielaard 
---
 libdw/ChangeLog | 10 ++
 libdw/dwarf_child.c | 19 +--
 libdw/dwarf_getabbrevattr.c | 10 +-
 libdw/dwarf_getattrs.c  | 20 +---
 libdw/dwarf_hasattr.c   | 22 +-
 libdw/memory-access.h   | 18 ++
 6 files changed, 48 insertions(+), 51 deletions(-)

diff --git a/libdw/ChangeLog b/libdw/ChangeLog
index eb1cb709..3e3b7169 100644
--- a/libdw/ChangeLog
+++ b/libdw/ChangeLog
@@ -1,3 +1,13 @@
+2017-12-26  Mark Wielaard  
+
+   * memory-access.h (__libdw_get_uleb128_unchecked): New function.
+   (get_uleb128_unchecked): New define.
+   * dwarf_child.c (__libdw_find_attr): Use get_uleb128_unchecked to
+   read attr name and form.
+   * dwarf_getabbrevattr.c (dwarf_getabbrevattr): Likewise.
+   * dwarf_getattrs.c (dwarf_getattrs): Likewise.
+   * dwarf_hasattr.c (dwarf_hasattr): Likewise.
+
 2017-05-09  Ulf Hermann  
Mark Wielaard  
 
diff --git a/libdw/dwarf_child.c b/libdw/dwarf_child.c
index cc95fb3f..248338eb 100644
--- a/libdw/dwarf_child.c
+++ b/libdw/dwarf_child.c
@@ -1,5 +1,5 @@
 /* Return child of current DIE.
-   Copyright (C) 2003-2011, 2014 Red Hat, Inc.
+   Copyright (C) 2003-2011, 2014, 2017 Red Hat, Inc.
This file is part of elfutils.
Written by Ulrich Drepper , 2003.
 
@@ -43,36 +43,27 @@ internal_function
 __libdw_find_attr (Dwarf_Die *die, unsigned int search_name,
   unsigned int *codep, unsigned int *formp)
 {
-  Dwarf *dbg = die->cu->dbg;
   const unsigned char *readp;
 
   /* Find the abbreviation entry.  */
   Dwarf_Abbrev *abbrevp = __libdw_dieabbrev (die, &readp);
   if (unlikely (abbrevp == DWARF_END_ABBREV))
 {
-invalid_dwarf:
   __libdw_seterrno (DWARF_E_INVALID_DWARF);
   return NULL;
 }
 
-  /* Search the name attribute.  */
-  unsigned char *const endp
-= ((unsigned char *) dbg->sectiondata[IDX_debug_abbrev]->d_buf
-   + dbg->sectiondata[IDX_debug_abbrev]->d_size);
-
+  /* Search the name attribute.  Attribute has been checked when
+ Dwarf_Abbrev was created, we can read unchecked.  */
   const unsigned char *attrp = abbrevp->attrp;
   while (1)
 {
   /* Get attribute name and form.  */
-  if (unlikely (attrp >= endp))
-   goto invalid_dwarf;
   unsigned int attr_name;
-  get_uleb128 (attr_name, attrp, endp);
+  get_uleb128_unchecked (attr_name, attrp);
 
-  if (unlikely (attrp >= endp))
-   goto invalid_dwarf;
   unsigned int attr_form;
-  get_uleb128 (attr_form, attrp, endp);
+  get_uleb128_unchecked (attr_form, attrp);
 
   /* We can stop if we found the attribute with value zero.  */
   if (attr_name == 0 && attr_form == 0)
diff --git a/libdw/dwarf_getabbrevattr.c b/libdw/dwarf_getabbrevattr.c
index 3b4da99c..57fe3637 100644
--- a/libdw/dwarf_getabbrevattr.c
+++ b/libdw/dwarf_getabbrevattr.c
@@ -1,5 +1,5 @@
 /* Get specific attribute of abbreviation.
-   Copyright (C) 2003, 2004, 2005, 2014 Red Hat, Inc.
+   Copyright (C) 2003, 2004, 2005, 2014, 2017 Red Hat, Inc.
This file is part of elfutils.
Written by Ulrich Drepper , 2003.
 
@@ -53,10 +53,10 @@ dwarf_getabbrevattr (Dwarf_Abbrev *abbrev, size_t idx, 
unsigned int *namep,
 {
   start_attrp = attrp;
 
-  /* Attribute code and form are encoded as ULEB128 values.i
- XXX We have no way to bounds check.  */
-  get_uleb128 (name, attrp, attrp + len_leb128 (name));
-  get_uleb128 (form, attrp, attrp + len_leb128 (form));
+  /* Attribute code and form are encoded as ULEB128 values.
+ Already checked when Dwarf_Abbrev was created, read unchecked.  */
+  get_uleb128_unchecked (name, attrp);
+  get_uleb128_unchecked (form, attrp);
 
   /* If both values are zero the index is out of range.  */
   if (name == 0 && form == 0)
diff --git a/libdw/dwarf_getattrs.c b/libdw/dwarf_getattrs.c
index 0da8b5ba..7f55fafc 100644
--- a/libdw/dwarf_getattrs.c
+++ b/libdw/dwarf_getattrs.c
@@ -1,5 +1,5 @@
 /* Get attributes of the DIE.
-   Copyright (C) 2004, 2005, 2008, 2009, 2014 Red Hat, Inc.
+   Copyright (C) 2004, 2005, 2008, 2009, 2014, 2017 Red Hat, Inc.
This file is part of elfutils.
Written by Ulrich Drepper , 2004.
 
@@ -51,7 +51,6 @@ dwarf_getattrs (Dwarf_Die *die, int (*callback) 
(Dwarf_Attribute *, void *),
 
   if (unlikely (abbrevp == DWARF_END_ABBREV))
 {
-invalid_dwarf:
   __libdw_seterrno (DWARF_E_INVALID_DWARF);
   return -1l;
 }
@@ -61,24 +60,15 @@ dwarf_getattrs (Dwarf_Die *die, int (*callback) 
(Dwarf_Attribute *, void *),
   const unsigned char *

[PATCH 2/2] libdw: Reduce size of struct Dwarf_Abbrev.

2017-12-26 Thread Mark Wielaard
If we don't cache the attrcnt and use bitfields for the has_children and
code we can reduce the size of struct Dwarf Abbrev from 32 to 24 bytes on
64bit architectures and from 28 to 20 bytes on 32bit arches.

Signed-off-by: Mark Wielaard 
---
 libdw/ChangeLog  |  7 +++
 libdw/dwarf_getabbrev.c  |  7 +++
 libdw/dwarf_getattrcnt.c | 19 +--
 libdw/libdwP.h   | 15 +++
 4 files changed, 34 insertions(+), 14 deletions(-)

diff --git a/libdw/ChangeLog b/libdw/ChangeLog
index 3e3b7169..c284dbbd 100644
--- a/libdw/ChangeLog
+++ b/libdw/ChangeLog
@@ -1,3 +1,10 @@
+2017-12-26  Mark Wielaard  
+
+   * libdwP.h (struct Dwarf_Abbrev): Pack struct. Remove attrcnt,
+   use bitfields for has_children and code.
+   * dwarf_getabbrev.c (__libdw_getabbrev): Don't count attrs.
+   * dwarf_getattrcnt.c (dwarf_getattrcnt): Count attrs.
+
 2017-12-26  Mark Wielaard  
 
* memory-access.h (__libdw_get_uleb128_unchecked): New function.
diff --git a/libdw/dwarf_getabbrev.c b/libdw/dwarf_getabbrev.c
index ef51b847..a3a68b37 100644
--- a/libdw/dwarf_getabbrev.c
+++ b/libdw/dwarf_getabbrev.c
@@ -1,5 +1,5 @@
 /* Get abbreviation at given offset.
-   Copyright (C) 2003, 2004, 2005, 2006, 2014 Red Hat, Inc.
+   Copyright (C) 2003, 2004, 2005, 2006, 2014, 2017 Red Hat, Inc.
This file is part of elfutils.
Written by Ulrich Drepper , 2003.
 
@@ -121,8 +121,7 @@ __libdw_getabbrev (Dwarf *dbg, struct Dwarf_CU *cu, 
Dwarf_Off offset,
   abb->attrp = (unsigned char *) abbrevp;
   abb->offset = offset;
 
-  /* Skip over all the attributes and count them while doing so.  */
-  abb->attrcnt = 0;
+  /* Skip over all the attributes and check rest of the abbrev is valid.  */
   unsigned int attrname;
   unsigned int attrform;
   do
@@ -134,7 +133,7 @@ __libdw_getabbrev (Dwarf *dbg, struct Dwarf_CU *cu, 
Dwarf_Off offset,
goto invalid;
   get_uleb128 (attrform, abbrevp, end);
 }
-  while (attrname != 0 && attrform != 0 && ++abb->attrcnt);
+  while (attrname != 0 && attrform != 0);
 
   /* Return the length to the caller if she asked for it.  */
   if (lengthp != NULL)
diff --git a/libdw/dwarf_getattrcnt.c b/libdw/dwarf_getattrcnt.c
index 2bfb4ac5..a05976d4 100644
--- a/libdw/dwarf_getattrcnt.c
+++ b/libdw/dwarf_getattrcnt.c
@@ -1,5 +1,5 @@
 /* Get number of attributes of abbreviation.
-   Copyright (C) 2003, 2004 Red Hat, Inc.
+   Copyright (C) 2003, 2004, 2017 Red Hat, Inc.
This file is part of elfutils.
Written by Ulrich Drepper , 2003.
 
@@ -40,7 +40,22 @@ dwarf_getattrcnt (Dwarf_Abbrev *abbrev, size_t *attrcntp)
   if (abbrev == NULL)
 return -1;
 
-  *attrcntp = abbrev->attrcnt;
+  const unsigned char *abbrevp = abbrev->attrp;
+
+  /* Skip over all the attributes and count them while doing so.  */
+  int attrcnt = 0;
+  unsigned int attrname;
+  unsigned int attrform;
+  do
+{
+  /* We can use unchecked since they were checked when the Dwrf_Abbrev
+was created.  */
+  get_uleb128_unchecked (attrname, abbrevp);
+  get_uleb128_unchecked (attrform, abbrevp);
+}
+  while (attrname != 0 && attrform != 0 && ++attrcnt);
+
+  *attrcntp = attrcnt;
 
   return 0;
 }
diff --git a/libdw/libdwP.h b/libdw/libdwP.h
index 82b47d09..fbb8ed08 100644
--- a/libdw/libdwP.h
+++ b/libdw/libdwP.h
@@ -1,5 +1,5 @@
 /* Internal definitions for libdwarf.
-   Copyright (C) 2002-2011, 2013-2016 Red Hat, Inc.
+   Copyright (C) 2002-2011, 2013-2017 Red Hat, Inc.
This file is part of elfutils.
Written by Ulrich Drepper , 2002.
 
@@ -212,13 +212,12 @@ struct Dwarf
 /* Abbreviation representation.  */
 struct Dwarf_Abbrev
 {
-  Dwarf_Off offset;
-  unsigned char *attrp;
-  unsigned int attrcnt;
-  unsigned int code;
-  unsigned int tag;
-  bool has_children;
-};
+  Dwarf_Off offset;  /* Offset to start of abbrev into .debug_abbrev.  */
+  unsigned char *attrp;   /* Pointer to start of attribute name/form pairs. */
+  bool has_children : 1;  /* Whether or not the DIE has children. */
+  unsigned int code : 31; /* The (unique) abbrev code.  */
+  unsigned int tag;  /* The tag of the DIE. */
+} attribute_packed;
 
 #include "dwarf_abbrev_hash.h"
 
-- 
2.14.3



Simpler abbrev parsing using less memory

2017-12-26 Thread Mark Wielaard
Hi,

When we added bounds checking to almost all data reading functions
(commit 7a05347 libdw: Add get_uleb128 and get_sleb128 bounds checking)
we also added extra checks to the abbrev reading. But since we didn't
really have bounds for the "raw" Dwarf_Abbrev reading functions we
just "guessed" the maximum of a uleb128. This wasn't really correct
and not really needed. A struct Dwarf_Abbrev can only be created by
__libdw_getabbrev, which checks the whole abbrev (code, tag, children
and attribute names/forms) is valid already. So whenever we use the
attrp pointer from the Dwarf_Abbrev to read the name/forms we already
know they are in the .debug_abbrev bounds).

[PATCH 1/2] libdw: New get_uleb128_unchecked to use with already
   checked Dwarf_Abbrev.

So the first patch introduces a get_uleb128_unchecked function that
is used for re-reading such uleb128 values.

The second patch reduces the size of the struct Dwarf_Abbrev by not
storing the attrcnt and by using bitfields for has_children and code.

[PATCH 2/2] libdw: Reduce size of struct Dwarf_Abbrev.

The attrcnt was only used by the dwarf_getattrcnt function. Which
is only used in one testcase. And which seems mostly unnecessary
for real programs. The function now explicitly counts the attrs
instead of using a cached value.

The combined patches very slightly reduces the time for parsing
abbrevs and make the abbrev cache somewhat smaller.

On my machine eu-readelf -N --debug-dump=info libstdc++.so.debug
goes down from 1.79 to 1.71 secs. And max rss goes down from 15.296
to 14.684 kbytes.

Cheers,

Mark