Hi Aaron,
On Tue, 2025-04-15 at 17:04 -0400, Aaron Merey wrote:
> eu_tfind is used to facilitate lazy loading throughout libdw.
> If a result is not found via eu_tfind, work is done to load
> the result and cache it in an eu_search_tree.
>
> Some calls to eu_tfind allow for TOCTOU bugs. Multiple threads
> might race to call eu_tfind on some result that hasn't yet been
> cached. Multiple threads may then attempt to load the result
> which may cause an unnecessary amount of memory may be allocated.
> Additionally this memory may not get released when the associated
> libdw data structure is freed.
>
> Fix this by adding additional locking to ensure that only one
> thread performs lazy loading.
>
> One approach used in this patch is to preserve calls to eu_tfind
> without additional locking, but when the result isn't found then
> a lock is then used to synchronize access to the lazy loading code.
> An extra eu_tfind call has been added at the start of these critical
> section to synchronize verification that lazy loading should proceed.
>
> Another approach used is to simply synchronize entire calls to
> functions where lazy loading via eu_tfind might occur (__libdw_find_fde
> and __libdw_intern_expression).
>
> libdw/
> * cfi.h (struct Dwarf_CFI_s): Declare new mutex.
> * dwarf_begin_elf.c (valid_p): Initialize all locks for fake CUs.
> * dwarf_cfi_addrframe.c (dwarf_cfi_addrframe): Place lock around
> __libdw_find_fde.
> * dwarf_end.c (cu_free): Deallocate all locks unconditionally,
> whether or not the CU is fake.
> * dwarf_frame_cfa.c (dwarf_frame_cfa): Place lock around
> __libdw_intern_expression.
> * dwarf_frame_register.c (dwarf_frame_register): Ditto.
> * dwarf_getcfi.c (dwarf_getcfi): Initialize cfi lock.
> * dwarf_getlocation.c (is_constant_offset): Synchronize access
> to lazy loading section.
> (getlocation): Place lock around __libdw_intern_expression.
> * dwarf_getmacros.c (cache_op_table): Synchronize access to lazy
> loading section.
> * frame-cache.c (__libdw_destroy_frame_cache): Free Dwarf_CFI
> mutex.
> * libdwP.h (struct Dwarf): Update macro_lock comment.
> (struct Dwarf_CU): Declare new mutex.
> libdw_findcu.c (__libdw_intern_next_unit): Initialize
> intern_lock.
> (__libdw_findcu): Adjust locking so that the first eu_tfind
> can be done without extra lock overhead.
>
> Signed-off-by: Aaron Merey <[email protected]>
> ---
> libdw/cfi.h | 4 +++
> libdw/dwarf_begin_elf.c | 15 +++++++++
> libdw/dwarf_cfi_addrframe.c | 3 ++
> libdw/dwarf_end.c | 10 +++---
> libdw/dwarf_frame_cfa.c | 2 ++
> libdw/dwarf_frame_register.c | 16 +++++----
> libdw/dwarf_getcfi.c | 1 +
> libdw/dwarf_getlocation.c | 63 ++++++++++++++++++++++--------------
> libdw/dwarf_getmacros.c | 16 ++++++++-
> libdw/frame-cache.c | 1 +
> libdw/libdwP.h | 6 +++-
> libdw/libdw_findcu.c | 9 ++++--
> 12 files changed, 108 insertions(+), 38 deletions(-)
>
> diff --git a/libdw/cfi.h b/libdw/cfi.h
> index d0134243..48e42a41 100644
> --- a/libdw/cfi.h
> +++ b/libdw/cfi.h
> @@ -98,6 +98,10 @@ struct Dwarf_CFI_s
> /* Search tree for parsed DWARF expressions, indexed by raw pointer. */
> search_tree expr_tree;
>
> + /* Should be held when calling __libdw_find_fde and when
> __libdw_intern_expression
> + is called with Dwarf_CFI members. */
> + mutex_define(, lock);
> +
> /* Backend hook. */
> struct ebl *ebl;
>
OK, used in dwarf_cfi_addrframe, dwarf_frame_cfa, dwarf_frame_register.
I needed to double check the use of __libdw_intern_expression in
libdw/dwarf_getlocation.c (getlocation) though. That one is also
properly "locked", but using the cu intern_lock instead. Which makes
sense. But maybe we can document that somewhere?
> diff --git a/libdw/dwarf_begin_elf.c b/libdw/dwarf_begin_elf.c
> index 58a309e9..63e2805d 100644
> --- a/libdw/dwarf_begin_elf.c
> +++ b/libdw/dwarf_begin_elf.c
> @@ -359,6 +359,11 @@ valid_p (Dwarf *result)
> result->fake_loc_cu->version = 4;
> result->fake_loc_cu->split = NULL;
> eu_search_tree_init (&result->fake_loc_cu->locs_tree);
> + rwlock_init (result->fake_loc_cu->abbrev_lock);
> + rwlock_init (result->fake_loc_cu->split_lock);
> + mutex_init (result->fake_loc_cu->src_lock);
> + mutex_init (result->fake_loc_cu->str_off_base_lock);
> + mutex_init (result->fake_loc_cu->intern_lock);
> }
> }
>
> @@ -387,6 +392,11 @@ valid_p (Dwarf *result)
> result->fake_loclists_cu->version = 5;
> result->fake_loclists_cu->split = NULL;
> eu_search_tree_init (&result->fake_loclists_cu->locs_tree);
> + rwlock_init (result->fake_loclists_cu->abbrev_lock);
> + rwlock_init (result->fake_loclists_cu->split_lock);
> + mutex_init (result->fake_loclists_cu->src_lock);
> + mutex_init (result->fake_loclists_cu->str_off_base_lock);
> + mutex_init (result->fake_loclists_cu->intern_lock);
> }
> }
>
> @@ -420,6 +430,11 @@ valid_p (Dwarf *result)
> result->fake_addr_cu->version = 5;
> result->fake_addr_cu->split = NULL;
> eu_search_tree_init (&result->fake_addr_cu->locs_tree);
> + rwlock_init (result->fake_addr_cu->abbrev_lock);
> + rwlock_init (result->fake_addr_cu->split_lock);
> + mutex_init (result->fake_addr_cu->src_lock);
> + mutex_init (result->fake_addr_cu->str_off_base_lock);
> + mutex_init (result->fake_addr_cu->intern_lock);
> }
> }
O, that is good. Missed completely that the "fake CUs" never had their
locks initialized. That must have caused some unexpected issues.
> diff --git a/libdw/dwarf_cfi_addrframe.c b/libdw/dwarf_cfi_addrframe.c
> index 44240279..f391f9f9 100644
> --- a/libdw/dwarf_cfi_addrframe.c
> +++ b/libdw/dwarf_cfi_addrframe.c
> @@ -39,7 +39,10 @@ dwarf_cfi_addrframe (Dwarf_CFI *cache, Dwarf_Addr address,
> Dwarf_Frame **frame)
> if (cache == NULL)
> return -1;
>
> + mutex_lock (cache->lock);
> struct dwarf_fde *fde = __libdw_find_fde (cache, address);
> + mutex_unlock (cache->lock);
> +
> if (fde == NULL)
> return -1;
Ack. So to my surprise this seems to be the only use of libdw_find_fd.
Does this mean we can use the non-locked tfind and tsearch calls now
inside libdw_find_fd (instead of the eu_tfind and eu_tsearch variants)?
> diff --git a/libdw/dwarf_end.c b/libdw/dwarf_end.c
> index 1628e448..979b1168 100644
> --- a/libdw/dwarf_end.c
> +++ b/libdw/dwarf_end.c
> @@ -61,17 +61,19 @@ static void
> cu_free (void *arg)
> {
> struct Dwarf_CU *p = (struct Dwarf_CU *) arg;
> +
> eu_search_tree_fini (&p->locs_tree, noop_free);
> + rwlock_fini (p->abbrev_lock);
> + rwlock_fini (p->split_lock);
> + mutex_fini (p->src_lock);
> + mutex_fini (p->str_off_base_lock);
> + mutex_fini (p->intern_lock);
>
> /* Only free the CU internals if its not a fake CU. */
> if (p != p->dbg->fake_loc_cu && p != p->dbg->fake_loclists_cu
> && p != p->dbg->fake_addr_cu)
> {
> Dwarf_Abbrev_Hash_free (&p->abbrev_hash);
> - rwlock_fini (p->abbrev_lock);
> - rwlock_fini (p->split_lock);
> - mutex_fini (p->src_lock);
> - mutex_fini (p->str_off_base_lock);
>
> /* Free split dwarf one way (from skeleton to split). */
> if (p->unit_type == DW_UT_skeleton
OK, good to see this fake cu locking fixed.
> diff --git a/libdw/dwarf_frame_cfa.c b/libdw/dwarf_frame_cfa.c
> index 07f998cd..c18ee499 100644
> --- a/libdw/dwarf_frame_cfa.c
> +++ b/libdw/dwarf_frame_cfa.c
> @@ -57,11 +57,13 @@ dwarf_frame_cfa (Dwarf_Frame *fs, Dwarf_Op **ops, size_t
> *nops)
>
> case cfa_expr:
> /* Parse the expression into internal form. */
> + mutex_lock (fs->cache->lock);
> result = __libdw_intern_expression
> (NULL, fs->cache->other_byte_order,
> fs->cache->e_ident[EI_CLASS] == ELFCLASS32 ? 4 : 8, 4,
> &fs->cache->expr_tree, &fs->cfa_data.expr, false, false,
> ops, nops, IDX_debug_frame);
> + mutex_unlock (fs->cache->lock);
> break;
>
> case cfa_invalid:
> diff --git a/libdw/dwarf_frame_register.c b/libdw/dwarf_frame_register.c
> index a6b7c4c1..dbd7f1b2 100644
> --- a/libdw/dwarf_frame_register.c
> +++ b/libdw/dwarf_frame_register.c
> @@ -109,12 +109,16 @@ dwarf_frame_register (Dwarf_Frame *fs, int regno,
> Dwarf_Op ops_mem[3],
> block.data = (void *) p;
>
> /* Parse the expression into internal form. */
> - if (__libdw_intern_expression (NULL,
> - fs->cache->other_byte_order,
> - address_size, 4,
> - &fs->cache->expr_tree, &block,
> - true, reg->rule == reg_val_expression,
> - ops, nops, IDX_debug_frame) < 0)
> + mutex_lock (fs->cache->lock);
> + int res = __libdw_intern_expression (NULL,
> + fs->cache->other_byte_order,
> + address_size, 4,
> + &fs->cache->expr_tree, &block,
> + true, reg->rule ==
> reg_val_expression,
> + ops, nops, IDX_debug_frame);
> + mutex_unlock (fs->cache->lock);
> +
> + if (res < 0)
> return -1;
> break;
> }
Similar question for __libdw_intern_expression, since there can now not
be any concurrent calls to __libdw_intern_expression for the same cache
(or cu), can we now switch to the non-locked tfind and tsearch variants
inside __libdw_intern_expression?
> diff --git a/libdw/dwarf_getcfi.c b/libdw/dwarf_getcfi.c
> index a4497152..893e3c74 100644
> --- a/libdw/dwarf_getcfi.c
> +++ b/libdw/dwarf_getcfi.c
> @@ -70,6 +70,7 @@ dwarf_getcfi (Dwarf *dbg)
> eu_search_tree_init (&cfi->cie_tree);
> eu_search_tree_init (&cfi->fde_tree);
> eu_search_tree_init (&cfi->expr_tree);
> + mutex_init (cfi->lock);
>
> cfi->ebl = NULL;
Ack.
> diff --git a/libdw/dwarf_getlocation.c b/libdw/dwarf_getlocation.c
> index ad1d46ca..7bf96716 100644
> --- a/libdw/dwarf_getlocation.c
> +++ b/libdw/dwarf_getlocation.c
> @@ -216,27 +216,38 @@ is_constant_offset (Dwarf_Attribute *attr,
>
> if (found == NULL)
> {
> - Dwarf_Word offset;
> - if (INTUSE(dwarf_formudata) (attr, &offset) != 0)
> - return -1;
> + mutex_lock (attr->cu->intern_lock);
> +
> + found = eu_tfind (&fake, &attr->cu->locs_tree, loc_compare);
> + if (found == NULL)
> + {
> + Dwarf_Word offset;
> + if (INTUSE(dwarf_formudata) (attr, &offset) != 0)
> + {
> + mutex_unlock (attr->cu->intern_lock);
> + return -1;
> + }
>
> - Dwarf_Op *result = libdw_alloc (attr->cu->dbg,
> - Dwarf_Op, sizeof (Dwarf_Op), 1);
> + Dwarf_Op *result = libdw_alloc (attr->cu->dbg,
> + Dwarf_Op, sizeof (Dwarf_Op), 1);
>
> - result->atom = DW_OP_plus_uconst;
> - result->number = offset;
> - result->number2 = 0;
> - result->offset = 0;
> + result->atom = DW_OP_plus_uconst;
> + result->number = offset;
> + result->number2 = 0;
> + result->offset = 0;
>
> - /* Insert a record in the search tree so we can find it again later.
> */
> - struct loc_s *newp = libdw_alloc (attr->cu->dbg,
> - struct loc_s, sizeof (struct loc_s),
> - 1);
> - newp->addr = attr->valp;
> - newp->loc = result;
> - newp->nloc = 1;
> + /* Insert a record in the search tree so we can find it again later.
> */
> + struct loc_s *newp = libdw_alloc (attr->cu->dbg,
> + struct loc_s, sizeof (struct loc_s),
> + 1);
> + newp->addr = attr->valp;
> + newp->loc = result;
> + newp->nloc = 1;
> +
> + found = eu_tsearch (newp, &attr->cu->locs_tree, loc_compare);
> + }
>
> - found = eu_tsearch (newp, &attr->cu->locs_tree, loc_compare);
> + mutex_unlock (attr->cu->intern_lock);
> }
>
> assert ((*found)->nloc == 1);
OK, so intern_lock now covers both tfind and tsearch (again, do we need
the eu_ variants here? isn't that double locking now?)
> @@ -674,13 +685,17 @@ getlocation (struct Dwarf_CU *cu, const Dwarf_Block
> *block,
> return 0;
> }
>
> - return __libdw_intern_expression (cu->dbg, cu->dbg->other_byte_order,
> - cu->address_size, (cu->version == 2
> - ? cu->address_size
> - : cu->offset_size),
> - &cu->locs_tree, block,
> - false, false,
> - llbuf, listlen, sec_index);
> + mutex_lock (cu->intern_lock);
> + int res = __libdw_intern_expression (cu->dbg, cu->dbg->other_byte_order,
> + cu->address_size, (cu->version == 2
> + ? cu->address_size
> + : cu->offset_size),
> + &cu->locs_tree, block,
> + false, false,
> + llbuf, listlen, sec_index);
> + mutex_unlock (cu->intern_lock);
> +
> + return res;
> }
Ack, so cu intern_lock here instead of Dwarf_CFI one.
> int
> diff --git a/libdw/dwarf_getmacros.c b/libdw/dwarf_getmacros.c
> index f1c831fa..80f652c3 100644
> --- a/libdw/dwarf_getmacros.c
> +++ b/libdw/dwarf_getmacros.c
> @@ -322,15 +322,29 @@ cache_op_table (Dwarf *dbg, int sec_index, Dwarf_Off
> macoff,
> if (found != NULL)
> return *found;
>
> + mutex_lock (dbg->macro_lock);
> +
> + found = eu_tfind (&fake, &dbg->macro_ops_tree, macro_op_compare);
> + if (found != NULL)
> + {
> + mutex_unlock (dbg->macro_lock);
> + return *found;
> + }
> +
> Dwarf_Macro_Op_Table *table = sec_index == IDX_debug_macro
> ? get_table_for_offset (dbg, macoff, startp, endp, cudie)
> : get_macinfo_table (dbg, macoff, cudie);
>
> if (table == NULL)
> - return NULL;
> + {
> + mutex_unlock (dbg->macro_lock);
> + return NULL;
> + }
>
> Dwarf_Macro_Op_Table **ret = eu_tsearch (table, &dbg->macro_ops_tree,
> macro_op_compare);
> + mutex_unlock (dbg->macro_lock);
> +
> if (unlikely (ret == NULL))
> {
> __libdw_seterrno (DWARF_E_NOMEM);
OK, so both tfind and tsearch now covered by the dbg->macro_lock
guarding the macro_ops_tree (eu_ variants still needed?)
> diff --git a/libdw/frame-cache.c b/libdw/frame-cache.c
> index 6c89858a..6f6f777e 100644
> --- a/libdw/frame-cache.c
> +++ b/libdw/frame-cache.c
> @@ -64,6 +64,7 @@ __libdw_destroy_frame_cache (Dwarf_CFI *cache)
> eu_search_tree_fini (&cache->fde_tree, free_fde);
> eu_search_tree_fini (&cache->cie_tree, free_cie);
> eu_search_tree_fini (&cache->expr_tree, free_expr);
> + mutex_fini (cache->lock);
>
> if (cache->ebl != NULL && cache->ebl != (void *) -1l)
> ebl_closebackend (cache->ebl);
Ack.
> diff --git a/libdw/libdwP.h b/libdw/libdwP.h
> index de80cd4e..77d0c113 100644
> --- a/libdw/libdwP.h
> +++ b/libdw/libdwP.h
> @@ -269,7 +269,7 @@ struct Dwarf
> __libdw_intern_next_unit. */
> mutex_define(, dwarf_lock);
>
> - /* Synchronize access to dwarf_macro_getsrcfiles. */
> + /* Synchronize access to dwarf_macro_getsrcfiles and cache_op_table. */
> mutex_define(, macro_lock);
Thanks for updating the comment.
> /* Internal memory handling. This is basically a simplified thread-local
> @@ -471,6 +471,10 @@ struct Dwarf_CU
> Covers __libdw_str_offsets_base_off. */
> mutex_define(, str_off_base_lock);
>
> + /* Synchronize access to is_constant_offset. Should also be held
> + when calling __libdw_intern_expression with Dwarf_CU members. */
> + mutex_define(, intern_lock);
> +
Right. So maybe cross reference doc between here and Dwarf_CFI_s lock?
> /* Memory boundaries of this CU. */
> void *startp;
> void *endp;
> diff --git a/libdw/libdw_findcu.c b/libdw/libdw_findcu.c
> index 0e4dcc37..59267343 100644
> --- a/libdw/libdw_findcu.c
> +++ b/libdw/libdw_findcu.c
> @@ -181,6 +181,7 @@ __libdw_intern_next_unit (Dwarf *dbg, bool debug_types)
> rwlock_init (newp->split_lock);
> mutex_init (newp->src_lock);
> mutex_init (newp->str_off_base_lock);
> + mutex_init (newp->intern_lock);
>
> /* v4 debug type units have version == 4 and unit_type == DW_UT_type. */
> if (debug_types)
Ack.
> @@ -240,8 +241,6 @@ struct Dwarf_CU *
> internal_function
> __libdw_findcu (Dwarf *dbg, Dwarf_Off start, bool v4_debug_types)
> {
> - mutex_lock (dbg->dwarf_lock);
> -
> search_tree *tree = v4_debug_types ? &dbg->tu_tree : &dbg->cu_tree;
> Dwarf_Off *next_offset
> = v4_debug_types ? &dbg->next_tu_offset : &dbg->next_cu_offset;
> @@ -250,6 +249,12 @@ __libdw_findcu (Dwarf *dbg, Dwarf_Off start, bool
> v4_debug_types)
> struct Dwarf_CU fake = { .start = start, .end = 0 };
> struct Dwarf_CU **found = eu_tfind (&fake, tree, findcu_cb);
> struct Dwarf_CU *result = NULL;
> + if (found != NULL)
> + return *found;
> +
> + mutex_lock (dbg->dwarf_lock);
> +
> + found = eu_tfind (&fake, tree, findcu_cb);
> if (found != NULL)
> {
> mutex_unlock (dbg->dwarf_lock);
OK, so in this case you do need the eu_tfind variant because both calls
need to be guarded. The second call just has one more extra lock around
it (the dwarf_lock) that also covers the possible eu_tsearch call.
Thanks,
Mark