Re: [PATCH] src: fix DEREF_OF_NULL.RET.STAT in readelf.c in
Hi Anton, On Thu, Feb 27, 2025 at 10:02:19PM +0100, Mark Wielaard wrote: > On Thu, Feb 13, 2025 at 07:52:00PM +0300, Anton Moryakov wrote: > > Static analyzer reported: > > Return value of a function 'gelf_getehdr' is dereferenced at readelf.c:12443 > > without checking for NULL, but it is usually checked for this function > > (53/54). > > I can see how a static analyzer thinks this gelf_getehdr call can > fail. But it really cannot in this case. The core Elf was already > checked for ehdr->type == ET_CORE in one of the callers. Or we > wouldn't have gotten here. > > What you could do to help the analyzer, and make this code slightly > cleaner, is pass down that ehdr from handle_notes through the various > handle_* functions. I implemented this and puhsed it as attached. Cheers, Mark>From 56e7a862d82e646bae79706343afa5f7800e2be1 Mon Sep 17 00:00:00 2001 From: Mark Wielaard Date: Tue, 29 Apr 2025 23:19:47 +0200 Subject: [PATCH 2/3] readelf: Pass around GElf_Ehdr instead of calling gelf_getehdr handle_core_item calls gelf_getehdr for each item without checking if the call succeeds. It should always succeed because process_elf_file already checked gelf_getehdr returned a valid Ehdr before passing it to handle_notes. Just pass the Ehdr down a couple more function calls. * readelf.c (handle_core_item): Take const Gelf_Ehdr and use it. (handle_core_items): Take const Gelf_Ehdr and pass it to handle_core_item. (handle_core_note): Take const Gelf_Ehdr and pass it to handle_core_items. (handle_notes_data): Pass ehdr to handle_core_note. Signed-off-by: Mark Wielaard --- src/readelf.c | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/src/readelf.c b/src/readelf.c index 5b0e7b474064..8603b3c441a1 100644 --- a/src/readelf.c +++ b/src/readelf.c @@ -12322,7 +12322,8 @@ convert (Elf *core, Elf_Type type, uint_fast16_t count, typedef uint8_t GElf_Byte; static unsigned int -handle_core_item (Elf *core, const Ebl_Core_Item *item, const void *desc, +handle_core_item (Elf *core, const GElf_Ehdr *ehdr, + const Ebl_Core_Item *item, const void *desc, unsigned int colno, size_t *repeated_size) { uint_fast16_t count = item->count ?: 1; @@ -12501,8 +12502,6 @@ handle_core_item (Elf *core, const Ebl_Core_Item *item, const void *desc, high half is the padding; it's presumably zero, but should be ignored anyway. For big-endian, it means the 32-bit field went into the high half of USEC. */ - GElf_Ehdr ehdr_mem; - GElf_Ehdr *ehdr = gelf_getehdr (core, &ehdr_mem); if (likely (ehdr->e_ident[EI_DATA] == ELFDATA2MSB)) usec >>= 32; else @@ -12594,7 +12593,8 @@ compare_core_item_groups (const void *a, const void *b) } static unsigned int -handle_core_items (Elf *core, const void *desc, size_t descsz, +handle_core_items (Elf *core, const GElf_Ehdr *ehdr, + const void *desc, size_t descsz, const Ebl_Core_Item *items, size_t nitems) { if (nitems == 0) @@ -12610,7 +12610,7 @@ handle_core_items (Elf *core, const void *desc, size_t descsz, { assert (items[0].offset == 0); size_t size = descsz; - colno = handle_core_item (core, items, desc, colno, &size); + colno = handle_core_item (core, ehdr, items, desc, colno, &size); /* If SIZE is not zero here there is some remaining data. But we do not know how to process it anyway. */ return colno; @@ -12645,7 +12645,7 @@ handle_core_items (Elf *core, const void *desc, size_t descsz, && ((*item)->group == groups[i][0]->group || !strcmp ((*item)->group, groups[i][0]->group))); ++item) - colno = handle_core_item (core, *item, desc, colno, NULL); + colno = handle_core_item (core, ehdr, *item, desc, colno, NULL); /* Force a line break at the end of the group. */ colno = WRAP_COLUMN; @@ -13146,7 +13146,7 @@ handle_file_note (Elf *core, GElf_Word descsz, GElf_Off desc_pos) } static void -handle_core_note (Ebl *ebl, const GElf_Nhdr *nhdr, +handle_core_note (Ebl *ebl, const GElf_Ehdr *ehdr, const GElf_Nhdr *nhdr, const char *name, const void *desc) { GElf_Word regs_offset; @@ -13163,7 +13163,7 @@ handle_core_note (Ebl *ebl, const GElf_Nhdr *nhdr, so that the ITEMS array does not describe the whole thing. For non-register notes, the actual descsz might be a multiple of the unit size, not just exactly the unit size. */ - unsigned int colno = handle_core_items (ebl->elf, desc, + unsigned int colno = handle_core_items (ebl->elf, ehdr, desc, nregloc == 0 ? nhdr->n_descsz : 0, items, nitems); if (colno != 0) @@ -13243,10 +13243,10 @@ handle_notes_data (Ebl *ebl, const
Re: [PATCH] src: fix DEREF_OF_NULL.RET.STAT in unstrip.c
Hi Anton, On Thu, Feb 27, 2025 at 10:28:20PM +0100, Mark Wielaard wrote: > > diff --git a/src/unstrip.c b/src/unstrip.c > > index d70053de..35c04700 100644 > > --- a/src/unstrip.c > > +++ b/src/unstrip.c > > @@ -1974,6 +1974,9 @@ more sections in stripped file than debug file -- > > arguments reversed?")); > > } > > } > > > > + if (symstrdata == NULL) > > + error_exit (0, "Failed to get data from symbol string table"); > > + > >if (dwelf_strtab_finalize (symstrtab, symstrdata) == NULL) > > error_exit (0, "Not enough memory to create symbol table"); > > If you check this why not at the point where elf_getdata is called > (symstrdata is assigned?). And then you should also check the other > elf_getdata call at the same time here: > > symdata = elf_getdata (unstripped_symtab, NULL); > symstrdata = elf_getdata (unstripped_strtab, NULL); I implemented that and pushed as attached. Cheers, Mark>From 50586ba4a99c06674962fe3bdd685088ab6808e1 Mon Sep 17 00:00:00 2001 From: Mark Wielaard Date: Wed, 30 Apr 2025 00:22:59 +0200 Subject: [PATCH 3/3] unstrip: Check symtab and strtab sections have data before use. * src/unstrip.c (copy_elided_sections): Check elf_getdata result for symtab and strtab sections. Suggested-by: Anton Moryakov Signed-off-by: Mark Wielaard --- src/unstrip.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/src/unstrip.c b/src/unstrip.c index d70053def292..0ae76f77e9ae 100644 --- a/src/unstrip.c +++ b/src/unstrip.c @@ -1946,7 +1946,13 @@ more sections in stripped file than debug file -- arguments reversed?")); /* Now we are ready to write the new symbol table. */ symdata = elf_getdata (unstripped_symtab, NULL); + if (symdata == NULL) +error_exit (0, "Failed to get data from symbol table: %s", + elf_errmsg (-1)); symstrdata = elf_getdata (unstripped_strtab, NULL); + if (symstrdata == NULL) +error_exit (0, "Failed to get data from symbol string table: %s", + elf_errmsg (-1)); Elf_Data *shndxdata = NULL; /* XXX */ /* If symtab and the section header table share the string table -- 2.49.0
Re: [PATCH] scr: fix DEREF_OF_NULL.RET.STAT in ar.c
Hi Anton, On Thu, Feb 27, 2025 at 06:12:36PM +0100, Mark Wielaard wrote: > The subject isn't super helpful unless you know the specific > terminology of the statuc analyzer you are using. It would be better to > say something like: > > ar: check whether elf_getarhdr returns NULL > > > --- a/src/ar.c > > +++ b/src/ar.c > > @@ -498,6 +498,9 @@ do_oper_extract (int oper, const char *arfname, char > > **argv, int argc, > >while ((subelf = elf_begin (fd, cmd, elf)) != NULL) > > { > >Elf_Arhdr *arhdr = elf_getarhdr (subelf); > > + > > + if (arhdr == NULL) > > + goto next; > > > > OK, but the identation is completely wrong. There is extra whitespace > on the first line, just remove the line. The if line should be indented > 6 spaces, the goto line should be indented one tab. > > >if (strcmp (arhdr->ar_name, "/") == 0) > > { > > @@ -860,6 +863,9 @@ write_member (struct armem *memb, off_t *startp, off_t > > *lenp, Elf *elf, > > { > >/* In case of a long file name we assume the archive header > > changed and we write it here. */ > > + > > + if (arhdr == NULL) > > + goto next; > >memcpy (&arhdr, elf_rawfile (elf, NULL) + *startp, sizeof (arhdr)); > > This doesn't make sense to me, does it even compile? > arhdr is a struct ar_hdr not a pointer to it. > > >snprintf (tmpbuf, sizeof (tmpbuf), "/%-*ld", > > @@ -943,6 +949,9 @@ do_oper_delete (const char *arfname, char **argv, int > > argc, > >while ((subelf = elf_begin (fd, cmd, elf)) != NULL) > > { > >Elf_Arhdr *arhdr = elf_getarhdr (subelf); > > + > > + if (arhdr == NULL) > > + goto next; > > This does make sense, but indentation needs to be fixed like above. > > >/* Ignore the symbol table and the long file name table here. */ > >if (strcmp (arhdr->ar_name, "/") == 0 > > @@ -1152,6 +1161,9 @@ do_oper_insert (int oper, const char *arfname, char > > **argv, int argc, > >while ((subelf = elf_begin (fd, cmd, elf)) != NULL) > > { > >Elf_Arhdr *arhdr = elf_getarhdr (subelf); > > + > > + if (arhdr == NULL) > > + goto next; > > Likewise. I reimplemented this and committed the attached. Cheers, Mark>From 04839d5826d21e7a603a76fddc7afed6d32ab087 Mon Sep 17 00:00:00 2001 From: Mark Wielaard Date: Tue, 29 Apr 2025 22:16:58 +0200 Subject: [PATCH 1/3] ar: Check elf_getahdr doesn't return NULL When elf_getahdr returns NULL we shouldn't even try to handle the ar header, but immediately go to the next entry. * src/ar.c (do_oper_extract): If elf_getahdr goto next. (do_oper_delete): Likewise. (do_oper_insert): Likewise. Suggested-by: Anton Moryakov Signed-off-by: Mark Wielaard --- src/ar.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/src/ar.c b/src/ar.c index 9ace28b918d3..03118c4eadbe 100644 --- a/src/ar.c +++ b/src/ar.c @@ -498,6 +498,8 @@ do_oper_extract (int oper, const char *arfname, char **argv, int argc, while ((subelf = elf_begin (fd, cmd, elf)) != NULL) { Elf_Arhdr *arhdr = elf_getarhdr (subelf); + if (arhdr == NULL) + goto next; if (strcmp (arhdr->ar_name, "/") == 0) { @@ -943,6 +945,8 @@ do_oper_delete (const char *arfname, char **argv, int argc, while ((subelf = elf_begin (fd, cmd, elf)) != NULL) { Elf_Arhdr *arhdr = elf_getarhdr (subelf); + if (arhdr == NULL) + goto next; /* Ignore the symbol table and the long file name table here. */ if (strcmp (arhdr->ar_name, "/") == 0 @@ -1152,6 +1156,8 @@ do_oper_insert (int oper, const char *arfname, char **argv, int argc, while ((subelf = elf_begin (fd, cmd, elf)) != NULL) { Elf_Arhdr *arhdr = elf_getarhdr (subelf); + if (arhdr == NULL) + goto next; /* Ignore the symbol table and the long file name table here. */ if (strcmp (arhdr->ar_name, "/") == 0 -- 2.49.0
☝ Buildbot (Sourceware): elfutils - worker not pinged (main)
A retry build has been detected on builder elfutils-debian-ppc64 while building elfutils. Full details are available at: https://builder.sourceware.org/buildbot/#/builders/63/builds/496 Build state: worker not pinged Revision: (unknown) Worker: debian-ppc64 Build Reason: (unknown) Blamelist: Mark Wielaard Steps: - 0: worker_preparation ( exception ) Logs: - err.text: https://builder.sourceware.org/buildbot/#/builders/63/builds/496/steps/0/logs/err_text - err.html: https://builder.sourceware.org/buildbot/#/builders/63/builds/496/steps/0/logs/err_html
Re: [RFC v2] sketch of an unwinder cache interface for elfutils libdwfl
On Thu, Mar 20, 2025, at 10:28 AM, Serhei Makarov wrote: > On Tue, Dec 10, 2024, at 4:42 PM, Serhei Makarov wrote: >> This email sketches an 'unwinder cache' interface for libdwfl, derived >> from recent eu-stacktrace code [1] and based on Christian Hergert's >> adaptation of eu-stacktrace as sysprof-live-unwinder [2]. The intent is >> to remove the need for a lot of boilerplate code that would be >> identical among profiling tools that use libdwfl to unwind perf_events >> stack samples. But since this becomes a library design, it's in need of >> feedback and bikeshedding. > In advance of finishing up the Dwfl_Process_Tracker patchset > (initial version currently under review), > wanted to post an update summarizing the current api design. Redid performance analysis based on the released code. I'm seeing a reduction of sysprof-live-unwinder overhead from 7~8% to 3~6% (with the framepointer version of sysprof providing a baseline of about 1.5%). So there is some variance (the lower the overhead gets, the harder it is to keep conditions identical, apparently), but the performance is moving along; the next bottleneck to look at is dwfl_linux_proc_report. I may want to automate the performance testing procedure to get fully exact numbers I'm comfortable with reporting. There is an issue with spontaneous exit of sysprofd (cleanly, with a "Stopping RAPL monitor" message) that I'm trying to understand. Hypotheses -- may be a polkit issue, or may be sysprof-live-unwinder not handling some error result gracefully. (The eu-stacktrace tool + prototype sysprof patches don't exhibit this behaviour.) I expect I'll need to make another revision of the sysprof-live-unwinder patches at https://git.sr.ht/~serhei/sysprof-experiments/log/serhei/live-unwinder With the Elf caching in Dwfl_Process_Tracker, I counted (via attached simple patch) how many times an Elf struct was retrieved from cache vs how many were newly created. On a quick test with a swaywm system the 'created' number stabilized at ~186 created Elf structs with the 'cached' number ~400 and rising as I keep running the profiler. On gnome3, stabilizes at ~282 structs with the 'cached' number ~500 and rising. Obviously, this is not super meaningful, as the number can be made arbitrarily good by running the profiler for longer periods of time :p but it's worth verifying that the caching works. All the best, Serheidiff --git a/libdwfl/dwfl_process_tracker_find_elf.c b/libdwfl/dwfl_process_tracker_find_elf.c index 72621bb1..d966346f 100644 --- a/libdwfl/dwfl_process_tracker_find_elf.c +++ b/libdwfl/dwfl_process_tracker_find_elf.c @@ -38,6 +38,9 @@ #include "libdwflP.h" +static int created_elf = 0; +static int cached_elf = 0; + /* TODO: Consider making this a public api, dwfl_process_tracker_find_cached_elf. */ bool find_cached_elf (Dwfl_Process_Tracker *tracker, @@ -68,6 +71,8 @@ find_cached_elf (Dwfl_Process_Tracker *tracker, *elfp = ent->elf; *file_name = strdup(ent->module_name); *fdp = ent->fd; + cached_elf ++; + fprintf(stderr, "= dwfl_process_tracker_find_elf retrieves CACHED (%d created / %d cached) name=%s fd=%d elfp=%p ref_count=%d\n", created_elf, cached_elf, ent->module_name, ent->fd, ent->elf, ent->elf->ref_count); /* DEBUG */ return true; } @@ -118,6 +123,8 @@ cache_elf (Dwfl_Process_Tracker *tracker, ent->last_mtime = sb.st_mtime; } rwlock_unlock(tracker->elftab_lock); + created_elf ++; + fprintf(stderr, "+ dwfl_process_tracker_find_elf CREATES (%d created / %d cached) new name=%s file_name=%s fd=%d elfp=%p ref_count=%d\n", created_elf, cached_elf, ent->module_name, file_name, fd, elf, elf ? elf->ref_count : 0); /* DEBUG */ return true; }
☠ Buildbot (Sourceware): elfutils - failed test (failure) (main)
A new failure has been detected on builder elfutils-debian-armhf while building elfutils. Full details are available at: https://builder.sourceware.org/buildbot/#/builders/6/builds/407 Build state: failed test (failure) Revision: 8aea98a5355d4c42e339ed508813920db0174940 Worker: debian-armhf Build Reason: (unknown) Blamelist: Serhei Makarov Steps: - 0: worker_preparation ( success ) - 1: set package name ( success ) - 2: git checkout ( success ) Logs: - stdio: https://builder.sourceware.org/buildbot/#/builders/6/builds/407/steps/2/logs/stdio - 3: autoreconf ( success ) Logs: - stdio: https://builder.sourceware.org/buildbot/#/builders/6/builds/407/steps/3/logs/stdio - 4: configure ( success ) Logs: - stdio: https://builder.sourceware.org/buildbot/#/builders/6/builds/407/steps/4/logs/stdio - config.log: https://builder.sourceware.org/buildbot/#/builders/6/builds/407/steps/4/logs/config_log - 5: get version ( success ) Logs: - stdio: https://builder.sourceware.org/buildbot/#/builders/6/builds/407/steps/5/logs/stdio - property changes: https://builder.sourceware.org/buildbot/#/builders/6/builds/407/steps/5/logs/property_changes - 6: make ( warnings ) Logs: - stdio: https://builder.sourceware.org/buildbot/#/builders/6/builds/407/steps/6/logs/stdio - warnings (3): https://builder.sourceware.org/buildbot/#/builders/6/builds/407/steps/6/logs/warnings__3_ - 7: make check ( failure ) Logs: - stdio: https://builder.sourceware.org/buildbot/#/builders/6/builds/407/steps/7/logs/stdio - test-suite.log: https://builder.sourceware.org/buildbot/#/builders/6/builds/407/steps/7/logs/test-suite_log - 8: prep ( success ) Logs: - stdio: https://builder.sourceware.org/buildbot/#/builders/6/builds/407/steps/8/logs/stdio - 9: build bunsen.cpio.gz ( success ) Logs: - stdio: https://builder.sourceware.org/buildbot/#/builders/6/builds/407/steps/9/logs/stdio - 10: fetch bunsen.cpio.gz ( success ) Logs: - stdio: https://builder.sourceware.org/buildbot/#/builders/6/builds/407/steps/10/logs/stdio - 11: unpack bunsen.cpio.gz ( success ) Logs: - stdio: https://builder.sourceware.org/buildbot/#/builders/6/builds/407/steps/11/logs/stdio - 12: pass .bunsen.source.* ( success ) Logs: - stdio: https://builder.sourceware.org/buildbot/#/builders/6/builds/407/steps/12/logs/stdio - 13: upload to bunsen ( success ) Logs: - stdio: https://builder.sourceware.org/buildbot/#/builders/6/builds/407/steps/13/logs/stdio - 14: clean up ( success ) Logs: - stdio: https://builder.sourceware.org/buildbot/#/builders/6/builds/407/steps/14/logs/stdio - 15: make distclean ( success ) Logs: - stdio: https://builder.sourceware.org/buildbot/#/builders/6/builds/407/steps/15/logs/stdio
☺ Buildbot (Sourceware): elfutils - build successful (main)
A restored build has been detected on builder elfutils-debian-armhf while building elfutils. Full details are available at: https://builder.sourceware.org/buildbot/#/builders/6/builds/408 Build state: build successful Revision: 921d0e0b58f78af9e0b6d46b974e68420092cad0 Worker: debian-armhf Build Reason: (unknown) Blamelist: Aaron Merey , Serhei Makarov Steps: - 0: worker_preparation ( success ) - 1: set package name ( success ) - 2: git checkout ( success ) Logs: - stdio: https://builder.sourceware.org/buildbot/#/builders/6/builds/408/steps/2/logs/stdio - 3: autoreconf ( success ) Logs: - stdio: https://builder.sourceware.org/buildbot/#/builders/6/builds/408/steps/3/logs/stdio - 4: configure ( success ) Logs: - stdio: https://builder.sourceware.org/buildbot/#/builders/6/builds/408/steps/4/logs/stdio - config.log: https://builder.sourceware.org/buildbot/#/builders/6/builds/408/steps/4/logs/config_log - 5: get version ( success ) Logs: - stdio: https://builder.sourceware.org/buildbot/#/builders/6/builds/408/steps/5/logs/stdio - property changes: https://builder.sourceware.org/buildbot/#/builders/6/builds/408/steps/5/logs/property_changes - 6: make ( warnings ) Logs: - stdio: https://builder.sourceware.org/buildbot/#/builders/6/builds/408/steps/6/logs/stdio - warnings (3): https://builder.sourceware.org/buildbot/#/builders/6/builds/408/steps/6/logs/warnings__3_ - 7: make check ( success ) Logs: - stdio: https://builder.sourceware.org/buildbot/#/builders/6/builds/408/steps/7/logs/stdio - test-suite.log: https://builder.sourceware.org/buildbot/#/builders/6/builds/408/steps/7/logs/test-suite_log - 8: prep ( success ) Logs: - stdio: https://builder.sourceware.org/buildbot/#/builders/6/builds/408/steps/8/logs/stdio - 9: build bunsen.cpio.gz ( success ) Logs: - stdio: https://builder.sourceware.org/buildbot/#/builders/6/builds/408/steps/9/logs/stdio - 10: fetch bunsen.cpio.gz ( success ) Logs: - stdio: https://builder.sourceware.org/buildbot/#/builders/6/builds/408/steps/10/logs/stdio - 11: unpack bunsen.cpio.gz ( success ) Logs: - stdio: https://builder.sourceware.org/buildbot/#/builders/6/builds/408/steps/11/logs/stdio - 12: pass .bunsen.source.* ( success ) Logs: - stdio: https://builder.sourceware.org/buildbot/#/builders/6/builds/408/steps/12/logs/stdio - 13: upload to bunsen ( success ) Logs: - stdio: https://builder.sourceware.org/buildbot/#/builders/6/builds/408/steps/13/logs/stdio - 14: clean up ( success ) Logs: - stdio: https://builder.sourceware.org/buildbot/#/builders/6/builds/408/steps/14/logs/stdio - 15: make distclean ( success ) Logs: - stdio: https://builder.sourceware.org/buildbot/#/builders/6/builds/408/steps/15/logs/stdio
Re: [RFC v2] sketch of an unwinder cache interface for elfutils libdwfl
On 4/29/25 12:15, Serhei Makarov wrote: There is an issue with spontaneous exit of sysprofd (cleanly, with a "Stopping RAPL monitor" message) that I'm trying to understand. Sysprofd only stays alive as long as it's being used (plus some 30 second timeout or so). So that generally means that something that was using an API crashed/exited and therefore it could exit. An active RAPL observer _should_ keep sysprofd alive until the client disconnects/stops it. But sysprof-live-unwinder doesn't necessarily keep sysprofd alive. Sysprofd only needs to spawn it and then hand off the FD to the client. -- Christian
Re: [RFC v2] sketch of an unwinder cache interface for elfutils libdwfl
On 4/29/25 12:15, Serhei Makarov wrote: There is an issue with spontaneous exit of sysprofd (cleanly, with a "Stopping RAPL monitor" message) that I'm trying to understand. Oh, and one quick other thing. I had issues with SELinux killing things while developing. `setenforce 0` "fixed" it. But I think changing our D-Bus API is ultimately what made SELinux stop killing things. Something to do with sockets getting closed maybe? Little hazy :)