readelf.c: Support concurrency for -w, --debug-dump

Mark Wielaard Wed, 02 Jul 2025 14:31:39 -0700

Hi Aaron,

On Sun, Jun 29, 2025 at 11:51:58PM -0400, Aaron Merey wrote:
> Implement concurrent execution of print_debug_* functions during handling
> of -w, --debug-dump using libthread.a.
> 
> A new `-C, --concurrency=NUM` command line option controls the maximum
> number of threads that may be used. This value defaults to the number of CPUs.
> 
> Job output is buffered and printed in the order that jobs were added to
> the queue. This helps preserve the existing order of stdout output.
> 
>       * src/readelf.c (default_concurrency): Function estimating the
>       maximum number of threads.
>       (parse_opt): Handle -C, --concurrency=NUM.
>       (do_job): Entry point function for threads.
>       (schedule_job): If thread safety is enabled, add job to the
>       job queue.  Otherwise just run the job from the main thread.
>       (print_debug): Pass print_debug_* function pointers and
>       args to schedule_job. Also call run_jobs if thread safety
>       is enabled.
> 
> Signed-off-by: Aaron Merey <[email protected]>
> 
> ---
> v4: allow num_thread jobs instead of num_thread - 1
> 
> heap alloc job_data
> 
> Use max_threads instead of max_threads - 1 for determining the maximum
> number of jobs running concurrently.
> 
> Improve error output when the argument to -C is invalid.
> 
> Allocate the job_data on the heap instead of the stack to avoid stack overflow
> if there are a very large number of debug sections.
> 
> > On Tue, May 27, 2025 at 10:04:20AM -0400, Aaron Merey wrote:
> > > Implement concurrent execution of print_debug_* functions during handling
> > > of -w, --debug-dump using libthread.a.
> >
> > Do you intend to also enable this for other sections?
> 
> Yes I will extend parallelization to non-debug sections in future
> patches.


Nice.

> > > +/* Estimate the maximum number of threads. This is normally
> > > +   #CPU.  Return value is guaranteed to be at least 1.  */
> > > +static int
> > > +default_concurrency (void)
> > > +{
> > > +  unsigned aff = 0;
> > > +#ifdef HAVE_SCHED_GETAFFINITY
> > > +  {
> > > +    int ret;
> > > +    cpu_set_t mask;
> > > +    CPU_ZERO (&mask);
> > > +    ret = sched_getaffinity (0, sizeof(mask), &mask);
> > > +    if (ret == 0)
> > > +      aff = CPU_COUNT (&mask);
> > > +  }
> > > +#endif
> >
> > Is this better/worse than using get_nprocs or sysconf
> > (_SC_NPROCESSORS_ONLN)?
> > https://www.gnu.org/software/libc/manual/html_node/Processor-Resources.html
> >
> > Also from the sched_getaffinity man page it looks like cpu_set_t might
> > be too small and the call might fail on systems with > 1024
> > cores. Maybe this is a non-issue though?
> >
> > > +  unsigned fn = 0;
> > > +#ifdef HAVE_GETRLIMIT
> > > +  {
> > > +    struct rlimit rlim;
> > > +    int rc = getrlimit (RLIMIT_NOFILE, &rlim);
> > > +    if (rc == 0)
> > > +      fn = MAX ((rlim_t) 1, (rlim.rlim_cur - 100) / 2);
> > > +    /* Conservatively estimate that at least 2 fds are used
> > > +       by each thread.  */
> > > +  }
> > > +#endif
> >
> > Interesting. I wouldn't have thought about that. But makes sense. If
> > you check available file descriptors, would it also make sense to
> > check for available memory? And limit the number of jobs to mem/1024M?
> 
> This function is copied from debuginfod.cxx where we haven't run into
> issues with >1024 cores or the number of jobs exceeding mem/1024M.

But is that because it was never run on systems with >1024 core or
with low memory?

If it is common code, maybe it should be part of threadlib that
debuginfod then also uses?

> > >  /* To store the name used in compare_listptr */
> > > -static const char *sort_listptr_name;
> > > +_Thread_local const char *sort_listptr_name;
> >
> > O, that is interesting and not in your ChangeLog.
> > Is this the simplest way to make this thread-safe?
> > Could there be a lock around sort_listptr ?
> > What about the static listptr_table known_xxxx[ptr|bases]?
> > Should building up/accessing those be made thread-safe?
> 
> In the current implementation print_debug_info_section, which
> populates the listptr_tables, runs on its own before add_jobs
> and run_jobs are called.  After that each listptr_table is used
> only within one print_debug_* function, so there isn't a risk
> of race conditions at the moment.
> 
> When we extend eu-readelf paralellization in future patches
> print_debug_info_section will run in paralell with other
> print_debug_* functions and the listptr_tables will be protected
> by locks.

Aha. I see I missed the following in the source:

  /* If the .debug_info section is listed as implicitly required then
     we must make sure to handle it before handling any other debug
     section.  Various other sections depend on the CU DIEs being
     scanned (silently) first.  */

And then print_debug_unit has:

  const bool silent = !(print_debug_sections & section_info) && !debug_types;

And even when silent it then does various notice_listptr calls.

Confusing :)

I wonder if it still makes sense to have this extra notice_listptr
pass inside print_debug_unit and whether we shouldn't just have a
dedicated pass for it that then can run before the printing of any
other debug section (if needed).

But yes, I see now. That is for later.

Getting back to the actual change:

>  /* To store the name used in compare_listptr */
> -static const char *sort_listptr_name;
> +_Thread_local const char *sort_listptr_name;

Should it still be static?
Should we maybe include <threads.h> and use thread_local instead?

I read over the whole patch and don't have any other comments. Looks
good.

Thanks,

Mark

Re: [PATCH 3/3 v4] src/readelf.c: Support concurrency for -w, --debug-dump

Reply via email to