On Mon, Aug 04, 2025 at 11:20:54PM -0400, Aaron Merey wrote:
> Implement concurrent execution of print_debug_* functions during handling
> of -w, --debug-dump using libthread.a.
> 
> A new `-C, --concurrency=NUM` command line option controls the maximum
> number of threads that may be used. This value defaults to the number of CPUs.
> 
> Job output is buffered and printed in the order that jobs were added to
> the queue. This helps preserve the existing order of stdout output.
> 
>       * src/readelf.c (default_concurrency): Function estimating the
>       maximum number of threads.
>       (parse_opt): Handle -C, --concurrency=NUM.
>       (do_job): Entry point function for threads.
>       (schedule_job): If thread safety is enabled, add job to the
>       job queue.  Otherwise just run the job from the main thread.
>       (print_debug): Pass print_debug_* function pointers and
>       args to schedule_job. Also call run_jobs if thread safety
>       is enabled.
> 
> Signed-off-by: Aaron Merey <ame...@redhat.com>
> ---
> v5: use 1 as a minimum default number of threads instead of 2.

OK, so if sched_getaffinity fails we are just single threaded.
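(My reading of that fallback, as an untested sketch — assuming the patch
counts CPUs with sched_getaffinity/CPU_COUNT, which need _GNU_SOURCE and
<sched.h>:)

  cpu_set_t cs;
  unsigned cpus = 1;  /* sched_getaffinity failed: single threaded.  */
  if (sched_getaffinity (0, sizeof (cs), &cs) == 0)
    {
      int count = CPU_COUNT (&cs);
      if (count > 0)
        cpus = count;
    }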

> Restore static qualifier for sort_listptr_name

Ack.

> On Wed, Jul 2, 2025 at 5:21 PM Mark Wielaard <m...@klomp.org> wrote:
> > > > > +  unsigned fn = 0;
> > > > > +#ifdef HAVE_GETRLIMIT
> > > > > +  {
> > > > > +    struct rlimit rlim;
> > > > > +    int rc = getrlimit (RLIMIT_NOFILE, &rlim);
> > > > > +    if (rc == 0)
> > > > > +      fn = MAX ((rlim_t) 1, (rlim.rlim_cur - 100) / 2);
> > > > > +    /* Conservatively estimate that at least 2 fds are used
> > > > > +       by each thread.  */
> > > > > +  }
> > > > > +#endif
> > > >
> > > > Interesting. I wouldn't have thought about that. But makes sense. If
> > > > you check available file descriptors, would it also make sense to
> > > > check for available memory? And limit the number of jobs to mem/1024M?
> > >
> > > This function is copied from debuginfod.cxx where we haven't run into
> > > issues with >1024 cores or the number of jobs exceeding mem/1024M.
> >
> > But is that because it was never run on systems with >1024 cores

I don't want to hold up the patch for this, but I still don't really
understand why sched_getaffinity is used, or what really happens for
really large core counts. Wouldn't it be simpler to use get_nprocs ()
or sysconf (_SC_NPROCESSORS_ONLN)?
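
E.g. (untested, just to show what I mean; sysconf is declared in
<unistd.h> and can return -1, so fall back to 1):

  long n = sysconf (_SC_NPROCESSORS_ONLN);
  unsigned cpus = n > 0 ? (unsigned) n : 1;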

> > or with low memory?
> 
> This is tricky because per-thread memory usage is very hard to predict
> since memory usage is not directly correlated with thread count.  The
> memory footprint across 64 threads could be far less than 1GB, while 2
> threads in extreme cases might buffer many GB of output.
> 
> I also have concerns about whether the memory that's available at the
> time of check ends up being a reliable predictor of real per-thread
> capacity, given OS-level memory overcommit and caching.
> 
> The user does have the option to run eu-readelf in single thread
> mode (--concurrency=1) which does not buffer output and reverts to
> current memory usage patterns.

OK, but it doesn't have to be 1GB, it could be 256M. It would be nice
to have some kind of memory limit, but I guess it only really matters
for huge core counts.
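
Something like this could cap it (an untested sketch reusing the fn from
the snippet above; note _SC_PHYS_PAGES is a glibc extension and reports
total, not available, memory):

  long pages = sysconf (_SC_PHYS_PAGES);
  long pagesize = sysconf (_SC_PAGESIZE);
  if (pages > 0 && pagesize > 0)
    {
      /* One thread per 256M of physical memory, but at least one.
         uint64_t (<stdint.h>) avoids overflow on 32-bit.  */
      uint64_t mem_threads
        = ((uint64_t) pages * (uint64_t) pagesize) / (256ULL << 20);
      if (mem_threads < 1)
        mem_threads = 1;
      if (mem_threads < fn)
        fn = (unsigned) mem_threads;
    }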

> > If it is common code, maybe it should be part of threadlib that
> > debuginfod then also uses?
> 
> There are some differences between debuginfod's default_concurrency
> and eu-readelf's.  Debuginfod checks std::thread::hardware_concurrency,
> which isn't available in eu-readelf, plus it uses a different estimate
> for the number of file descriptors per thread.  These differences could
> be parameterized, but the common code is quite straightforward.

In practice std::thread::hardware_concurrency is get_nprocs.  I still
think it would be good to have some common (parameterized on fds, mem,
etc.) code here, because I don't think the code is that straightforward.
But that could be a followup patch.
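
Roughly what I am imagining (only a sketch; the name, signature and
placement are made up, and debuginfod and readelf would pass their own
per-thread fd estimates):

  #include <unistd.h>
  #include <sys/resource.h>

  /* Hypothetical shared helper, e.g. in lib/: cap the thread count
     by CPUs and by available file descriptors.  */
  unsigned
  default_concurrency (unsigned fds_per_thread)
  {
    long n = sysconf (_SC_NPROCESSORS_ONLN);
    unsigned cpus = n > 0 ? (unsigned) n : 1;

    struct rlimit rlim;
    if (getrlimit (RLIMIT_NOFILE, &rlim) == 0
        && rlim.rlim_cur != RLIM_INFINITY)
      {
        /* Leave 100 fds of headroom for the rest of the process.  */
        rlim_t avail = rlim.rlim_cur > 100 ? rlim.rlim_cur - 100 : 1;
        rlim_t by_fds = avail / (fds_per_thread > 0 ? fds_per_thread : 1);
        if (by_fds < 1)
          by_fds = 1;
        if ((rlim_t) cpus > by_fds)
          cpus = (unsigned) by_fds;
      }
    return cpus;
  }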

Thanks,

Mark
