On Mon, Aug 04, 2025 at 11:20:54PM -0400, Aaron Merey wrote:
> Implement concurrent execution of print_debug_* functions during handling
> of -w, --debug-dump using libthread.a.
>
> A new `-C, --concurrency=NUM` command line option controls the maximum
> number of threads that may be used. This value defaults to the number of CPUs.
>
> Job output is buffered and printed in the order that jobs were added to
> the queue. This helps preserve the existing order of stdout output.
>
> * src/readelf.c (default_concurrency): Function estimating the
> maximum number of threads.
> (parse_opt): Handle -C, --concurrency=NUM.
> (do_job): Entry point function for threads.
> (schedule_job): If thread safety is enabled, add job to the
> job queue. Otherwise just run the job from the main thread.
> (print_debug): Pass print_debug_* function pointers and
> args to schedule_job. Also call run_jobs if thread safety
> is enabled.
>
> Signed-off-by: Aaron Merey <ame...@redhat.com>
> ---
> v5: use 1 as a minimum default number of threads instead of 2.
OK, so if sched_getaffinity fails we are just single threaded.

> Restore static qualifier for sort_listptr_name

Ack.

> On Wed, Jul 2, 2025 at 5:21 PM Mark Wielaard <m...@klomp.org> wrote:
> > > > > + unsigned fn = 0;
> > > > > +#ifdef HAVE_GETRLIMIT
> > > > > + {
> > > > > + struct rlimit rlim;
> > > > > + int rc = getrlimit (RLIMIT_NOFILE, &rlim);
> > > > > + if (rc == 0)
> > > > > + fn = MAX ((rlim_t) 1, (rlim.rlim_cur - 100) / 2);
> > > > > + /* Conservatively estimate that at least 2 fds are used
> > > > > + by each thread. */
> > > > > + }
> > > > > +#endif
> > > >
> > > > Interesting. I wouldn't have thought about that. But makes sense. If
> > > > you check available file descriptors, would it also make sense to
> > > > check for available memory? And limit the number of jobs to mem/1024M?
> > >
> > > This function is copied from debuginfod.cxx where we haven't run into
> > > issues with >1024 cores or the number of jobs exceeding mem/1024M.
> >
> > But is that because it was never run on systems with >1024 cores

I don't want to hold up the patch for this, but I still don't really
understand why sched_getaffinity is used and what really happens for
really large core counts. Wouldn't it be simpler to use get_nprocs ()
or sysconf (_SC_NPROCESSORS_ONLN)?

> > or with low memory?
>
> This is tricky because per-thread memory usage is very hard to predict
> since memory usage is not directly correlated with thread count. The
> memory footprint across 64 threads could be far less than 1GB, while 2
> threads in extreme cases might buffer many GB of output.
>
> I also have concerns about whether the memory that's available at the
> time of check ends up being a reliable predictor of real per-thread
> capacity, given OS-level memory overcommit and caching.
>
> The user does have the option to run eu-readelf in single thread
> mode (--concurrency=1) which does not buffer output and reverts to
> current memory usage patterns.

OK, but it doesn't have to be 1GB, it could be 256M. It would be nice
to have some kind of memory limit, but I guess it only really matters
for huge core counts.

> > If it is common code, maybe it should be part of threadlib that
> > debuginfod then also uses?
>
> There are some differences between debuginfod default_concurrency
> and eu-readelf's. Debuginfod checks std::thread::hardware_concurrency
> which isn't available in eu-readelf, plus it uses a different estimate
> for number of file descriptors per thread. These differences could be
> parameterized but the common code is quite straightforward.

In practice std::thread::hardware_concurrency is get_nprocs.

I still think it would be good to have some common (parameterized on
fds, mem, etc.) code here, because I don't think the code is that
straightforward. But that could be a followup patch.

Thanks,

Mark
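
For what it's worth, a minimal sketch of what such a parameterized
helper could look like, assuming Linux-style sysconf, getrlimit and
sysinfo are available. The function name and parameters below are
invented for illustration only and are not actual elfutils or
debuginfod API:

  #include <unistd.h>
  #include <sys/resource.h>
  #include <sys/sysinfo.h>

  /* Hypothetical shared helper: estimate a sensible default thread
     count.  fds_per_thread and mem_per_thread are caller-supplied
     estimates; passing 0 disables that particular limit.  */
  unsigned
  default_concurrency_sketch (unsigned fds_per_thread,
                              unsigned long mem_per_thread)
  {
    /* Start from the number of online CPUs as the upper bound.  */
    long cpus = sysconf (_SC_NPROCESSORS_ONLN);
    unsigned n = cpus > 0 ? (unsigned) cpus : 1;

    /* Cap by available file descriptors, keeping some slack for the
       main thread and libraries.  */
    if (fds_per_thread > 0)
      {
        struct rlimit rlim;
        if (getrlimit (RLIMIT_NOFILE, &rlim) == 0
            && rlim.rlim_cur != RLIM_INFINITY
            && rlim.rlim_cur > 100)
          {
            unsigned long by_fds = (rlim.rlim_cur - 100) / fds_per_thread;
            if (by_fds < n)
              n = by_fds > 0 ? (unsigned) by_fds : 1;
          }
      }

    /* Optionally cap by currently free memory, e.g. 256M per thread.  */
    if (mem_per_thread > 0)
      {
        struct sysinfo si;
        if (sysinfo (&si) == 0)
          {
            unsigned long by_mem = (si.freeram * si.mem_unit) / mem_per_thread;
            if (by_mem < n)
              n = by_mem > 0 ? (unsigned) by_mem : 1;
          }
      }

    return n;
  }

Under those assumptions eu-readelf could call something like
default_concurrency_sketch (2, 0) to keep its current fd estimate with
no memory cap, while debuginfod could pass its own per-thread fd and
memory estimates, which is roughly the kind of parameterization
discussed above.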