https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71744

--- Comment #25 from Gleb Natapov <gleb at scylladb dot com> ---
(In reply to Jakub Jelinek from comment #24)
> (In reply to Gleb Natapov from comment #23)
> > I am not sure I agree. 64 locks will take one page of memory, which is a
> > negligible amount nowadays, and we can drop the array if compiled for a
> > single-threaded machine.
> 
> It is perhaps negligible for your app, but libc has lots of users with
> different needs.  And dl_load_lock isn't the only widely used lock in libc,
> are we going to use page of array locks in each case such lock has
> scalability issues with certain use cases?
>
That's a fair point. I think the severity of the issue should be taken into
account. I can tell from our experience (and from searching the web, we are not
alone) that for languages that throw exceptions, like C++, the issue is very
serious. And no, we do not use exceptions for flow control, but when errors
happen they tend to happen in bunches, and when the first bunch slows the system
to a crawl it causes even more errors. The only workaround is to not use
exceptions, which is not acceptable for us, so fixing the issue at its root is
the only option.

Using Torvald's rwlock would definitely be better than the current state, but
not as good as per-thread locks.
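
To make the comparison concrete, here is a minimal sketch of the per-thread
(striped) lock idea. It is not the code from my patch, only an illustration
under assumptions: NLOCKS, CACHE_LINE, struct padded_lock and all function
names are made up for this example, and a 64-byte cache line is assumed. Each
reader takes only the slot assigned to its thread, so readers on different
slots never touch the same cache line; a writer takes every slot.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NLOCKS 64      /* assumed slot count (the "64 locks" estimate) */
#define CACHE_LINE 64  /* assumed cache line size in bytes */

/* One mutex per slot, aligned so that each slot owns a full cache line.
   With a mutex no larger than 64 bytes the array is 64 * 64 = 4096 bytes. */
struct padded_lock {
    _Alignas(CACHE_LINE) pthread_mutex_t m;
};

static struct padded_lock locks[NLOCKS];
static atomic_uint next_slot;            /* hands out slots round-robin */
static _Thread_local int my_slot = -1;   /* slot cached per thread */

/* Must be called once before any other function below. */
void striped_init(void)
{
    for (int i = 0; i < NLOCKS; i++) {
        int rc = pthread_mutex_init(&locks[i].m, NULL);
        assert(rc == 0);
        (void)rc;
    }
}

static int slot_for_this_thread(void)
{
    if (my_slot < 0)
        my_slot = (int)(atomic_fetch_add(&next_slot, 1) % NLOCKS);
    return my_slot;
}

/* Reader side: lock only this thread's slot and return its index. */
int reader_lock(void)
{
    int s = slot_for_this_thread();
    pthread_mutex_lock(&locks[s].m);
    return s;
}

void reader_unlock(int s)
{
    pthread_mutex_unlock(&locks[s].m);
}

/* Writer side: lock every slot, always in index order so that two
   writers cannot deadlock against each other. */
void writer_lock(void)
{
    for (int i = 0; i < NLOCKS; i++)
        pthread_mutex_lock(&locks[i].m);
}

void writer_unlock(void)
{
    for (int i = NLOCKS - 1; i >= 0; i--)
        pthread_mutex_unlock(&locks[i].m);
}
```

The trade-off is that writers (dlopen/dlclose in this discussion) become more
expensive, which I think is acceptable because they are rare compared to
readers.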

> 
> > Such an interface will make the new dl_iterate_phdr_rd libgcc-specific.
> > Also, scalability will depend on cache efficiency, so while a benchmark
> > will show a much better result, a real application will not benefit.
> > Complex C++ applications tend to have deep call chains.
> 
> Why would it be libgcc specific?  It would be another libc supported API,
> with clear documentation on what it does and any user could just use it.
>  
I think I misunderstood what you proposed. My patch essentially does what you
suggest already: it names the function dl_iterate_phdr_parallel instead of
dl_iterate_phdr_rd, but otherwise it is the same. It can run multiple callbacks
in parallel, so we only disagree on how the _parallel_ part is achieved
internally. On the glibc list there were some concerns about widening the
interface, though. They may prefer to use symbol versioning to change
dl_iterate_phdr semantics (not sure if and how this can be done yet).
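
For reference, this is roughly the kind of callback that ends up running under
the lock. It is a minimal sketch against the existing dl_iterate_phdr
interface from <link.h>, not code from libgcc or from my patch; struct lookup,
find_object and pc_is_mapped are hypothetical names for this example. The
callback only reads the program headers, which is why running several of them
in parallel should be safe.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* glibc exposes dl_iterate_phdr under _GNU_SOURCE */
#endif
#include <link.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical query record passed to the callback through `data`. */
struct lookup {
    uintptr_t pc;      /* address to look for */
    const char *name;  /* name of the object that contains it, if found */
    int found;
};

/* Called once per loaded object; a non-zero return stops the iteration. */
static int find_object(struct dl_phdr_info *info, size_t size, void *data)
{
    struct lookup *q = data;
    (void)size;

    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_LOAD)
            continue;
        uintptr_t start = info->dlpi_addr + ph->p_vaddr;
        if (q->pc >= start && q->pc < start + ph->p_memsz) {
            q->name = info->dlpi_name;
            q->found = 1;
            return 1;
        }
    }
    return 0;
}

/* Returns 1 if pc lies inside a PT_LOAD segment of some loaded object. */
int pc_is_mapped(const void *pc)
{
    struct lookup q = { (uintptr_t)pc, NULL, 0 };
    dl_iterate_phdr(find_object, &q);
    return q.found;
}
```

A parallel variant would keep this callback signature, so callers would only
need to switch the function they call.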


> As I said, the number of entries in the cache could be extended.
>
Unless it grows dynamically, it would be hard to guess a proper size, and the
price of underguessing is too high. Finding a proper size dynamically would
require a lot of cache management code, which I do not think belongs here.
