labath added a comment.

In D122975#3426731 <https://reviews.llvm.org/D122975#3426731>, @llunak wrote:

> In D122975#3426575 <https://reviews.llvm.org/D122975#3426575>, @labath wrote:
>
>> I'd like to understand what is the precise thing that you're trying to 
>> optimise. If there is like a single hot piece of code that we want to 
>> optimise, then we might be able to come up with a different approach (à la 
>> PrefetchModuleSpecs) to achieve the same thing.
>
> The scenario is a repeated lldb start with index cache enabled (and so 
> everything necessary cached), the debugged app is debug build of LibreOffice 
> Calc (~100 medium-sized C++ libraries). 90% of the CPU time is spent in 
> LoadFromCache() calls, because 70% of the CPU time is spent in ConstString 
> (and that's after my ConstString optimizations). I don't think there's any 
> other way to make it faster other than parallelizing it, although 
> parallelizing only LoadFromCache() should be enough and I'd expect that to be 
> limited enough to be safe if you expect the lldb codebase is not generally 
> thread-safe. In this patch I parallelized higher because it was simple to do 
> it up there, and I don't know how to easily get at all the LoadFromCache() 
> calls (especially the one called from SymbolFileDWARFDebugMap is relatively 
> deep). If you know how to parallelize only that, that works for me too.

OK, got it. So, for this case, I think the best approach would be to extract 
and parallelize the `PreloadSymbols` calls. They are not deep (=> relatively 
easy to extract), optional (they are just a performance optimization, so 
nothing will break if they're skipped), and completely isolated (they only 
access data from the single module).

In fact, we already have a sort of existing place for this kind of grouped 
module action. The reason the `ModulesDidLoad` (line 621) call does not 
happen inside `GetOrCreateModule` is that we want to send just one load 
event instead of spamming the user with potentially hundreds of messages. I 
don't think it would be unreasonable to move the `PreloadSymbols` call into 
`ModulesDidLoad` as well.

Of course, we still need to figure out the parallelization story.

In D122975#3427243 <https://reviews.llvm.org/D122975#3427243>, @llunak wrote:

> In D122975#3427008 <https://reviews.llvm.org/D122975#3427008>, @clayborg 
> wrote:
>
>> I had tried something similar with the thread pools when trying to 
>> parallelize similar stuff. The solution I made was to have a global thread 
>> pool for the entire LLDB process, but then the LLVM thread pool stuff needed 
>> to be modified to handle different groups of threads where work could be 
>> added to a queue and then users can wait on the queue. The queues then need 
>> to be managed by the thread pool code. Queues could also be serial queues or 
>> concurrent queues. I never completed the patch, but just wanted to pass 
>> along the ideas I had used. So instead of adding everything to a separate 
>> pool, the main thread pool could take queues. Your code above would then 
>> look something like:
>
> I had a similar idea too, but I think it wouldn't work here in this 
> threadpool-within-threadpool situation. If you limit the total number of 
> threads, then the "outer" tasks could allocate all the threads and would 
> deadlock on "inner" tasks waiting for threads (and if you don't limit threads 
> then there's no need to bother). That's why I came up with the semaphore 
> idea, as that would require threads to acquire slots and "outer" tasks could 
> temporarily release them when spawning "inner" tasks (I didn't consider 
> pending threads to be a problem, but if it is, then the supposed ThreadPool 
> queues presumably could do that somehow too).

Pending/suspended threads are less of a problem than threads actively 
contending for CPU time, but still less than ideal. On my machine I could end 
up with over 2k threads. At 8MB of stack per thread, that's 16GB just for 
stacks (most of it probably unused, but still...). And some machines have more 
(sometimes a lot more) CPUs than I do.

Properly implementing thread pools is tricky, and I don't consider myself an 
expert, but I think that, instead of using semaphores, you could detect the 
case where `wait()` is being called from inside a thread pool thread, and 
then, instead of passively waiting for the task to finish, start eagerly 
evaluating it on the same thread.

I'm pretty sure I didn't come up with this idea, so it's possible that 
something like this is already implemented in llvm, and I got the idea from 
there.

> But if lldb code is not considered thread-safe to parallelize the module 
> loading at this level, then this is all irrelevant. If it's required to 
> parallelize only cache loading, then that removes the 
> threadpool-within-threadpool situation. I don't know if I understand the code 
> enough to try that though.

Yeah, I'd rather avoid parallelizing the logic for populating the target's 
module and load lists. Even if the result were always correct, the mere fact 
that the module lists (and the output of e.g. `image list`) would be in a 
nondeterministic order is not ideal.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122975/new/

https://reviews.llvm.org/D122975

_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits