Re: [lldb-dev] RFC: Processor Trace Support in LLDB

2020-10-02 Thread Pavel Labath via lldb-dev
On 01/10/2020 22:32, Walter wrote:
> After a chat with Greg, we agreed on this set of commands:
> 
> 
> trace load /path/to/json
> process trace start/stop
> process trace save /path/to/json
> thread trace start/stop
> thread trace dump [instructions | functions]
> 

Thanks. The new commands look good to me.


The multi-process trace concept is interesting. I don't question its
usefulness -- I am sure it can be useful for various kinds of analysis
(though I've never used that myself). I am wondering though about how to
represent this thing in lldb, as we don't really have anything close to
the concept of "debugging" all processes on a given system.

The only thing that comes close is probably kernel-level debugging.
One idea (which has just occurred to me, so it may not be good) might be
to make these traces behave similarly to that, i.e., create a single
target/process with one "thread" per physical CPU, and then have a
special "OS plugin"-like thing which would present individual
processes/threads.

That would have the advantage of maintaining the one-trace-one-target
invariant and would also preserve the information about the relative timings
of individual "processes". I think that would be an interesting way to
view these things, but I don't know if it would be the best one...

pl
___
lldb-dev mailing list
lldb-dev@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


[lldb-dev] Deadlock loading DWARF symbols

2020-10-02 Thread Dmitry Antipov via lldb-dev

I'm observing the following deadlock:

One thread calls Module::PreloadSymbols(), which takes the m_mutex of this Module.
Module::PreloadSymbols() calls ManualDWARFIndex::Index(), which, in turn, creates a
thread pool and waits for all of its threads to complete:

(gdb)
#0  futex_wait_cancelable (private=0, expected=0, futex_word=0x7f67f176914c) at ../sysdeps/nptl/futex-internal.h:183
#1  __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7f67f17690c8, cond=0x7f67f1769120) at pthread_cond_wait.c:508
#2  __pthread_cond_wait (cond=0x7f67f1769120, mutex=0x7f67f17690c8) at pthread_cond_wait.c:638
#3  0x7f67f3974890 in std::condition_variable::wait(std::unique_lock&) () from /lib64/libstdc++.so.6
#4  0x7f67f4440c4b in std::condition_variable::wait > (__p=..., __lock=..., this=0x7f67f1769120)
    at /usr/include/c++/10/condition_variable:108
#5  llvm::ThreadPool::wait (this=this@entry=0x7f67f1769060) at source/llvm/lib/Support/ThreadPool.cpp:72
#6  0x7f67fc6fa3a6 in lldb_private::ManualDWARFIndex::Index (this=0x7f66fe87e950)
    at source/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp:94
#7  0x7f67fc6b3825 in SymbolFileDWARF::PreloadSymbols (this=0x7f67de7af6f0) at /usr/include/c++/10/bits/unique_ptr.h:421
#8  0x7f67fc1ee488 in lldb_private::Module::PreloadSymbols (this=0x7f67de79b620) at source/lldb/source/Core/Module.cpp:1397
#9  0x7f67fc397a37 in lldb_private::Target::GetOrCreateModule (this=this@entry=0x96c7a0, module_spec=..., notify=notify@entry=true, error_ptr=error_ptr@entry=0x0)
    at /usr/include/c++/10/bits/shared_ptr_base.h:1324
...

OTOH, one of the pool threads attempts to lock the Module's mutex:

(gdb) bt
#0  __lll_lock_wait (futex=futex@entry=0x7f67de79b638, private=0) at lowlevellock.c:52
#1  0x7f67fcd907f1 in __GI___pthread_mutex_lock (mutex=0x7f67de79b638) at ../nptl/pthread_mutex_lock.c:115
#2  0x7f67fc1ed922 in __gthread_mutex_lock (__mutex=0x7f67de79b638) at /usr/include/c++/10/x86_64-redhat-linux/bits/gthr-default.h:749
#3  __gthread_recursive_mutex_lock (__mutex=0x7f67de79b638) at /usr/include/c++/10/x86_64-redhat-linux/bits/gthr-default.h:811
#4  std::recursive_mutex::lock (this=0x7f67de79b638) at /usr/include/c++/10/mutex:106
#5  std::lock_guard::lock_guard (__m=..., this=) at /usr/include/c++/10/bits/std_mutex.h:159
#6  lldb_private::Module::GetDescription (this=this@entry=0x7f67de79b620, s=..., level=level@entry=lldb::eDescriptionLevelBrief)
    at source/lldb/source/Core/Module.cpp:1083
#7  0x7f67fc1f2070 in lldb_private::Module::ReportError (this=0x7f67de79b620, format=0x7f67fca03660 "DW_FORM_ref* DIE reference 0x%lx is outside of its CU")
    at source/lldb/include/lldb/Utility/Stream.h:358
#8  0x7f67fc6adfb4 in DWARFFormValue::Reference (this=this@entry=0x7f66f5ff29c0) at /usr/include/c++/10/bits/shared_ptr_base.h:1324
#9  0x7f67fc6aaa77 in DWARFDebugInfoEntry::GetAttributes (this=this@entry=0x7f662e3580e0, cu=cu@entry=0x7f66ff6ebad0, attributes=..., recurse=recurse@entry=DWARFBaseDIE::Recurse::yes, curr_depth=curr_depth@entry=0)
    at source/lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.cpp:439
#10 0x7f67fc6f8f8f in DWARFDebugInfoEntry::GetAttributes (recurse=DWARFBaseDIE::Recurse::yes, attrs=..., cu=0x7f66ff6ebad0, this=0x7f662e3580e0)
    at source/lldb/source/./Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.h:54
#11 lldb_private::ManualDWARFIndex::IndexUnitImpl (unit=..., cu_language=cu_language@entry=lldb::eLanguageTypeRust, set=...)
    at source/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp:180
#12 0x7f67fc6f96b7 in lldb_private::ManualDWARFIndex::IndexUnit (this=, unit=..., dwp=0x0, set=...)
    at source/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp:126
...

So this is a deadlock: the thread pool is created with the module lock held, and
one or more pool threads (I'm observing two) may try to grab the same lock to
issue an error message.

Commenting out the whole body of Module::GetDescription() makes this deadlock
disappear.

I'm not an expert in this area, but the whole thing looks like the Module object
should have more fine-grained locking rather than a single std::recursive_mutex
for all synchronization purposes.

Dmitry


Re: [lldb-dev] RFC: Processor Trace Support in LLDB

2020-10-02 Thread Greg Clayton via lldb-dev


> On Oct 2, 2020, at 3:51 AM, Pavel Labath  wrote:
> 
> On 01/10/2020 22:32, Walter wrote:
>> After a chat with Greg, we agreed on this set of commands:
>> 
>> 
>> trace load /path/to/json
>> process trace start/stop
>> process trace save /path/to/json
>> thread trace start/stop
>> thread trace dump [instructions | functions]
>> 
> 
> Thanks. The new commands look good to me.

Great, we can move the "trace dump" over to "thread trace dump" for
https://reviews.llvm.org/D86670 and keep that moving.

> The multi-process trace concept is interesting. I don't question its
> usefulness -- I am sure it can be useful for various kinds of analysis
> (though I've never used that myself). I am wondering though about how to
> represent this thing in lldb, as we don't really have anything close to
> the concept of "debugging" all processes on a given system.
> 
> The only thing that comes close is probably kernel-level debugging.
> One idea (which has just occurred to me, so it may not be good) might be
> to make these traces behave similarly to that, i.e., create a single
> target/process with one "thread" per physical CPU, and then have a
> special "OS plugin"-like thing which would present individual
> processes/threads.

I don't know enough about how trace data is stored or annotated after the raw
data is pulled from the cores, but to be useful it must somehow be associated
with processes and threads; otherwise it would just be a bunch of addresses
that overlap between many processes.

 
> That would have the advantage of maintaining the one-trace-one-target
> invariant and would also preserve the information about the relative timings
> of individual "processes". I think that would be an interesting way to
> view these things, but I don't know if it would be the best one...

I might suggest that each trace plug-in should do its best to represent
processes and threads as separate entities so that they remain separate.
Whatever form the data starts out in should be abstracted away, and I would
rather see individual processes with their threads if that is possible, but I
am thinking about this with only a bit of knowledge of tracing data. I think
many chip makers create these trace formats from a "trace a core" perspective,
but if we can tame this data and present it as users would want to see it,
instead of as it is stored, I think we will have a compelling trace feature in
our debugger.

Greg


Re: [lldb-dev] Deadlock loading DWARF symbols

2020-10-02 Thread Greg Clayton via lldb-dev
Yes, this is bad. GetDescription() is a convenience used to print out the
module path (which might be a .o file within a .a file) and, optionally, the
architecture of the module. It probably shouldn't be taking the module lock, as
the only member variables that GetDescription accesses are:

Module::m_arch
Module::m_file
Module::m_object_name

I would almost vote to take out the mutex lock in GetDescription(), as the arch,
file and name don't change after the module has been created. I am going to CC
a few extra folks for discussion.

Does anyone else have any objections to removing the mutex in GetDescription?
It seems like this deadlock is easy to trigger if you have DWARF with errors or
warnings inside of it.

Greg


> On Oct 2, 2020, at 6:50 AM, Dmitry Antipov via lldb-dev 
>  wrote:
> 
> Commenting out the whole body of Module::GetDescription() m