The ThreadPlanStack always has one base thread plan. You can't get rid of that one; the system relies on its always being there. Despite its name, DiscardThreadPlans doesn't actually discard all the thread plans, just all but the base thread plan... So I think that's neither here nor there...
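For illustration, that invariant looks roughly like this -- a simplified sketch, not the actual lldb ThreadPlanStack implementation:

```
// Simplified sketch of the ThreadPlanStack invariant described above;
// illustrative only, not the actual lldb source.
#include <cassert>
#include <memory>
#include <vector>

struct ThreadPlan {
  virtual ~ThreadPlan() = default;
  virtual bool IsBasePlan() const { return false; }
};

struct ThreadPlanBase : ThreadPlan {
  bool IsBasePlan() const override { return true; }
};

class ThreadPlanStackSketch {
  // Element 0 is always the base plan: installed at construction,
  // never removed.
  std::vector<std::unique_ptr<ThreadPlan>> m_plans;

public:
  ThreadPlanStackSketch() {
    m_plans.push_back(std::make_unique<ThreadPlanBase>());
  }

  void PushPlan(std::unique_ptr<ThreadPlan> plan) {
    m_plans.push_back(std::move(plan));
  }

  // "Discard" pops everything *except* element 0, because the rest of
  // the system relies on the base plan always being present.
  void DiscardThreadPlans() {
    m_plans.erase(m_plans.begin() + 1, m_plans.end());
    assert(m_plans.size() == 1 && m_plans.front()->IsBasePlan());
  }
};
```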
The thing I want to see is the sequence of events "Make the new thread list after stopping for exec" and "Call ThreadPlan::GetThread()" the first time after the stop for the exec, where we go from no cached thread to caching the thread from the old thread list. The former is ProcessGDBRemote::UpdateThreadList. You can figure out the latter by running the debugger on an lldb that is debugging a program that exec's. Have the inferior lldb drive its inferior to the state just before the exec. In the superior lldb, set a breakpoint on ThreadPlan::GetThread(). Keep continuing in the inferior lldb if you have to until you see it get the stop packet that corresponds to the exec. At this point all the thread plans should have no cached threads, so they are going to have to look them up. Keep going from there to the first time that ThreadPlan::GetThread is called and has to look up the thread with a FindThreadByID call.

If I understand what you are describing, that first call should pull up the old thread pointer and cache it. If that happens before the call to ProcessGDBRemote::UpdateThreadList finishes, that would be bad, and we have to stop it doing that. If what's going on is that we set up the new thread list, broadcast the event, and then change the thread list again in response to detecting that this is an exec, that would also be bad. We really should have a finalized thread list for the stop before we start trying to figure out what to do with the stop.

Jim

> On Jan 21, 2021, at 4:28 PM, Walter <a20012...@gmail.com> wrote:
>
> I've tried to find a way to move the calls the way you mentioned, but it
> doesn't seem trivial.
>
> Some more information:
> - The invocation of the thread plan is done by Thread::ShouldStop, where it
> does
>
> ```
> // We're starting from the base plan, so just let it decide;
> if (current_plan->IsBasePlan()) {
>   should_stop = current_plan->ShouldStop(event_ptr);
> ```
>
> The interesting part of this call stack starts at
> Process::HandlePrivateEvent, which seems to be handling the Stop Event.
> Where I think there's some inconsistent code is in
> ThreadPlanBase::ShouldStop, because it tries to get the stop reason, which
> fails on Linux, and then it processes eStopReasonExec a few lines below.
> And see what it does:
>
> ```
> case eStopReasonExec:
>   // If we crashed, discard thread plans and stop. Don't force the
>   // discard, however, since on rerun the target may clean up this
>   // exception and continue normally from there.
>   LLDB_LOGF(
>       log,
>       "Base plan discarding thread plans for thread tid = 0x%4.4" PRIx64
>       " (exec.)",
>       m_tid);
>   GetThread().DiscardThreadPlans(false);
>   return true;
> ```
>
> It does discard the thread plans for that thread! This makes me believe
> that it should be fine to delete the thread plans in the first place. I
> wasn't able to figure out more, but I can dig deeper if you give me a few
> pointers. In any case, this last block of code makes me believe that
> deleting the thread plans or resetting the thread pointer in the base
> thread plan might be fine.
>
> Btw, this is a bunch of logs I got, which start at a breakpoint, then
> there's a "continue", which triggers what I'm saying.
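(Illustrative aside: the failure mode being described is the classic use-after-free sketched below -- a plan caches a raw pointer to a shared_ptr-owned Thread, the thread list is rebuilt across the exec, and the next use of the cached pointer touches a destroyed object. The names are simplified stand-ins, not the actual lldb code; the logs and backtrace that follow show the real crash.)

```
// Simplified model of the dangling cached Thread pointer; not actual
// lldb code.
#include <cstdint>
#include <memory>
#include <vector>

struct Thread {
  uint64_t tid;
};

struct ThreadList {
  std::vector<std::shared_ptr<Thread>> threads;
  Thread *FindThreadByID(uint64_t tid) {
    for (auto &t : threads)
      if (t->tid == tid)
        return t.get();
    return nullptr;
  }
};

int main() {
  ThreadList list;
  list.threads.push_back(std::make_shared<Thread>(Thread{0x2b72f1}));

  // The base plan caches a raw pointer to the pre-exec thread...
  Thread *cached = list.FindThreadByID(0x2b72f1);

  // ...the exec is detected and the thread list is rebuilt: the old
  // Thread is destroyed, and a new object is created for the same tid.
  list.threads.clear();
  list.threads.push_back(std::make_shared<Thread>(Thread{0x2b72f1}));

  // The plan still holds the stale pointer; dereferencing it is undefined
  // behavior (in lldb it surfaces as the SIGSEGV in GetPrivateStopInfo).
  // uint64_t t = cached->tid;  // use-after-free if uncommented
  (void)cached;
}
```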
The backtrace is below > > (lldb) bt > * thread #1, name = 'runner', stop reason = instruction step into > * frame #0: 0x00007f1fe674838d libc.so.6`raise + 61 > frame #1: 0x0000000000400a3c runner`main(argc=2, argv=0x00007fff4b78fc08) > at runner.cpp:20 > frame #2: 0x00007f1fe6734555 libc.so.6`__libc_start_main + 245 > frame #3: 0x0000000000400919 runner`_start + 41 > (lldb) thread plan list /////////////////////////////////////// See that only > the base plan is there > thread #1: tid = 0x2b72f1: > Active plan stack: > Element 0: Base thread plan. > Completed plan stack: > Element 0: Stepping one instruction past 0x00007f1fe6748387 stepping into > calls > Element 1: Stepping one instruction past 0x00007f1fe6748387 stepping into > calls > (lldb) c > lldb Process::Resume -- locking run lock > lldb Process::PrivateResume() m_stop_id = 3, public state: > stopped private state: stopped > lldb WillResume Thread #1 (0x0x7fcd30000da0): tid = 0x2b72f1, pc > = 0x7f1fe674838d, sp = 0x7fff4b78fad8, fp = 0x7fff4b78fb20, plan = 'base > plan', state = running, stop others = 0 > lldb Resuming thread: 2b72f1 with state: running. > lldb 0x4e7020 Listener::Listener('gdb-remote.resume-packet-sent') > lldb 0x4e7020 Listener::StartListeningForEvents (broadcaster = > 0x4787a8, mask = 0x00010000) acquired_mask = 0x00010000 for > gdb-remote.resume-packet-sent > lldb 0x4e7020 Listener::StartListeningForEvents (broadcaster = > 0x478ff8, mask = 0x00000004) acquired_mask = 0x00000004 for > gdb-remote.resume-packet-sent > lldb 0x494ab0 > Broadcaster("lldb.process.gdb-remote.async-broadcaster")::BroadcastEvent > (event_sp = {0x7fcd40007cd0 Event: broadcaster = 0x478ff8 > (lldb.process.gdb-remote.async-broadcaster), type = 0x00000001 (async thread > continue), data = {"c"}}, unique =0) hijack = (nil) > lldb 0x494bf0 > Listener('lldb.process.gdb-remote.async-listener')::AddEvent (event_sp = > {0x7fcd40007cd0}) > lldb this = 0x00000000004E7020, timeout = 5000000 us for > gdb-remote.resume-packet-sent > b-remote.async> 0x494bf0 'lldb.process.gdb-remote.async-listener' > Listener::FindNextEventInternal(broadcaster=(nil), > broadcaster_names=(nil)[0], event_type_mask=0x00000000, remove=1) event > 0x7fcd40007cd0 > b-remote.async> Process::SetPrivateState (running) > b-remote.async> 0x483f30 > Broadcaster("lldb.process.internal_state_broadcaster")::BroadcastEvent > (event_sp = {0x7fcd40005380 Event: broadcaster = 0x477dd0 > (lldb.process.internal_state_broadcaster), type = 0x00000001, data = { > process = 0x477ce0 (pid = 2847473), state = running}}, unique =0) hijack = > (nil) > b-remote.async> 0x4841b0 > Listener('lldb.process.internal_state_listener')::AddEvent (event_sp = > {0x7fcd40005380}) > b-remote.async> 0x485cf0 Broadcaster("gdb-remote.client")::BroadcastEvent > (event_sp = {0x7fcd40005500 Event: broadcaster = 0x4787a8 > (gdb-remote.client), type = 0x00010000, data = <NULL>}, unique =0) hijack = > (nil) > b-remote.async> 0x4e7020 Listener('gdb-remote.resume-packet-sent')::AddEvent > (event_sp = {0x7fcd40005500}) > intern-state 0x4841b0 'lldb.process.internal_state_listener' > Listener::FindNextEventInternal(broadcaster=(nil), > broadcaster_names=(nil)[0], event_type_mask=0x00000000, remove=1) event > 0x7fcd40005380 > lldb 0x4e7020 'gdb-remote.resume-packet-sent' > Listener::FindNextEventInternal(broadcaster=(nil), > broadcaster_names=(nil)[0], event_type_mask=0x00000000, remove=1) event > 0x7fcd40005500 > intern-state Current Plan for thread 1(0x7fcd30000da0) (0x2b72f1, > running): base plan being asked whether we 
should report run. > lldb 0x4e7020 Listener::Clear('gdb-remote.resume-packet-sent') > intern-state Process::ShouldBroadcastEvent (0x7fcd40005380) => new state: > running, last broadcast state: running - YES > lldb 0x4e7020 Listener::~Listener('gdb-remote.resume-packet-sent') > intern-state Process::HandlePrivateEvent (pid = 2847473) broadcasting new > state running (old state stopped) to public > lldb Process thinks the process has resumed. > intern-state Process::HandlePrivateEvent updated m_iohandler_sync to 3 > Process 2847473 resuming > intern-state 0x483df0 Broadcaster("lldb.process")::BroadcastEvent > (event_sp = {0x7fcd40005380 Event: broadcaster = 0x477d18 (lldb.process), > type = 0x00000001 (state-changed), data = { process = 0x477ce0 (pid = > 2847473), state = running}}, unique =0) hijack = (nil) > (lldb) intern-state 0x358b00 Listener('lldb.Debugger')::AddEvent > (event_sp = {0x7fcd40005380}) > intern-state timeout = <infinite>, event_sp)... > intern-state this = 0x00000000004841B0, timeout = <infinite> for > lldb.process.internal_state_listener > dbg.evt-handler 0x358b00 'lldb.Debugger' > Listener::FindNextEventInternal(broadcaster=(nil), > broadcaster_names=(nil)[0], event_type_mask=0x00000000, remove=1) event > 0x7fcd40005380 > dbg.evt-handler Process::SetPublicState (state = running, restarted = 0) > dbg.evt-handler this = 0x0000000000358B00, timeout = <infinite> for > lldb.Debugger > > > /////////////////////////////////////////////////////////////// This is where > we start handling the stop, see the ~Thread call below > b-remote.async> this = 0x00007FCD30000DA0, pid = 2847473, tid = 2847473 > b-remote.async> 0x7fcd30000da0 Thread::~Thread(tid = 0x2b72f1) > b-remote.async> 0x00007FCD30000DD8 Broadcaster::~Broadcaster("lldb.thread") > b-remote.async> Process::SetPrivateState (stopped) > b-remote.async> Process::SetPrivateState (stopped) stop_id = 4 > b-remote.async> 0x483f30 > Broadcaster("lldb.process.internal_state_broadcaster")::BroadcastEvent > (event_sp = {0x7fcd3c0736f0 Event: broadcaster = 0x477dd0 > (lldb.process.internal_state_broadcaster), type = 0x00000001, data = { > process = 0x477ce0 (pid = 2847473), state = stopped}}, unique =0) hijack = > (nil) > b-remote.async> 0x4841b0 > Listener('lldb.process.internal_state_listener')::AddEvent (event_sp = > {0x7fcd3c0736f0}) > b-remote.async> this = 0x0000000000494BF0, timeout = <infinite> for > lldb.process.gdb-remote.async-listener > intern-state 0x4841b0 'lldb.process.internal_state_listener' > Listener::FindNextEventInternal(broadcaster=(nil), > broadcaster_names=(nil)[0], event_type_mask=0x00000000, remove=1) event > 0x7fcd3c0736f0 > intern-state 0x7fcd302ac330 > Listener::Listener('Communication::SyncronizeWithReadThread') > intern-state 0x7fcd302ac330 Listener::StartListeningForEvents > (broadcaster = 0x478228, mask = 0x00000020) acquired_mask = 0x00000020 for > Communication::SyncronizeWithReadThread > intern-state 0x7fcd302ac330 > Listener::Clear('Communication::SyncronizeWithReadThread') > intern-state 0x7fcd302ac330 > Listener::~Listener('Communication::SyncronizeWithReadThread') > intern-state 0x00007FCD302BC738 Broadcaster::Broadcaster("lldb.thread") > > ///////////////////////////////////////////////////// This is where the new > thread is created, but it points to the same tid > intern-state 0x7fcd302bc700 Thread::Thread(tid = 0x2b72f1) > intern-state 0x358b00 Listener::StartListeningForEvents (broadcaster = > 0x7fcd302bc738, mask = 0x00000011) acquired_mask = 0x00000011 for > lldb.Debugger > intern-state 
this = 0x00007FCD302BC700, pid = 2847473, tid = 2847473 > intern-state 0x7fcd302bc700: tid = 0x2b72f1: stop info = <NULL> (stop_id > = 4) > intern-state 0x7fcd302bc700: tid = 0x2b72f1: stop info = exec (stop_id = > 4) > intern-state > intern-state ThreadList::ShouldStop: 1 threads, 1 unsuspended threads > intern-state Thread::ShouldStop(0x7fcd302bc700) for tid = 0x2b72f1 > 0x2b72f1, pc = 0x00007f3b751ce140 > intern-state ^^^^^^^^ Thread::ShouldStop Begin ^^^^^^^^ > intern-state Plan stack initial state: > thread #1: tid = 0x2b72f1: > Active plan stack: > Element 0: Base thread plan. > > intern-state Plan base plan explains stop, auto-continue 0. > PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash > backtrace. > Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH > or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it): > ./bin/lldb[0x298aca] > ./bin/lldb[0x298ca4] > ./bin/lldb[0x296ca0] > ./bin/lldb[0x29a606] > /usr/local/fbcode/platform007/lib/libpthread.so.0(+0x12b80)[0x7fcd53167b80] > /home/wallace/llvm-sand/build/Debug/Linux-x86_64/llvm/bin/../lib/liblldb.so.12(+0x47e5d96)[0x7fcd4c955d96] > /home/wallace/llvm-sand/build/Debug/Linux-x86_64/llvm/bin/../lib/liblldb.so.12(+0x47e595a)[0x7fcd4c95595a] > /home/wallace/llvm-sand/build/Debug/Linux-x86_64/llvm/bin/../lib/liblldb.so.12(+0x47cf23b)[0x7fcd4c93f23b] > /home/wallace/llvm-sand/build/Debug/Linux-x86_64/llvm/bin/../lib/liblldb.so.12(+0x47e006c)[0x7fcd4c95006c] > /home/wallace/llvm-sand/build/Debug/Linux-x86_64/llvm/bin/../lib/liblldb.so.12(+0x47253f5)[0x7fcd4c8953f5] > /home/wallace/llvm-sand/build/Debug/Linux-x86_64/llvm/bin/../lib/liblldb.so.12(+0x4720f01)[0x7fcd4c890f01] > /home/wallace/llvm-sand/build/Debug/Linux-x86_64/llvm/bin/../lib/liblldb.so.12(+0x472676e)[0x7fcd4c89676e] > /home/wallace/llvm-sand/build/Debug/Linux-x86_64/llvm/bin/../lib/liblldb.so.12(+0x47257ed)[0x7fcd4c8957ed] > /home/wallace/llvm-sand/build/Debug/Linux-x86_64/llvm/bin/../lib/liblldb.so.12(+0x45856ec)[0x7fcd4c6f56ec] > /usr/local/fbcode/platform007/lib/libpthread.so.0(+0x76ce)[0x7fcd5315c6ce] > /usr/local/fbcode/platform007/lib/libc.so.6(clone+0x3f)[0x7fcd46d6135f] > Segmentation fault (core dumped) > > > > //// This is the callstack I got from another run, but they are the same > * thread #4, name = 'intern-state', stop reason = signal SIGSEGV: invalid > address (fault address: 0x218) > * frame #0: 0x00007f9983cccd96 > liblldb.so.12`lldb_private::ThreadPlan::GetPrivateStopInfo(this=0x00007f99680014d0) > at ThreadPlan.h:521:24 > frame #1: 0x00007f9983ccc95a > liblldb.so.12`lldb_private::ThreadPlanBase::ShouldStop(this=0x00007f99680014d0, > event_ptr=0x00007f997404c310) at ThreadPlanBase.cpp:78:29 > frame #2: 0x00007f9983cb623b > liblldb.so.12`lldb_private::Thread::ShouldStop(this=0x00007f99682be2b0, > event_ptr=0x00007f997404c310) at Thread.cpp:871:35 > frame #3: 0x00007f9983cc706c > liblldb.so.12`lldb_private::ThreadList::ShouldStop(this=0x0000000000477fa8, > event_ptr=0x00007f997404c310) at ThreadList.cpp:330:48 > frame #4: 0x00007f9983c0c3f5 > liblldb.so.12`lldb_private::Process::ShouldBroadcastEvent(this=0x0000000000477ce0, > event_ptr=0x00007f997404c310) at Process.cpp:3368:40 > frame #5: 0x00007f9983c07f01 > liblldb.so.12`lldb_private::Process::HandlePrivateEvent(this=0x0000000000477ce0, > event_sp=std::__shared_ptr<lldb_private::Event, > __gnu_cxx::_S_atomic>::element_type @ 0x00007f997404c310) at > Process.cpp:3593:33 > frame #6: 0x00007f9983c0d76e > 
liblldb.so.12`lldb_private::Process::RunPrivateStateThread(this=0x0000000000477ce0, > is_secondary_thread=false) at Process.cpp:3787:7 > frame #7: 0x00007f9983c0c7ed > liblldb.so.12`lldb_private::Process::PrivateStateThread(arg=0x0000000000494d80) > at Process.cpp:3685:25 > frame #8: 0x00007f9983a6c6ec > liblldb.so.12`lldb_private::HostNativeThreadBase::ThreadCreateTrampoline(arg=0x00000000004982e0) > at HostNativeThreadBase.cpp:68:10 > frame #9: 0x00007f998a4d36ce > libpthread.so.0`start_thread(arg=0x00007f996f7fe700) at pthread_create.c:465:7 > frame #10: 0x00007f997e0d835f libc.so.6`__clone at clone.S:95 > (lldb) ^D
>
> On Thu, Jan 21, 2021 at 2:37 PM Jim Ingham <jing...@apple.com> wrote:
> >
> > On Jan 21, 2021, at 2:33 PM, Jim Ingham <jing...@apple.com> wrote:
> >
> >> On Jan 21, 2021, at 12:51 PM, walter erquinigo via Phabricator
> >> <revi...@reviews.llvm.org> wrote:
> >>
> >> wallace added a comment.
> >>
> >> Sorry for returning late to this diff, but I have some additional
> >> information. This is what's happening:
> >>
> >> - Before the exec, the base thread plan holds a reference to the thread
> >> pointer.
> >> - During the exec, the original thread is destroyed in
> >> ProcessGDBRemote::SetLastStopPacket, and later a new Thread pointer is
> >> created for the same tid. Notice that the base thread plan is still
> >> holding a reference to the previous pointer even though there's a new
> >> Thread pointer for the same tid. On Linux, exec preserves the tid. This
> >> step is where I think there's a wrong invariant.
> >> - Then, at some point, the base thread plan is asked its stop reason,
> >> and it'll try to use the destroyed pointer, thus crashing.
> >
> > Thanks for looking at this some more.
> >
> > Sadly, I think I'm still missing something. Here's how I think things
> > should happen:
> >
> > 1) The program was stopped somewhere in the original program (before
> > exec), and the user continued the process.
> >    a) WillResume is called and all the thread plans clear their cached
> >       thread pointers. So they will have to look them up by ID when next
> >       asked a question.
> >    b) Then the process resumes.
> > 2) We get a stop packet with a reason of "exec" from the stub.
> >    a) We process the stop packet, and get the new thread list
> >    b) We start to build the new thread list
>
> Ack, that was confusing. Item b) should be "We start to build the new
> ThreadList". The object in a) is the thread list the stub tells us about,
> either in the stop packet or through subsequent queries. The object in b)
> is the ThreadList in the Process that will describe the state at this
> particular stop point.
>
> >       1) From what you are saying, in the case of exec, regardless of
> >          whether the TID matches an extant one or not, we make a new
> >          Thread *, put it into the new ThreadList and delete the old
> >          Thread *.
> >    c) Then we start figuring out what to do after the stop.
> >       i) If the TID of the new thread is the same as any of the TIDs in
> >          the old thread list (and thus of the ThreadPlanStack that was
> >          managing it) when the new thread gets asked what to do, it will
> >          route the question to the ThreadPlanStack for that TID.
> >       ii) If this happens after the new thread list is in place,
> >           FindThreadByID will return the new, valid thread, and there
> >           should be no problem.
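(Illustrative aside: the ordering requirement in the steps above can be sketched like this -- simplified stand-ins, not the actual lldb Process code:)

```
// Sketch of the ordering requirement described above; illustrative only.
struct Process {
  bool thread_list_finalized = false;

  // Steps 2a/2b: build and finalize the post-stop ThreadList. For an exec
  // this includes replacing the old Thread objects, even for a matching tid.
  void UpdateThreadList() { thread_list_finalized = true; }

  // Step 2c: only now may a ThreadPlan's GetThread() run FindThreadByID and
  // cache the result -- the lookup can only return new, valid threads.
  bool AskPlansShouldStop() { return thread_list_finalized; }
};

int main() {
  Process p;
  p.UpdateThreadList();    // must fully finish first; if a plan caches a
  p.AskPlansShouldStop();  // Thread * before this point, it caches a
                           // soon-to-be-destroyed pre-exec thread
}
```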
> >
> > So for this to fail as you described, somewhere between the time that we
> > stopped for exec and the time we finalized the new thread list and
> > discarded the old threads, somebody must have asked the thread plan stack
> > for that TID a question that triggered it to cache the old thread's
> > Thread *.
> >
> > That really shouldn't happen, precisely for the reason you see here.
> > When we stop, we have to finish setting up the current ThreadList before
> > we start asking questions of the ThreadPlanStack for threads, like
> > ShouldStop or whatever. It makes no sense to ask the pre-stop thread list
> > questions for the current stop, and we shouldn't do it.
> >
> > Do you know where we're calling ThreadPlan::GetThread while the thread
> > list still has the old threads in it? I'd really like to stop that if we
> > can. And if you can move those queries until after the new thread list is
> > established, then we shouldn't need any of the alternatives you suggest.
> >
> > About the alternatives:
> >
> > First one: I'd rather not artificially clear out the ThreadPlanStack
> > after exec. After all, if you knew that exec on Linux preserves the TID,
> > you could write a thread plan that carried over the exec and did
> > something interesting. Be a shame to prevent that for implementation
> > reasons.
> >
> > Second one: Adding the IsValid check won't do any harm, but it implicitly
> > lets something happen that I don't think should happen, so it doesn't
> > feel right...
> >
> > Third one: If there's some good reason why you have to first check
> > through the old thread list for stop reasons, then update the thread
> > list, then ask again with no Resume in between, I think the cleanest
> > solution is to announce that by explicitly clearing the ThreadPlans'
> > cached thread pointer. So I think this one's the best option. But again,
> > I'd like to understand why it was necessary.
> >
> > Jim
> >
> >> I've found three possible fixes for this (I'm pretty sure that you can
> >> come up with more):
> >>
> >> - The first one is what this diff does, which is clearing all the
> >> thread plans once the gdbremote client knows there's an exec.
> >> - The second one is modifying the ThreadPlan::GetThread method, which
> >> currently looks like this:
> >>
> >>   if (m_thread)
> >>     return *m_thread;
> >>
> >>   ThreadSP thread_sp = m_process.GetThreadList().FindThreadByID(m_tid);
> >>   m_thread = thread_sp.get();
> >>   return *m_thread;
> >>
> >> Changing it into:
> >>
> >>   if (m_thread && m_thread->IsValid()) // IsValid will return false
> >>                                        // for a deleted thread
> >>     return *m_thread;
> >>
> >>   ThreadSP thread_sp = m_process.GetThreadList().FindThreadByID(m_tid);
> >>   m_thread = thread_sp.get();
> >>   return *m_thread;
> >>
> >> also works, as it will make the ThreadPlan grab the new thread pointer.
> >>
> >> - Another solution is to create a method ThreadPlan::PrepareForExec()
> >> which just clears the m_thread pointer in the thread plan, and call it
> >> from ProcessGDBRemote::SetLastStopPacket, where we handle the exec.
> >>
> >> I prefer the first solution, as the thread plans are all cleared in
> >> preparation for the new process.
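(For concreteness, the third alternative above would be shaped roughly like this -- a simplified sketch, not the actual lldb classes:)

```
// Rough shape of the ThreadPlan::PrepareForExec() alternative described
// above; simplified, not the actual lldb code.
#include <cstdint>

struct Thread { uint64_t tid; };

class ThreadPlan {
  Thread *m_thread = nullptr; // cached; re-resolved from m_tid when null
  uint64_t m_tid;

public:
  explicit ThreadPlan(uint64_t tid) : m_tid(tid) {}

  // Would be called from ProcessGDBRemote::SetLastStopPacket when the exec
  // is detected: drop the cached pointer so the next GetThread() is forced
  // to re-run FindThreadByID against the post-exec thread list.
  void PrepareForExec() { m_thread = nullptr; }
};
```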
> >>
> >>
> >> Repository:
> >>   rG LLVM Github Monorepo
> >>
> >> CHANGES SINCE LAST ACTION
> >>   https://reviews.llvm.org/D93874/new/
> >>
> >> https://reviews.llvm.org/D93874
> >>
> >
> --
> - Walter Erquínigo Pezo