[Lldb-commits] [lldb] [lldb] Fix race condition in Process::WaitForProcessToStop() (PR #144919)

via lldb-commits Mon, 23 Jun 2025 14:32:20 -0700

jimingham wrote:

Your solution is going in the right direction, but is trying to solve a problem 
"after the fact" that we can solve better by preventing the public stop events 
from making their way to the non-primary listener before the primary listener 
gets a chance to fully process them.


Process event listeners are a bit special because we have to keep the "process 
event state" in sync with the value returned from Process::GetState.  It would 
be hard to figure out the meaning of events if for instance you could still 
have the "stopped" event sitting unfetched in the event queue, but the process 
state were shown as stopped.  To ensure this, we designate one of the listeners 
(the one passed into the Process constructor) as the "primary process event 
listener".  That one gets sent the event first, and we're supposed to only 
change the process state when the primary listener has finished handling the 
event.  Note, while a listener that hijacks the process event broadcaster gets 
designated as the primary event listener while it is hijacking the broadcaster, 
the converse is not true, the primary listener does not have to be hijacking 
the process event stream.  In fact, it's usually just the listener you passed 
in when you created the Process.

So why doesn't that all just work?

The problem is that when lldb goes to run the DoOnRemoval callback for the 
Process event (that's what calls your python breakpoint code, for instance), 
the code needs to see the process state as stopped - so that any commands will 
execute in the right context.  Currently, the only way to do that is to switch 
the public state to stopped.  But as you correctly analyzed, if you set the 
public state to stopped while we're in the middle of handling the stop, then 
any other code that's waiting on the state change will trigger, possibly before 
the breakpoint has decided whether to stop or not, which is incorrect.  To 
really honor the contract above, we need to make sure the stop event doesn't 
get delivered to the non-primary listeners till the stop is fully handled.

We already have code to deal with a slightly different aspect of this.  

The other reason that process events are a bit special is that we know that to 
implement a "user level operation" like source line stepping, the debugger 
often has to stop and start numerous times.  Those stops have no meaning at the 
user level, so we want to suppress them.  The way do that is to have a two 
stage handling of process events and their corresponding "state".  First the 
"private" side of lldb fetches the event (usually on the private state thread), 
and sets the "private state" to stopped without changing the "public state".  
Then we run the thread plan logic is run to decide whether we are indeed going 
to stop to the user.  Then we rebroadcast the event to the primary process 
listener - and as a result set the "public state" to stopped.  But the code 
that runs the "Thread Plans" on the private state thread needs to see the state 
as stopped.

We implement this by making Process::GetState smart about who is using it.  If 
you look at Process::GetState, you will see it doesn't just return the public 
state.  That's because all the thread plan logic has to run on the Private 
State Thread before it is appropriate to set the public state to stopped.  
Instead, when code running on the "Private State Thread" asks for the process 
state, it will get the private state.  But code running on any other thread 
will still see the public state. That's done by checking 
"CurrentThreadIsPrivateStateThread" in Process::GetState.

I think the correct solution to this problem is to add the thread that's going 
to run DoOnRemoval for the primary process event listener to the private state 
thread so that both of them will see the private state in Process::GetState.  
Then we only flip the public state when DoOnRemoval is all the way done .  That 
way - like with the private event handling - the code doing the handling will 
see the right process state, so it will function correctly.  And then only when 
the callback has completed will the public state get switched (causing the 
event to be broadcast to the other clients waiting on the change in public 
state.)

That way the thread that is running the breakpoint command will run 
uninterrupted but in the correct context without any of the other threads 
waiting on the process knowing about the stop too early.

And that way the non-primary process event listeners will be purely advisory as 
they are supposed to be.

https://github.com/llvm/llvm-project/pull/144919
_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits

[Lldb-commits] [lldb] [lldb] Fix race condition in Process::WaitForProcessToStop() (PR #144919)

Reply via email to