https://github.com/jasonmolenda created https://github.com/llvm/llvm-project/pull/132128
This fixes an uncommon bug with debugserver controlling an inferior process that is hitting an internal breakpoint & continuing when multiple interrupts are sent by SB API to lldb. In the reproducing setup (which required a machine with unique timing characteristics), lldb is sent SBProcess::Stop and then shortly after, SBProcess::SendAsyncSignal. The driver process only sees that the inferior is publicly running at this point, even though it's hitting an internal breakpoint, disabling it, step instructioning, re-enabling the breakpoint, then continuing. The packet sequence lldb sends to debugserver looks like 1. vCont;s // instruction step 2. ^c // async interrupt 3. Z.... // re-enable breakpoint 4. c // resume inferior execution 5. ^c // async interrupt When debugserver needs to interrupt a running process (`MachProcess::Interrupt`), the main thread in debugserver sends a SIGSTOP posix signal to the inferior process, and notes that it has sent this signal by setting `m_sent_interrupt_signo`. When we send the first async interrupt while instruction stepping, the signal is sent (probably after the inferior has already stopped) but lldb can only *receive* the mach exception that includes the SIGSTOP when the process is running. So at the point of step (3), we have a SIGSTOP outstanding in the kernel, and `m_sent_interrupt_signo` is set to SIGSTOP. When we resume the inferior (`c` in step 4), debugserver sees that `m_sent_interrupt_signo` is still set for an outstanding SIGSTOP, but at this point we've already stopped so it's an unnecessary stop. It records that (1) we've got a SIGSTOP still coming that debugserver sent and (2) we should ignore it by also setting `m_auto_resume_signo` to the same signal value. Once we've resumed the process, the mach exception thread (`MachTask::ExceptionThread`) receives the outstanding mach exception, adds it to a queue to be processed (`MachProcess::ExceptionMessageReceived`) and when we've collected all outstanding mach exceptions, it calls `MachProcess::ExceptionMessageBundleComplete` top evaluate them. `MachProcess::ExceptionMessageBundleComplete` halts the process (without updating the MachProcess `m_state`) while evaluating them. It sees that this incoming SIGSTOP was meant to be ignored (`m_auto_resume_signo` is set), so it `MachProcess::PrivateResume`'s the process again. At the same time `MachTask::ExceptionThread` is receiving and processing the ME, `MachProcess::Interrupt` is called with another interrupt that debugserver received. This method checks that we're still eStateRunning (we are) but then sees that we have an outstanding SIGSTOP already (`m_sent_interrupt_signo`) and does nothing, assuming that we will stop shortly from that one. It then returns to call `RNBRemote::HandlePacket_last_signal` to print the status -- but because the process is still `eStateRunning`, this does nothing. So the first ^c (resulting in a pending SIGSTOP) is received and we resume the process silently. And the second ^c is ignored because we've got one interrupt already being processed. The fix was very simple. In `MachProcess::Interrupt` when we detect that we have a SIGSTOP out in the wild (`m_sent_interrupt_signo`), we need to clear `m_auto_resume_signo` which is used to indicate that this SIGSTOP is meant to be ignored, because it was from before our most recent resume. MachProcess::Interrupt holds the `m_exception_and_signal_mutex` mutex already (after Jonas's commit last week), and all of `MachProcess::ExceptionMessageBundleComplete` holds that same mutex, so we know we can modify `m_auto_resume_signo` here and it will be handled correctly when the outstanding mach exception is finally processed. rdar://145872120 >From decb5f5ed5a8e38c726d930a00dc1f2cb9a277bf Mon Sep 17 00:00:00 2001 From: Jason Molenda <jmole...@apple.com> Date: Wed, 19 Mar 2025 17:36:06 -0700 Subject: [PATCH] [lldb][debugserver] Interrupt should reset outstanding SIGSTOP This fixes an uncommon bug with debugserver controlling an inferior process that is hitting an internal breakpoint & continuing when multiple interrupts are sent by SB API to lldb. In the reproducing setup (which required a machine with unique timing characteristics), lldb is sent SBProcess::Stop and then shortly after, SBProcess::SendAsyncSignal. The driver process only sees that the inferior is publicly running at this point, even though it's hitting an internal breakpoint, disabling it, step instructioning, re-enabling the breakpoint, then continuing. The packet sequence lldb sends to debugserver looks like 1. vCont;s // instruction step 2. ^c // async interrupt 3. Z.... // re-enable breakpoint 4. c // resume inferior execution 5. ^c // async interrupt When debugserver needs to interrupt a running process (`MachProcess::Interrupt`), the main thread in debugserver sends a SIGSTOP posix signal to the inferior process, and notes that it has sent this signal by setting `m_sent_interrupt_signo`. When we send the first async interrupt while instruction stepping, the signal is sent (probably after the inferior has already stopped) but lldb can only *receive* the mach exception that includes the SIGSTOP when the process is running. So at the point of step (3), we have a SIGSTOP outstanding in the kernel, and `m_sent_interrupt_signo` is set to SIGSTOP. When we resume the inferior (`c` in step 4), debugserver sees that `m_sent_interrupt_signo` is still set for an outstanding SIGSTOP, but at this point we've already stopped so it's an unnecessary stop. It records that (1) we've got a SIGSTOP still coming that debugserver sent and (2) we should ignore it by also setting `m_auto_resume_signo` to the same signal value. Once we've resumed the process, the mach exception thread (`MachTask::ExceptionThread`) receives the outstanding mach exception, adds it to a queue to be processed (`MachProcess::ExceptionMessageReceived`) and when we've collected all outstanding mach exceptions, it calls `MachProcess::ExceptionMessageBundleComplete` top evaluate them. `MachProcess::ExceptionMessageBundleComplete` halts the process (without updating the MachProcess `m_state`) while evaluating them. It sees that this incoming SIGSTOP was meant to be ignored (`m_auto_resume_signo` is set), so it `MachProcess::PrivateResume`'s the process again. At the same time `MachTask::ExceptionThread` is receiving and processing the ME, `MachProcess::Interrupt` is called with another interrupt that debugserver received. This method checks that we're still eStateRunning (we are) but then sees that we have an outstanding SIGSTOP already (`m_sent_interrupt_signo`) and does nothing, assuming that we will stop shortly from that one. It then returns to call `RNBRemote::HandlePacket_last_signal` to print the status -- but because the process is still `eStateRunning`, this does nothing. So the first ^c (resulting in a pending SIGSTOP) is received and we resume the process silently. And the second ^c is ignored because we've got one interrupt already being processed. The fix was very simple. In `MachProcess::Interrupt` when we detect that we have a SIGSTOP out in the wild (`m_sent_interrupt_signo`), we need to clear `m_auto_resume_signo` which is used to indicate that this SIGSTOP is meant to be ignored, because it was from before our most recent resume. MachProcess::Interrupt holds the `m_exception_and_signal_mutex` mutex already (after Jonas's commit last week), and all of `MachProcess::ExceptionMessageBundleComplete` holds that same mutex, so we know we can modify `m_auto_resume_signo` here and it will be handled correctly when the outstanding mach exception is finally processed. rdar://145872120 --- lldb/tools/debugserver/source/MacOSX/MachProcess.mm | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/lldb/tools/debugserver/source/MacOSX/MachProcess.mm b/lldb/tools/debugserver/source/MacOSX/MachProcess.mm index 4742beeb6a4a3..447ebbe7fb9e5 100644 --- a/lldb/tools/debugserver/source/MacOSX/MachProcess.mm +++ b/lldb/tools/debugserver/source/MacOSX/MachProcess.mm @@ -1594,6 +1594,10 @@ static uint64_t bits(uint64_t value, uint32_t msbit, uint32_t lsbit) { m_sent_interrupt_signo); } } else { + // We've requested that the process stop anew; if we had recorded this + // requested stop as being in place when we resumed (& therefore would + // throw it away), clear that. + m_auto_resume_signo = 0; DNBLogThreadedIf(LOG_PROCESS, "MachProcess::Interrupt() - previously " "sent an interrupt signal %i that hasn't " "been received yet, interrupt aborted", _______________________________________________ lldb-commits mailing list lldb-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits