On 2022-04-13 19:48, Alexey Izbyshev wrote:
On 2022-04-11 13:10, Alexey Izbyshev wrote:
What's probably not normal is the behavior of the hanging conhost.exe.
I've compared the points where conhost.exe is blocked, and all but one
threads in the model case are doing the same things as in the hanging
case, but the remaining thread is blocked in
ReadFile("\Device\NamedPipe\") (i.e. the read end of "hWritePipe" of
pcon) instead of trying to enter a critical section like thread 1
above. So now I'm starting to doubt that it's a cygwin bug and not
some conhost.exe bug.

I'll try to poke around the hanging conhost.exe some more, and also
may be will try to create a faster reproducer.

I've studied conhost.exe hang, and it indeed looks like it's buggy.

TLDR: https://github.com/microsoft/terminal/pull/12181

The full story:

I dumped conhost.exe, opened the dump in windbg and looked at the stack trace of the hanging thread:

ntdll!NtWaitForAlertByThreadId+0x14
ntdll!RtlpWaitOnAddressWithTimeout+0x81
ntdll!RtlpWaitOnAddress+0xae
ntdll!RtlpWaitOnCriticalSection+0xfd
ntdll!RtlpEnterCriticalSectionContended+0x1c4
ntdll!RtlEnterCriticalSection+0x42
conhost!Microsoft::Console::Render::Renderer::_PaintFrameForEngine+0x54
conhost!Microsoft::Console::Render::Renderer::TriggerTeardown+0x19e60
conhost!Microsoft::Console::Interactivity::ServiceLocator::RundownAndExit+0x21
conhost!Microsoft::Console::PtySignalInputThread::_GetData+0x65
conhost!Microsoft::Console::PtySignalInputThread::_InputThread+0x25
kernel32!BaseThreadInitThunk+0x14
ntdll!RtlUserThreadStart+0x21

By looking at assembly, I've found that it hangs *after* ReadFile() on the pipe completes, so the problem is definitely not a leak of hWritePipe in bash.exe or elsewhere.

Using the function names, I've found this issue: https://github.com/microsoft/terminal/issues/1810.

This is a different one, but the discussion and the patch shows that synchronization on startup/shutdown is a disaster.

Then I looked at the code and identified that hang happens while attempting to lock the console at [1]. After studying how this lock is used in other parts of the code, I noticed that PtySignalInputThread::_Shutdown() (which is further up in the call stack of the hanging function) uses ProcessCtrlEvents() incorrectly, because the latter unconditionally unlocks the console, but the lock is never taken by this thread at this point. Then I looked at a more recent version of the code and discovered the patch to _Shutdown() which I referenced above.

I've also verified that assembly of _Shutdown() (which is inlined into PtySignalInputThread::_GetData()) corresponds to the unpatched version (i.e. without LockConsole() call):

call    conhost!CloseConsoleProcessState (00007ff6`22e7013c)
call    conhost!ProcessCtrlEvents (00007ff6`22e262a0)
mov     ecx,6Dh
call conhost!Microsoft::Console::Interactivity::ServiceLocator::RundownAndExit (00007ff6`22e3c730)

I'm not sure why this bug is not triggered more frequently, but one possible reason, as indicated by comment [2], is that the bad path is only taken if there are live clients after ClosePseudoConsole() is called, which is probably rare.

A potential workaround on Cygwin side would be to ensure that the pseudoconsole doesn't have clients before calling ClosePseudoConsole(), but I don't know whether it's possible.

[1] https://github.com/microsoft/terminal/blob/9b92986b49bed8cc41fde4d6ef080921c41e6d9e/src/renderer/base/renderer.cpp#L75

[2] https://github.com/microsoft/terminal/blob/9b92986b49bed8cc41fde4d6ef080921c41e6d9e/src/host/PtySignalInputThread.cpp#L205

Alexey

--
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple

Reply via email to