On 2022-04-10 15:13, Alexey Izbyshev wrote:
On 2022-04-10 10:34, Takashi Yano wrote:
On Sat, 09 Apr 2022 23:26:51 +0300
Thanks for investigating. In the normal case, conhost.exe is terminated
when hWritePipe is closed.
Thanks for confirming.
Possibly, the hWritePipe has incorrect handle value.
I've verified that the handle was correct by attaching via gdb to the
hanging bash and checking that hWritePipe field is now zeroed (which
happens only in the branch where _HandleIsValid returns true and
hWritePipe is closed).
I've found something interesting though. I've modeled a similar
situation on another machine:
1. I've run a native process via bash.
2. I've attached to bash via gdb and set a breakpoint on
ClosePseudoConsole().
3. I've killed the native process.
4. The breakpoint was hit, and I looked at hWritePipe value.
ProcessHacker shows it as "Unnamed file: \FileSystem\Npfs". Both bash
and conhost had a single handle with that name, and after I
forcibly closed it in the bash process (while it was still suspended
by gdb), conhost.exe indeed died.
Then I looked at the original hanging tree and found that the hanging
bash.exe still has a single handle displayed as "Unnamed file:
\FileSystem\Npfs". I don't know how to check what kernel object it
refers to, but at least its access rights are the same as for the
hWritePipe that I saw on the other machine, and its handle count is
1. So could it be another copy of hWritePipe, e.g. due to some handle
leak?
I don't know how to verify whether this suspicious handle in bash.exe
is paired with "Unnamed file: \FileSystem\Npfs" in conhost.exe, other
than by forcibly closing it. If I close it and conhost.exe dies, it
will confirm "the extra handle" theory, but will also prevent further
investigation with the hanging tree. Do you have any advice?
I've found something that looked strange to me by checking handles in
the hanging process tree: the hanging conhost.exe and the hanging
bash.exe belong to different tests. Each test is a separate shell script
in a separate make recipe, so it looks like conhost.exe was created by
one test (which is still hanging at a later point in its script, trying
to run grep), but then bash.exe belonging to another test somehow got a
pseudoconsole referring to this conhost.exe and now hangs trying to
close it. So it looks like Cygwin migrated the pseudoconsole between
processes, and indeed fhandler_pty_slave::close_pseudoconsole() contains
something looking like migration logic. And this logic contains the
following call:
    DuplicateHandle (GetCurrentProcess (),
                     ttyp->h_pcon_write_pipe,
                     new_owner, &new_write_pipe,
                     0, TRUE, DUPLICATE_SAME_ACCESS);
Is it safe to create an *inheritable* handle in another process here?
Could it be that the target process spawns a child at the wrong moment
(e.g. before it even knows about the newly created handle), and that
handle unintentionally leaks into the child, triggering the hang
afterwards?
Similarly suspicious code is also in
fhandler_pty_common::resize_pseudo_console():
    DuplicateHandle (pcon_owner, get_ttyp ()->h_pcon_write_pipe,
                     GetCurrentProcess (), &hpcon_local.hWritePipe,
                     0, TRUE, DUPLICATE_SAME_ACCESS);
    ResizePseudoConsole ((HPCON) &hpcon_local, size);
    CloseHandle (pcon_owner);
    CloseHandle (hpcon_local.hWritePipe);
If another thread spawns a child using
CreateProcess(bInheritHandles=TRUE) between DuplicateHandle() and
CloseHandle(hpcon_local.hWritePipe), the handle will leak into the
child.
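To illustrate the failure mode I suspect, here is a rough POSIX analogy.
It is not the Cygwin code and it does not use Windows handles. The
function name reader_sees_eof and its parameters are made up for this
sketch. A pipe reader (standing in for conhost.exe) only sees
end-of-file once every copy of the write end is closed, so a copy that
leaked into an unrelated child keeps the reader waiting even after the
owner has closed its own copy:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper for illustration only.
   Returns 1 if the reader sees end-of-file within timeout_ms after the
   parent closes its write end, 0 if it is still waiting, -1 on error.
   If leak_to_child is nonzero, a child keeps an inherited copy of the
   write end open, standing in for the suspected leaked handle. */
int reader_sees_eof(int leak_to_child, int timeout_ms)
{
    int fds[2];

    assert(timeout_ms >= 0);
    if (pipe(fds) != 0)
        return -1;

    pid_t child = fork();
    if (child < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (child == 0) {
        /* Child: never reads; optionally keeps the write end open. */
        close(fds[0]);
        if (!leak_to_child)
            close(fds[1]);
        for (;;)
            pause();            /* stay alive until the parent kills us */
    }

    /* Parent: the "owner" closes its copy, like bash closing hWritePipe. */
    close(fds[1]);

    /* POLLHUP is reported only when no write end remains open anywhere. */
    struct pollfd p = { .fd = fds[0], .events = POLLIN, .revents = 0 };
    int r = poll(&p, 1, timeout_ms);
    int eof = (r > 0 && (p.revents & POLLHUP)) ? 1 : 0;

    /* Clean up the child regardless of the outcome. */
    kill(child, SIGKILL);
    waitpid(child, NULL, 0);
    close(fds[0]);
    return r < 0 ? -1 : eof;
}
```

On a typical Linux system I would expect reader_sees_eof(0, 500) to
report end-of-file and reader_sees_eof(1, 500) to time out. If the
Windows case behaves the same way, an inheritable duplicate of
hWritePipe that ends up in a child would be enough to keep conhost.exe
alive.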
Sorry if this is a false lead; I haven't yet tried to really understand
the pseudoconsole-related code.
Thanks,
Alexey