On 5/16/2025 4:59 AM, Takashi Yano via Cygwin wrote:
> On Fri, 16 May 2025 08:46:40 +0900
> Takashi Yano wrote:
>> diff --git a/winsup/cygwin/local_includes/cygheap.h b/winsup/cygwin/local_includes/cygheap.h
>> index fed87ec2b..7d11fbb37 100644
>> --- a/winsup/cygwin/local_includes/cygheap.h
>> +++ b/winsup/cygwin/local_includes/cygheap.h
>> @@ -604,6 +604,8 @@ class cygheap_fdnew : public cygheap_fdmanip
>>     {
>>       if (cygheap->fdtab[fd])
>>         cygheap->fdtab[fd]->inc_refcnt ();
>> +    if (locked)
>> +      cygheap->fdtab.unlock ();
>>     }
>>     void operator = (fhandler_base *fh) {cygheap->fdtab[fd] = fh;}
>>   };
>
> This should not be done, because the parent class cygheap_fdmanip
> does that.
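A minimal C++ sketch of why that cygheap.h hunk is redundant. It assumes, as stated in the quoted reply, that the cygheap_fdmanip destructor already releases the fd-table lock when `locked` is set. CountingLock, FdManip, FdNew and FdNewPatched are hypothetical stand-ins, not the real cygheap classes, and the lock only counts calls so that a double unlock shows up as a number instead of undefined behavior. Because a derived destructor runs before its base destructor, adding the unlock to the derived destructor releases the lock twice.

```cpp
#include <cassert>

// Hypothetical stand-in for the fd-table lock: it only counts calls.
struct CountingLock {
  int locks = 0;
  int unlocks = 0;
  void lock() { ++locks; }
  void unlock() { ++unlocks; }
};

// Models cygheap_fdmanip: the base destructor releases the lock if held.
class FdManip {
 public:
  virtual ~FdManip() {
    if (locked)
      lk.unlock();
  }

 protected:
  explicit FdManip(CountingLock &l) : lk(l) {}
  CountingLock &lk;
  bool locked = false;
};

// Models cygheap_fdnew without the hunk: it locks in its constructor and
// leaves the unlock to ~FdManip().
class FdNew : public FdManip {
 public:
  explicit FdNew(CountingLock &l) : FdManip(l) { lk.lock(); locked = true; }
  ~FdNew() override {}
};

// Models the hunk: the derived destructor unlocks, then ~FdManip() runs
// afterwards and unlocks a second time.
class FdNewPatched : public FdManip {
 public:
  explicit FdNewPatched(CountingLock &l) : FdManip(l) { lk.lock(); locked = true; }
  ~FdNewPatched() override {
    if (locked)
      lk.unlock();
  }
};

// Returns how many times the lock was released over one object lifetime.
template <typename T>
int unlocks_after_scope() {
  CountingLock l;
  { T fd(l); }
  return l.unlocks;
}
```

`unlocks_after_scope<FdNew>()` reports one release, while `unlocks_after_scope<FdNewPatched>()` reports two.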
Right. But the other part of the patch (to syscalls.cc) looks right to me, and I agree that it fixes the hang. Here's my understanding of why it works: The main thread tries to open the fifo for reading, but fhandler_fifo::open blocks until it detects that someone is opening the fifo for writing. The other thread wants to open it for writing, but it never gets to the point of calling fhandler_fifo::open because it is stuck waiting for the lock on cygheap->fdtab. That lock is held by the main thread: open() constructs the cygheap_fdnew object fd, which locks the fd table, before calling fhandler_fifo::open, and the lock is not released until fd is destroyed. To fix this, we need to delay the construction of fd until after fhandler_fifo::open has been called.
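A minimal C++ sketch of the hang as described above, under stated assumptions: FdTable, FdNew, Fifo, open_fifo and run are hypothetical stand-ins, not Cygwin's cygheap or fhandler code. The fifo model blocks only on the read side, and a timeout replaces the indefinite wait so the hang can be observed. With `lock_first` set, the reader holds the table lock across its blocking open and the writer can never reach its own open; without it, the reader's open completes first and the fd slot is claimed afterwards.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

using std::chrono::milliseconds;

// Hypothetical stand-in for cygheap->fdtab: one lock guarding the slots.
struct FdTable {
  std::mutex lock;
  std::vector<int> slots;
};

// Hypothetical stand-in for cygheap_fdnew: the table stays locked from
// construction until the object goes out of scope.
class FdNew {
 public:
  explicit FdNew(FdTable &t) : t_(t) { t_.lock.lock(); }
  ~FdNew() { t_.lock.unlock(); }
  FdNew(const FdNew &) = delete;
  FdNew &operator=(const FdNew &) = delete;
  int assign(int handle) {  // claim the next free slot for a fake handle
    t_.slots.push_back(handle);
    return static_cast<int>(t_.slots.size()) - 1;
  }

 private:
  FdTable &t_;
};

// Simplified fifo: opening for reading blocks until a writer has opened;
// opening for writing never blocks.
struct Fifo {
  std::mutex m;
  std::condition_variable cv;
  bool reader_waiting = false;  // only used to order the two threads
  bool writer_present = false;

  bool open_read(milliseconds timeout) {
    std::unique_lock<std::mutex> lk(m);
    reader_waiting = true;
    cv.notify_all();
    return cv.wait_for(lk, timeout, [this] { return writer_present; });
  }
  void open_write() {
    { std::lock_guard<std::mutex> lk(m); writer_present = true; }
    cv.notify_all();
  }
  void wait_for_reader() {
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [this] { return reader_waiting; });
  }
};

// Opens one end of the fifo and claims an fd slot. With lock_first the
// table lock is held across the (possibly blocking) open; otherwise the
// slot is claimed only after the open has returned.
static int open_fifo(FdTable &table, Fifo &fifo, bool for_read,
                     bool lock_first, milliseconds timeout) {
  auto do_open = [&] {
    if (for_read)
      return fifo.open_read(timeout);
    fifo.open_write();
    return true;
  };
  if (lock_first) {
    FdNew fd(table);
    if (!do_open())
      return -1;
    return fd.assign(for_read ? 0 : 1);
  }
  if (!do_open())
    return -1;
  FdNew fd(table);
  return fd.assign(for_read ? 0 : 1);
}

// The calling thread opens for reading while a second thread opens for
// writing. Returns the reader's fd, or -1 if its open timed out.
static int run(bool lock_first) {
  FdTable table;
  Fifo fifo;
  std::thread writer([&] {
    fifo.wait_for_reader();  // start only once the reader is inside open
    open_fifo(table, fifo, false, lock_first, milliseconds(0));
  });
  int fd = open_fifo(table, fifo, true, lock_first, milliseconds(1000));
  writer.join();
  return fd;
}
```

`run(true)` times out and returns -1, while `run(false)` returns a valid slot. The `reader_waiting` flag exists only to make the thread ordering in the sketch deterministic.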

Do you agree with this explanation, or is there something else going on? In either case, I think it would be good to include at least a brief explanation in your commit message, since this is a pretty subtle bug. And thanks for finding the fix!

Ken

--
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
