On Fri, 18 Jul 2025 17:42:47 +0900
Takashi Yano wrote:
> On Fri, 18 Jul 2025 09:54:20 +0200
> Corinna Vinschen wrote:
> > On Jul 18 01:28, Takashi Yano via Cygwin wrote:
> > > On Fri, 18 Jul 2025 00:44:46 +0900
> > > Takashi Yano wrote:
> > > > On Thu, 17 Jul 2025 17:19:49 +0200
> > > > Corinna Vinschen wrote:
> > > > > On Jul 17 23:14, Takashi Yano via Cygwin wrote:
> > > > > > Hi Corinna,
> > > > > > 
> > > > > > On Wed, 16 Jul 2025 17:36:42 +0200
> > > > > > Corinna Vinschen wrote:
> > > > > > > On Jul 16 23:52, Takashi Yano via Cygwin wrote:
> > > > > > > > Do you have any idea?
> > > > > > > 
> > > > > > > Locking would be super-simple.
> > > > > > > 
> > > > > > > But theoretically it should be possible to use a local child_info_spawn
> > > > > > > variable at this point.  The ch_spawn child_info_spawn instance is not
> > > > > > > copied to the child anyway, so that should be safe.  The same goes for
> > > > > > > posix_spawn() then, btw.
> > > > > > > 
> > > > > > > I checked the sources and I don't see any dependency to ch_spawn
> > > > > > > from a spawning process, in contrast to an exec'ing process.  That
> > > > > > > doesn't mean there is none, just that I didn't find any.
> > > > > > 
> > > > > > Thanks!
> > > > > > As a starting point, I tried introducing locking. It almost works
> > > > > > as expected; however, the STC from my first report sometimes hangs
> > > > > > if N is large, e.g. 100. The patch is attached.
> > > > > > 
> > > > > > What am I missing?
> > > > > 
> > > > > I don't know.  You're perhaps not releasing the lock in all cases.
> > > > > But I would have to debug this just like you ¯\_(ツ)_/¯
> > > > > 
> > > > > Out of curiosity, did you try using a local child_info_spawn instance
> > > > > instead?  That would be a rather nice solution, but I'm pretty sure
> > > > > there's some other problem lurking in the dark...
> > > > 
> > > > I'm not sure what to do with a local child_info_spawn.
> > > > Some other modules refer to ch_spawn, such as exception.cc and
> > > > pinfo.cc. Also, has_execed* uses ch_spawn. What should we do about that?
> > > > 
> > > > I've just tried the following simple patch; however, it also hangs
> > > > with my STC.
> > > > 
> > > > diff --git a/winsup/cygwin/spawn.cc b/winsup/cygwin/spawn.cc
> > > > index cb58b6eed..56fca6e45 100644
> > > > --- a/winsup/cygwin/spawn.cc
> > > > +++ b/winsup/cygwin/spawn.cc
> > > > @@ -944,6 +944,7 @@ spawnve (int mode, const char *path, const char *const *argv,
> > > >    int ret;
> > > >  
> > > >    syscall_printf ("spawnve (%s, %s, %p)", path, argv[0], envp);
> > > > +  child_info_spawn ch_spawn_local;
> > > >  
> > > >    if (!envp)
> > > >      envp = empty_env;
> > > > @@ -951,7 +952,7 @@ spawnve (int mode, const char *path, const char *const *argv,
> > > >    switch (_P_MODE (mode))
> > > >      {
> > > >      case _P_OVERLAY:
> > > > -      ch_spawn.worker (path, argv, envp, mode);
> > > > +      ch_spawn_local.worker (path, argv, envp, mode);
> > > >        /* Errno should be set by worker.  */
> > > >        ret = -1;
> > > >        break;
> > > > @@ -961,7 +962,7 @@ spawnve (int mode, const char *path, const char *const *argv,
> > > >      case _P_WAIT:
> > > >      case _P_DETACH:
> > > >      case _P_SYSTEM:
> > > > -      ret = ch_spawn.worker (path, argv, envp, mode);
> > > > +      ret = ch_spawn_local.worker (path, argv, envp, mode);
> > > >        break;
> > > >      default:
> > > >        set_errno (EINVAL);
> > > 
> > > The hang seems to occur when acquiring the cygheap_protect lock in the
> > > child sh.exe.
> > > This lock is acquired only in _cfree() and _cmalloc(), so I am not sure why
> > > cygheap_protect cannot be acquired at this point at all...
> > 
> > What do the affected backtraces look like?
> > 
> > Also, one reason could be that cygheap_protect is a SRWLOCK since
> > 5d3e79ec6bb73 ("Cygwin: cygheap: use SRWLOCK for cygheap locking")
> > 
> > SRWLOCK is not recursive.  What if you revert this lock to a muto as
> > before 5d3e79ec6bb73?
> 
> Thanks! I'll try muto.

Tried. With a muto instead of the SRWLOCK, the hang no longer happens.
However, "Bad address" errors sometimes occur.

$ ./a.exe
sh: line 1: /usr/bin/echo: Bad address
sh: line 1: /usr/bin/echo: Bad address
AAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAA
0
0
0
0
0
0
0
0
11
11
$ 

Might the heap be corrupted?
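
For reference, here is a minimal sketch of why a non-recursive lock can
hang where a recursive one does not. This is not Cygwin code: the class
names NonRecursiveLock and RecursiveLock and the helper
nested_acquire_succeeds() are hypothetical stand-ins for SRWLOCK-like and
muto-like behaviour. It uses try-style acquisition so that the
non-recursive case reports a failure instead of actually blocking, and it
assumes that the hang comes from a nested acquisition on the same thread,
which has not been confirmed.

```cpp
#include <atomic>
#include <thread>

// Hypothetical stand-in for a non-recursive lock (SRWLOCK-like).
// try_acquire() fails whenever the lock is held, even by the caller itself.
class NonRecursiveLock {
  std::atomic<bool> held{false};
public:
  bool try_acquire() {
    bool expected = false;
    return held.compare_exchange_strong(expected, true);
  }
  void release() { held.store(false); }
};

// Hypothetical stand-in for a recursive lock (muto-like).
// The owning thread may acquire it again; a depth counter tracks nesting.
class RecursiveLock {
  std::atomic<std::thread::id> owner{std::thread::id{}};
  int depth = 0;  // only modified by the owning thread
public:
  bool try_acquire() {
    const std::thread::id self = std::this_thread::get_id();
    if (owner.load() == self) {  // nested acquisition by the owner
      ++depth;
      return true;
    }
    std::thread::id unowned{};
    if (!owner.compare_exchange_strong(unowned, self))
      return false;              // held by another thread
    depth = 1;
    return true;
  }
  void release() {
    if (--depth == 0)
      owner.store(std::thread::id{});
  }
};

// Acquire the lock, then try to acquire it again from the same thread.
// Returns true if the nested acquisition succeeded.
template <typename Lock>
bool nested_acquire_succeeds() {
  Lock lock;
  if (!lock.try_acquire())
    return false;
  const bool nested = lock.try_acquire();
  if (nested)
    lock.release();
  lock.release();
  return nested;
}
```

With a blocking acquire in place of try_acquire(), the nested call on
NonRecursiveLock would wait forever, which would look like the hang seen
with SRWLOCK. The recursive variant just bumps the depth counter.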

-- 
Takashi Yano <[email protected]>
