On Fri, 20 Sep 2002 11:56:57 -0400, Christopher Faylor <[EMAIL PROTECTED]>
wrote:

>On Fri, Sep 20, 2002 at 11:26:42AM +0000, Guy Harrison wrote:
>>On Wed, 18 Sep 2002 15:35:53 -0400, Christopher Faylor <[EMAIL PROTECTED]>
>>wrote:

Shame us non-developers can't get it "readonly".

http://cygwin.com/ml/cygwin-developers/2002-09/msg00071.html

...sounds *exactly* like my problem. Moreover the build date on my last
cygwin1.dll that works is 2002-07-11. Similar timeframe.


>>>On Wed, Sep 18, 2002 at 06:42:50PM +0000, Guy Harrison wrote:
>>>>On Fri, 13 Sep 2002 08:58:16 -0400, Christopher Faylor <[EMAIL PROTECTED]>
>>>>wrote:
>>>>
>>>>>On Fri, Sep 13, 2002 at 09:09:37AM +0000, Guy Harrison wrote:
>>>>>>I can't seem to figure out how to set a breakpoint in sigproc.cc without
>>>>>>recompiling make with debug. Any hints?
>>>>>
>>>>>Just attach to the running process and set a breakpoint.
>>>>>
>>>>>Alternatively, use the "dll" command to load cygwin1.dll and then set
>>>>>a breakpoint on a *line number*.
>>>>
>>>>Thanks, the latter helped verify that debugging made the problem go away
>>>>- ditto strace. Initially I thought it was a race. Racing certainly
>>>>helps trigger it but that isn't the problem.
>>>>
>>>>I can't see a mechanism involving cygthread::stub to cater for the case
>>>>where "last man out"+1 ensures "last man out" is running. In all
>>>>situations where abnormal behaviour occurs we're left waiting upon a
>>>>process that consists of a single suspended cygthread::stub thread.
>>>>Others should be able to verify this by bumping up the size of the
>>>>cygthread.cc threads[] array up to a silly value then attempt an
>>>>intensive configure/make/install with it. Conversely now that I've set
>>>>threads[1] there's been no breakages.
>>>
>>>Where are you seeing this wait?  Details please.
>>
>>Any reasonably intensive configure/make/install build. Not surprising
>>'cos that's what I do most. Name almost any process that occurs during
>>that and it's had a hang on a lone suspended thread - all the parent
>>processes waiting on it. Spurious.
>
>The "where" meant where in the code.
>
>You apparently tracked things down to the cygthread code but I don't
>see any real analysis of why the cygthread code would cause this.  The
>fact that you twiddled something and the problem went away does not
>necessarily mean that you've found the source of the problem -- not
>in any complex system at least.

I stared at it until I went boss-eyed *then* I twiddled it.

>I suspect that this is actually due to a deadlock in the init.cc code,
>which was recently discussed in cygwin-developers.
>
>>The implication is that cygthread::stub's should be in suspended state
>>as the process exits. Is this (a) correct, (b) expected, (c) required?
>
>a, b.
>
>>Anyone know, or heard of, issues regarding suspended threads and
>>::ExitProcess()?
>
>Deadlocks with thread or process attach/detach code are documented in
>MSDN.
>
>>Possibles that come to mind - winAPI bug whereby a suspended thread can
>>be momentarily woken (ie enough to become the main thread), or perhaps a
>>suspended thread can linger due to a handle being left open on it and
>>thereby become the main thread.
>
>I don't think it has anything to do with suspended threads.  You can
>certainly verify this by adding code to kill the threads specifically,
>though, and see what happens.

I did. I declared threads[1]. All the work gets shoved onto
cygthread::simplestub which neither suspends nor stays resident.

>The deadlock would be more likely if there are more threads and with the
>new cygthread code there will always be at least six extra threads.

Thanks for confirming (a) & (b). I put some checks into _pinfo::exit()
immediately prior to ::ExitProcess(). The info didn't mean much without
that.

Hung process:

Name---------Pid-Pri-Thd--Hnd----Mem-----User-Time---Kernel-Time---Elapsed-Time
sh-----------344---4---1---67---1832---0:00:00.020---0:00:00.080----0:02:29.935
----------------------VM------WS---WS-Pk----Priv---Faults-NonP-Page-PageFile
------------------351732----1832----1964----1476------492----3---21-----1476
-Tid-Pri----Cswtch------------State-----User-Time---Kernel-Time---Elapsed-Time
-548---4---------1---Wait:Suspended---0:00:00.000---0:00:00.000----0:02:29.825

Relevant log:

Quick Key:
<GetCurrentProcessId/GetCurrentThreadId> 90 GetCommandLine() chars
[n/32] =threads[n] of NTHREADS=32
mti    =main_thread_id
nam    =ignore fixed on "mti" here
sdc    =SD_count (member added to cygthread class) suspend count
av     =threads[].avail
id     =threads[].id
h      =threads[].h
sus    =another suspend count
gle    =GetLastError() for failed "sus"

<344/509> cli(90):J:\cygwin\bin\sh.exe
pid=344 tid=509[0/32]{mti:509}: nam=[main] sdc=-999 av=877 id=0 h=296 sus=2 gle=0
pid=344 tid=509[1/32]{mti:509}: nam=[main] sdc=-999 av=212 id=0 h=300 sus=2 gle=0
pid=344 tid=509[2/32]{mti:509}: nam=[main] sdc=-999 av=894 id=0 h=304 sus=2 gle=0
pid=344 tid=509[3/32]{mti:509}: nam=[main] sdc=-999 av=482 id=0 h=308 sus=2 gle=0
pid=344 tid=509[4/32]{mti:509}: nam=[main] sdc=-999 av=606 id=0 h=312 sus=2 gle=0
pid=344 tid=509[5/32]{mti:509}: nam=[main] sdc=-999 av=664 id=0 h=316 sus=2 gle=0
pid=344 tid=509[6/32]{mti:509}: nam=[main] sdc=-999 av=673 id=0 h=324 sus=2 gle=0
pid=344 tid=509[7/32]{mti:509}: nam=[main] sdc=-999 av=317 id=0 h=328 sus=2 gle=0
pid=344 tid=509[8/32]{mti:509}: nam=[main] sdc=-999 av=303 id=0 h=332 sus=2 gle=0
pid=344 tid=509[9/32]{mti:509}: nam=[main] sdc=-999 av=723 id=0 h=336 sus=2 gle=0
pid=344 tid=509[10/32]{mti:509}: nam=[main] sdc=-999 av=337 id=0 h=340 sus=2 gle=0
pid=344 tid=509[11/32]{mti:509}: nam=[main] sdc=-999 av=472 id=0 h=344 sus=2 gle=0
pid=344 tid=509[12/32]{mti:509}: nam=[main] sdc=-999 av=627 id=0 h=348 sus=2 gle=0
pid=344 tid=509[13/32]{mti:509}: nam=[main] sdc=-999 av=458 id=0 h=352 sus=2 gle=0
pid=344 tid=509[14/32]{mti:509}: nam=[main] sdc=-999 av=875 id=0 h=356 sus=2 gle=0
pid=344 tid=509[15/32]{mti:509}: nam=[main] sdc=-999 av=637 id=0 h=360 sus=2 gle=0
pid=344 tid=509[16/32]{mti:509}: nam=[main] sdc=-999 av=768 id=0 h=364 sus=2 gle=0
pid=344 tid=509[17/32]{mti:509}: nam=[main] sdc=-999 av=168 id=0 h=368 sus=2 gle=0
pid=344 tid=509[18/32]{mti:509}: nam=[main] sdc=-999 av=216 id=0 h=372 sus=2 gle=0
pid=344 tid=509[19/32]{mti:509}: nam=[main] sdc=-999 av=783 id=0 h=376 sus=2 gle=0
pid=344 tid=509[20/32]{mti:509}: nam=[main] sdc=-999 av=226 id=0 h=380 sus=2 gle=0
pid=344 tid=509[21/32]{mti:509}: nam=[main] sdc=-999 av=355 id=0 h=384 sus=2 gle=0
pid=344 tid=509[22/32]{mti:509}: nam=[main] sdc=-999 av=651 id=0 h=388 sus=2 gle=0
pid=344 tid=509[23/32]{mti:509}: nam=[main] sdc=-999 av=717 id=0 h=392 sus=2 gle=0
pid=344 tid=509[24/32]{mti:509}: nam=[main] sdc=-999 av=859 id=0 h=396 sus=2 gle=0
pid=344 tid=509[25/32]{mti:509}: nam=[main] sdc=-999 av=752 id=0 h=400 sus=2 gle=0
pid=344 tid=509[26/32]{mti:509}: nam=[main] sdc=-999 av=796 id=0 h=404 sus=2 gle=0
pid=344 tid=509[27/32]{mti:509}: nam=[main] sdc=-999 av=887 id=0 h=408 sus=2 gle=0
pid=344 tid=509[28/32]{mti:509}: nam=[main] sdc=-999 av=728 id=0 h=412 sus=2 gle=0
pid=344 tid=509[29/32]{mti:509}: nam=[main] sdc=-99 av=0 id=908 h=416 sus=1 gle=0
pid=344 tid=509[30/32]{mti:509}: nam=[main] sdc=0 av=0 id=0 h=0 sus=-1 gle=6
pid=344 tid=509[31/32]{mti:509}: nam=[main] sdc=0 av=0 id=0 h=0 sus=-1 gle=6

The ::SuspendThread() and ::ResumeThread() calls in cygthread.cc assign
their result directly to SD_count. I set it explicitly to silly negative
values at these points:

-999 in cygthread::runner() after their ::CreateThread()
-99 in cygthread::stub just prior to init_exceptions()
-2 in cygthread::exit_thread at ::SetEvent()
-9999 in cygthread::stub at ::ExitThread()

Nothing else touches 'SD_count'. The above output is generated by a
function 'SD_DumpLiving()' inserted immediately prior to ::ExitProcess()
within _pinfo::exit().
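
A minimal sketch of how the "sdc" column can be read back, assuming
SD_count is a plain int. describe_sd_count() is a hypothetical helper
for illustration only and is not part of cygthread.cc. Any value other
than the four markers is taken to be a ::SuspendThread() or
::ResumeThread() result, which is the previous suspend count, or
(DWORD)-1 on failure.

```cpp
#include <string>

// Hypothetical helper (not in cygthread.cc): name the last instrumentation
// point that wrote a given SD_count value, per the marker list above.
std::string describe_sd_count (int sdc)
{
  switch (sdc)
    {
    case -999:
      return "set in cygthread::runner() after ::CreateThread()";
    case -99:
      return "set in cygthread::stub just prior to init_exceptions()";
    case -2:
      return "set in cygthread::exit_thread at ::SetEvent()";
    case -9999:
      return "set in cygthread::stub at ::ExitThread()";
    default:
      break;
    }

  // Not a marker, so it came from ::SuspendThread()/::ResumeThread().
  // Both return the previous suspend count, or (DWORD)-1 on failure.
  if (sdc < 0)
    return "Suspend/ResumeThread failed";
  return "previous suspend count " + std::to_string (sdc);
}
```

Read this way, sdc=-999 in the dump is a slot that was created but
whose stub never got as far as the -99 marker.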

Our hung process is definitely suspended. I got this one woken back up
and the build went to completion. tid=548 is nowhere to be seen so it
stands to reason it formerly resided in threads[30] or threads[31].
Nowhere do I set SD_count=0. It must be the ::SuspendThread() in
cygthread::stub or some external influence.
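
For anyone wanting to sift a longer dump, a rough sketch that picks out
slots like threads[30] and threads[31]. It assumes each record has been
joined onto a single line. slot_info, field(), parse_slot() and
handle_gone() are made-up names for illustration. The only outside
facts relied on are that ::SuspendThread() returns (DWORD)-1 on failure
and that error 6 is ERROR_INVALID_HANDLE.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical record: the fields of one dump line that matter here.
// Names mirror the Quick Key.
struct slot_info
{
  long sdc, h, sus, gle;
};

// Return the integer following " key=" in a dump line. The leading
// space stops "id" from matching inside "pid=" or "tid=".
long field (const std::string &line, const std::string &key)
{
  assert (!key.empty ());
  std::string::size_type pos = line.find (" " + key + "=");
  if (pos == std::string::npos)
    throw std::runtime_error ("missing field: " + key);
  // Skip the space, the key and the '='; stol stops at the next space.
  return std::stol (line.substr (pos + key.size () + 2));
}

slot_info parse_slot (const std::string &line)
{
  return { field (line, "sdc"), field (line, "h"),
           field (line, "sus"), field (line, "gle") };
}

// True when the slot has no handle and the probe suspend failed with
// error 6 (ERROR_INVALID_HANDLE).
bool handle_gone (const slot_info &s)
{
  return s.h == 0 && s.sus == -1 && s.gle == 6;
}
```

This only picks the slots out. It says nothing about how they got into
that state.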


-- 
[EMAIL PROTECTED]
