On Wed, 20 Nov 2024 22:43:08 +0900
Takashi Yano wrote:
> On Tue, 19 Nov 2024 18:21:52 +0900
> Takashi Yano wrote:
> > On Tue, 12 Nov 2024 10:53:58 +0100
> > Christian Franke wrote:
> > > Found with 'stress-ng --cpu-sched' from current stress-ng upstream HEAD:
> > > 
> > > Testcase (attached):
> > > 
> > > $ gcc -O2 -o manysignals manysignals.c
> > > 
> > > $ ./manysignals
> > > fork() = 1833
> > > ...
> > > fork() = 1848
> > > ...
> > > kill(1833, 17)
> > > ...
> > > kill(1848, 17)
> > > kill(1833, 9)
> > > ...
> > > kill(1848, 9)
> > > waitpid(1833, ., 0)
> > > 
> > > 
> > > Run this in second terminal:
> > > 
> > > $ watch "ps | sed -n '1p;/manysignals/{/sed/d;p}'"
> > > 
> > > If 'S' appears in the first column, the child processes likely reached 
> > > the final SIGSTOP state. This takes some time. The parent process may 
> > > still hang in the first waitpid() but should not.
> > > 
> > > If the parent process is aborted with ^C, child processes may be stopped 
> > > or left behind. Occasionally a child process that can not be stopped by 
> > > Cygwin (kill -9) is left behind.
> > > 
> > > Tested with ancient (i7-2600K) and more recent (i7-14700K) CPU :-)
> > > 
> > > 
> > > Unrelated to the above, but related to 'stress-ng --cpu-sched' which 
> > > uses sched_get/setscheduler():
> > > 
> > > - sched_getscheduler() always returns SCHED_FIFO. As far as I understand 
> > > Linux sched(7), this is a non-preemptive real-time policy. The 
> > > preemptive SCHED_RR would possibly be a more reasonable value. 
> > > Unfortunately SCHED_OTHER cannot be used because it would require 
> > > ignoring the priority.
> > > 
> > > - sched_setscheduler() always fails with ENOSYS. IMO it should allow 
> > > setting 'param->sched_priority' if 'policy' is equal to the value returned 
> > > by sched_getscheduler().
> > 
> > Thanks for the report and the test case. I'm now looking into
> > the issue. Please wait a while.
> 
> Hopefully, I have found the cause.
> 
> The deadlock happens between the main thread and the wait_sig thread.
> The main thread is waiting for the wait_sig thread to trigger the
> wakeup event, while the wait_sig thread is waiting for the previous
> signal to be processed by the main thread.
> 
> Let me consider how to fix that.

I'd like to report my progress for this issue.

The attached patch almost solves the problem. However, with the patch
applied, your test case still pauses for tens of seconds before ending
normally.

If the code:
      cpu_set_t cpus; CPU_ZERO(&cpus);
      CPU_SET(0, &cpus);
      if (sched_setaffinity(getpid(), sizeof(cpus), &cpus))
        perror("setaffinity");

      for (;;)
        sched_yield();
is changed to just:
      for (;;) sleep(1);
the test case runs without pause.

I think there is still a bug in the signal handling.
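
For reference, here is a minimal sketch of the two child variants compared
above. It is not the attached manysignals.c, which is not reproduced in this
mail. The names run_children(), child_loop() and MAX_CHILDREN are
hypothetical, and sending a single SIGSTOP followed by SIGKILL is only an
assumption based on the kill() lines in the quoted output (signal 17 is
SIGSTOP on Cygwin).

```c
#define _GNU_SOURCE            /* for cpu_set_t, CPU_ZERO, CPU_SET */
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 64        /* arbitrary limit for this sketch */

/* Child body. With use_yield != 0 it pins itself to CPU 0 and spins in
   sched_yield(), as in the snippet above; otherwise it only sleeps. */
static void child_loop(int use_yield)
{
  if (use_yield) {
    cpu_set_t cpus;
    CPU_ZERO(&cpus);
    CPU_SET(0, &cpus);
    if (sched_setaffinity(getpid(), sizeof(cpus), &cpus))
      perror("setaffinity");
    for (;;)
      sched_yield();
  }
  for (;;)
    sleep(1);
}

/* Fork nchild children, send each SIGSTOP and then SIGKILL, and reap
   them. Returns the number of children reported as terminated by
   SIGKILL, or -1 on a bad argument or if not all forks succeeded. */
int run_children(int nchild, int use_yield)
{
  pid_t pids[MAX_CHILDREN];
  int started, killed = 0;

  if (nchild < 1 || nchild > MAX_CHILDREN)
    return -1;

  for (started = 0; started < nchild; started++) {
    pid_t pid = fork();
    if (pid == (pid_t)-1) {
      perror("fork");
      break;
    }
    if (pid == 0) {
      child_loop(use_yield);   /* never returns */
      _exit(1);
    }
    pids[started] = pid;
  }

  for (int i = 0; i < started; i++)
    kill(pids[i], SIGSTOP);
  for (int i = 0; i < started; i++)
    kill(pids[i], SIGKILL);

  /* A blocking waitpid() per child; this is where the parent hangs or
     pauses in the report. */
  for (int i = 0; i < started; i++) {
    int status;
    if (waitpid(pids[i], &status, 0) == pids[i]
        && WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL)
      killed++;
  }
  return started == nchild ? killed : -1;
}
```

Calling run_children(n, 1) and run_children(n, 0) from main() and timing
both should correspond to the two cases described above; the child count n
is a free choice in this sketch.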

-- 
Takashi Yano <takashi.y...@nifty.ne.jp>

Attachment: signal202411232024.patch
Description: Binary data
