> On Jul 9, 2023, at 19:59, Konstantin Belousov <kostik...@gmail.com> wrote:
>
> On Sun, Jul 09, 2023 at 11:36:03PM +0000, John F Carr wrote:
>>
>>
>>> On Jul 9, 2023, at 19:25, Konstantin Belousov <kostik...@gmail.com> wrote:
>>>
>>> On Sun, Jul 09, 2023 at 10:41:27PM +0000, John F Carr wrote:
>>>> Kernel and system at a146207d66f320ed239c1059de9df854b66b55b7 plus some
>>>> irrelevant local changes, four 64 bit ARM processors, make.conf sets
>>>> CPUTYPE?=cortex-a57.
>>>>
>>>> I typed ^C while /bin/sh was starting a pipeline and my shell got hung in
>>>> the middle of fork().
>>>>
>>>> From the terminal:
>>>>
>>>> # git log --oneline --|more
>>>> ^C^C^C
>>>> load: 3.26 cmd: sh 95505 [fork] 5308.67r 0.00u 0.03s 0% 2860k
>>>> mi_switch+0x198 sleepq_switch+0xfc sleepq_timedwait+0x40 _sleep+0x264
>>>> fork1+0x67c sys_fork+0x34 do_el0_sync+0x4c8 handle_el0_sync+0x44
>>>> load: 3.16 cmd: sh 95505 [fork] 5311.75r 0.00u 0.03s 0% 2860k
>>>> mi_switch+0x198 sleepq_switch+0xfc sleepq_timedwait+0x40 _sleep+0x264
>>>> fork1+0x67c sys_fork+0x34 do_el0_sync+0x4c8 handle_el0_sync+0x44
>>>>
>>>> According to ps -d on another terminal the shell has no children:
>>>>
>>>>   PID TT  STAT    TIME COMMAND
>>>> [...]
>>>>   873 u0  IWs  0:00.00 `-- login [pam] (login)
>>>>   874 u0  I    0:00.17   `-- -sh (sh)
>>>> 95504 u0  I    0:00.01     `-- su -
>>>> 95505 u0  D+   0:00.05       `-- -su (sh)
>>>> [...]
>>>>
>>>> Nothing on the (115200 bps serial) console. No change in system
>>>> performance.
>>>>
>>>> The system is busy copying a large amount of data from the network to a
>>>> ZFS pool on spinning disks. The git|more pipeline could have taken some
>>>> time to get going while I/O requests worked their way through the queue.
>>>> It would not have touched the busy pool, only the zroot pool on an SSD.
>>>>
>>>> Has anything changed recently that might cause this?
>>>
>>> There was some change around fork, but your sleep seems to be not from
>>> that change. Can you show the wait channel for the process? Do something
>>> like
>>> $ ps alxww
>>>
>>
>> UID   PID  PPID C PRI NI   VSZ  RSS MWCHAN STAT TT    TIME COMMAND
>>   0 95505 95504 2  20  0 13508 2876 fork   D+   u0 0:00.13 -su (sh)
>>
>> This is probably the same information displayed as [fork] in the output from
>> ^T.
>>
>> Does it correspond to the source line
>>
>> pause("fork", hz / 2);
>>
>> ?
>
> Yes, it is rate-limiting code. Still it is interesting to see the whole
> ps output.
>
> Do you have 7a70f17ac4bd64dc1a5020f in your source?

No, I do not have that commit.
The comment mentions livelock. CPU use as reported by iostat did not change
after the process hung.
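
For reference, here is how I read the rate-limiting pattern behind pause("fork", hz / 2). This is a minimal userland sketch, not the kernel code. It assumes the pause is taken only when the fork attempt fails, and that hz is 1000 (kern.hz is tunable). The names HZ, ticks_to_timespec, try_create and create_rate_limited are made up for illustration, and nanosleep() stands in for the kernel's pause().

```c
#include <assert.h>
#include <errno.h>
#include <time.h>

#define HZ 1000 /* assumed tick rate; the real value comes from kern.hz */

/* Convert a tick count to a timespec so userland can imitate pause(). */
static struct timespec ticks_to_timespec(int ticks, int hz)
{
    struct timespec ts;

    assert(hz > 0);
    ts.tv_sec = ticks / hz;
    ts.tv_nsec = (long)(ticks % hz) * (1000000000L / hz);
    return ts;
}

/* Hypothetical stand-in for the resource-limit checks done during fork. */
static int try_create(int nprocs, int maxprocs)
{
    return (nprocs < maxprocs) ? 0 : EAGAIN;
}

/*
 * On failure, sleep for hz / 2 ticks (half a second) before returning the
 * error, so that a caller retrying in a tight loop cannot spin on the CPU.
 */
static int create_rate_limited(int nprocs, int maxprocs)
{
    int error = try_create(nprocs, maxprocs);

    if (error != 0) {
        struct timespec ts = ticks_to_timespec(HZ / 2, HZ);
        nanosleep(&ts, NULL);
    }
    return error;
}
```

If that reading is right, a process that keeps retrying would show up asleep on the "fork" wait channel almost all the time while using next to no CPU, which is at least consistent with the ^T and ps output above.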