Re: bdrv_drained_begin deadlock with io-threads

2020-04-06 Thread Kevin Wolf
On 03.04.2020 at 18:31, Dietmar Maurer wrote: > > > On April 3, 2020 10:47 AM Kevin Wolf wrote: > > > > > > On 03.04.2020 at 10:26, Dietmar Maurer wrote: > > > > With the following patch, it seems to survive for now. I'll give it some > > > > more testing tomorrow (also qemu-

Re: bdrv_drained_begin deadlock with io-threads

2020-04-03 Thread Dietmar Maurer
> On April 3, 2020 10:47 AM Kevin Wolf wrote: > > > On 03.04.2020 at 10:26, Dietmar Maurer wrote: > > > With the following patch, it seems to survive for now. I'll give it some > > > more testing tomorrow (also qemu-iotests to check that I didn't > > > accidentally break something el

Re: bdrv_drained_begin deadlock with io-threads

2020-04-03 Thread Kevin Wolf
On 03.04.2020 at 10:26, Dietmar Maurer wrote: > > With the following patch, it seems to survive for now. I'll give it some > > more testing tomorrow (also qemu-iotests to check that I didn't > > accidentally break something else.) > > Wow, that was fast! Seems your patch fixes the bug! >

Re: bdrv_drained_begin deadlock with io-threads

2020-04-03 Thread Dietmar Maurer
> With the following patch, it seems to survive for now. I'll give it some > more testing tomorrow (also qemu-iotests to check that I didn't > accidentally break something else.) Wow, that was fast! Seems your patch fixes the bug! I wonder what commit introduced that problem, maybe: https://gith

Re: bdrv_drained_begin deadlock with io-threads

2020-04-02 Thread Thomas Lamprecht
On 4/2/20 7:10 PM, Kevin Wolf wrote: > On 02.04.2020 at 18:47, Kevin Wolf wrote: >> So I think this is the bug: Calling blk_wait_while_drained() from >> anywhere between blk_inc_in_flight() and blk_dec_in_flight() is wrong >> because it will deadlock the drain operation. >> >> blk_aio_read
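
The ordering problem described in the message above can be sketched with a toy model. This is not QEMU code: ToyBackend and every toy_* function are hypothetical names invented for illustration, and the "wait first" variant only shows one ordering that avoids the problem, not necessarily the fix adopted in this thread. The assumption is that a drain can only finish once the in-flight counter reaches zero, so a request that is counted as in flight and then parks until the drain ends can never be completed.

```c
#include <stdbool.h>

/* Toy model only, NOT QEMU code. All names here are hypothetical. */
typedef struct {
    int in_flight;        /* requests currently counted as in flight */
    int quiesce_counter;  /* > 0 while a drain section is active */
    int parked;           /* requests waiting for the drain to end */
} ToyBackend;

/* Assumption: the drain finishes only when nothing is counted in flight. */
static bool toy_drain_can_finish(const ToyBackend *b)
{
    return b->in_flight == 0;
}

/* Problematic order: count the request first, then wait for the drain.
 * The parked request keeps in_flight > 0, so the drain it waits for
 * can never finish. */
static void toy_submit_count_then_wait(ToyBackend *b)
{
    b->in_flight++;
    if (b->quiesce_counter > 0) {
        b->parked++;      /* parked while still counted as in flight */
    }
}

/* Alternative order: wait first, count only once the request may run.
 * A parked request is not counted, so the drain can make progress. */
static void toy_submit_wait_then_count(ToyBackend *b)
{
    if (b->quiesce_counter > 0) {
        b->parked++;      /* parked without being counted */
        return;
    }
    b->in_flight++;
}
```

With quiesce_counter set to 1, calling toy_submit_count_then_wait() leaves in_flight at 1 and toy_drain_can_finish() returns false, while toy_submit_wait_then_count() leaves in_flight at 0 and the drain can finish.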

Re: bdrv_drained_begin deadlock with io-threads

2020-04-02 Thread Kevin Wolf
On 02.04.2020 at 18:47, Kevin Wolf wrote: > On 02.04.2020 at 17:40, Dietmar Maurer wrote: > > > Can you reproduce the problem with my script, but pointing it to your > > > Debian image and running stress-ng instead of dd? > > > > yes > > > > > If so, how long does > > > it take

Re: bdrv_drained_begin deadlock with io-threads

2020-04-02 Thread Kevin Wolf
On 02.04.2020 at 17:40, Dietmar Maurer wrote: > > Can you reproduce the problem with my script, but pointing it to your > > Debian image and running stress-ng instead of dd? > > yes > > > If so, how long does > > it take to reproduce for you? > > I sometimes need up to 130 iterations .

Re: bdrv_drained_begin deadlock with io-threads

2020-04-02 Thread Dietmar Maurer
> It does look more like your case because I now have bs.in_flight == 0 > and the BlockBackend of the scsi-hd device has in_flight == 8. yes, this looks very familiar. > Of course, this still doesn't answer why it happens, and I'm not sure if we > can tell without adding some debug code. > > I

Re: bdrv_drained_begin deadlock with io-threads

2020-04-02 Thread Dietmar Maurer
> Can you reproduce the problem with my script, but pointing it to your > Debian image and running stress-ng instead of dd? yes > If so, how long does > it take to reproduce for you? I sometimes need up to 130 iterations ... Worse, I thought several times the bug is gone, but then it triggered

Re: bdrv_drained_begin deadlock with io-threads

2020-04-02 Thread Kevin Wolf
On 02.04.2020 at 14:14, Kevin Wolf wrote: > On 02.04.2020 at 11:10, Dietmar Maurer wrote: > > > It seems to fix it, yes. Now I don't get any hangs any more. > > > > I just tested using your configuration, and a recent centos8 image > > running dd loop inside it: > > > > # while

Re: bdrv_drained_begin deadlock with io-threads

2020-04-02 Thread Kevin Wolf
On 02.04.2020 at 11:10, Dietmar Maurer wrote: > > It seems to fix it, yes. Now I don't get any hangs any more. > > I just tested using your configuration, and a recent centos8 image > running dd loop inside it: > > # while dd if=/dev/urandom of=testfile.raw bs=1M count=100; do sync; don

Re: bdrv_drained_begin deadlock with io-threads

2020-04-02 Thread Dietmar Maurer
> > Do you also run "stress-ng -d 5" inside the VM? > > I'm not using the exact same test case, but something that I thought > would be similar enough. Specifically, I run the script below, which > boots from a RHEL 8 CD and in the rescue shell, I'll do 'dd if=/dev/zero > of=/dev/sda' This test

Re: bdrv_drained_begin deadlock with io-threads

2020-04-02 Thread Dietmar Maurer
> It seems to fix it, yes. Now I don't get any hangs any more. I just tested using your configuration, and a recent centos8 image running dd loop inside it: # while dd if=/dev/urandom of=testfile.raw bs=1M count=100; do sync; done With that, I am unable to trigger the bug. Would you mind runni

Re: bdrv_drained_begin deadlock with io-threads

2020-04-01 Thread Dietmar Maurer
> > But, IMHO the commit is not the reason for (my) bug - It just makes > > it easier to trigger... I can see (my) bug sometimes with 4.1.1, although > > I have no easy way to reproduce it reliably. > > > > Also, Stefan sent some patches to the list to fix some of the problems. > > > > https://li

Re: bdrv_drained_begin deadlock with io-threads

2020-04-01 Thread Kevin Wolf
On 01.04.2020 at 20:28, Dietmar Maurer wrote: > > That's a pretty big change, and I'm not sure how it's related to > > completed requests hanging in the thread pool instead of reentering the > > file-posix coroutine. But I also tested it enough that I'm confident > > it's really the first

Re: bdrv_drained_begin deadlock with io-threads

2020-04-01 Thread Kevin Wolf
On 01.04.2020 at 20:12, Kevin Wolf wrote: > On 01.04.2020 at 17:37, Dietmar Maurer wrote: > > > > Is really nobody else able to reproduce this (somebody already tried to > > > > reproduce)? > > > > > > I can get hangs, but that's for job_completed(), not for starting the > > > jo

Re: bdrv_drained_begin deadlock with io-threads

2020-04-01 Thread Dietmar Maurer
> That's a pretty big change, and I'm not sure how it's related to > completed requests hanging in the thread pool instead of reentering the > file-posix coroutine. But I also tested it enough that I'm confident > it's really the first bad commit. > > Maybe you want to try if your problem starts a

Re: bdrv_drained_begin deadlock with io-threads

2020-04-01 Thread Kevin Wolf
On 01.04.2020 at 17:37, Dietmar Maurer wrote: > > > Is really nobody else able to reproduce this (somebody already tried to > > > reproduce)? > > > > I can get hangs, but that's for job_completed(), not for starting the > > job. Also, my hangs have a non-empty bs->tracked_requests, so it

Re: bdrv_drained_begin deadlock with io-threads

2020-04-01 Thread Dietmar Maurer
> > Is really nobody else able to reproduce this (somebody already tried to > > reproduce)? > > I can get hangs, but that's for job_completed(), not for starting the > job. Also, my hangs have a non-empty bs->tracked_requests, so it looks > like a different case to me. Please can you post the com

Re: bdrv_drained_begin deadlock with io-threads

2020-04-01 Thread Dietmar Maurer
> On April 1, 2020 5:37 PM Dietmar Maurer wrote: > > > > > Is really nobody else able to reproduce this (somebody already tried to > > > reproduce)? > > > > I can get hangs, but that's for job_completed(), not for starting the > > job. Also, my hangs have a non-empty bs->tracked_requests, so

Re: bdrv_drained_begin deadlock with io-threads

2020-04-01 Thread Kevin Wolf
On 31.03.2020 at 18:18, Dietmar Maurer wrote: > > > It looks like bdrv_parent_drained_poll_single() calls > > > blk_root_drained_poll(), which returns true in my case (in_flight > 5). > > > > Can you identify which BlockBackend this is? Specifically if it's the > > one attached to a guest device o

Re: bdrv_drained_begin deadlock with io-threads

2020-03-31 Thread Dietmar Maurer
> On March 31, 2020 5:37 PM Kevin Wolf wrote: > > > On 31.03.2020 at 17:24, Dietmar Maurer wrote: > > > > > > How can I see/debug those waiting requests? > > > > > > Examine bs->tracked_requests list. > > > > > > BdrvTrackedRequest has "Coroutine *co" field. It's a pointer to the corou

Re: bdrv_drained_begin deadlock with io-threads

2020-03-31 Thread Kevin Wolf
On 31.03.2020 at 17:24, Dietmar Maurer wrote: > > > > How can I see/debug those waiting requests? > > > > Examine bs->tracked_requests list. > > > > BdrvTrackedRequest has "Coroutine *co" field. It's a pointer to the coroutine > > of this request. You may use qemu-gdb script to print reques

Re: bdrv_drained_begin deadlock with io-threads

2020-03-31 Thread Dietmar Maurer
> > How can I see/debug those waiting requests? > > Examine bs->tracked_requests list. > > BdrvTrackedRequest has "Coroutine *co" field. It's a pointer to the coroutine of > this request. You may use qemu-gdb script to print request's coroutine > back-trace: I would, but there are no tracked requ

Re: bdrv_drained_begin deadlock with io-threads

2020-03-31 Thread Vladimir Sementsov-Ogievskiy
31.03.2020 17:32, Dietmar Maurer wrote: After a few iterations the VM freezes inside bdrv_drained_begin(): Thread 1 (Thread 0x7fffe9291080 (LWP 30949)): #0 0x75cb3916 in __GI_ppoll (fds=0x7fff63d30c40, nfds=2, timeout=, timeout@entry=0x0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sy

Re: bdrv_drained_begin deadlock with io-threads

2020-03-31 Thread Dietmar Maurer
> > After a few iterations the VM freezes inside bdrv_drained_begin(): > > > > Thread 1 (Thread 0x7fffe9291080 (LWP 30949)): > > #0 0x75cb3916 in __GI_ppoll (fds=0x7fff63d30c40, nfds=2, > > timeout=, timeout@entry=0x0, sigmask=sigmask@entry=0x0) at > > ../sysdeps/unix/sysv/linux/ppoll.c:3

Re: bdrv_drained_begin deadlock with io-threads

2020-03-31 Thread Kevin Wolf
[ CC qemu-block ] On 31.03.2020 at 10:46, Dietmar Maurer wrote: > I can see and reproduce this error with latest code from today. > But I also see it on stable 4.1.1 (sometimes). > > I guess this is a similar problem as reported earlier: > https://lists.gnu.org/archive/html/qemu-devel/2

Re: bdrv_drained_begin deadlock with io-threads

2020-03-31 Thread Dietmar Maurer
> Inside exec.c, there is a race: > > --- > static bool prepare_mmio_access(MemoryRegion *mr) > { > bool unlocked = !qemu_mutex_iothread_locked(); > bool release_lock = false; > > if (unlocked && mr->global_locking) { > qemu_mutex_lock_iothread(); > -- > > IMHO, check

Re: bdrv_drained_begin deadlock with io-threads

2020-03-31 Thread Dietmar Maurer
Inside exec.c, there is a race: --- static bool prepare_mmio_access(MemoryRegion *mr) { bool unlocked = !qemu_mutex_iothread_locked(); bool release_lock = false; if (unlocked && mr->global_locking) { qemu_mutex_lock_iothread(); -- IMHO, checking for unlocked that way

bdrv_drained_begin deadlock with io-threads

2020-03-31 Thread Dietmar Maurer
I can see and reproduce this error with latest code from today. But I also see it on stable 4.1.1 (sometimes). I guess this is a similar problem as reported earlier: https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg07363.html To reproduce, you need a VM using virtio-scsi-single drive usi