On 03.04.2020 at 18:31, Dietmar Maurer wrote:
>
> > On April 3, 2020 10:47 AM Kevin Wolf wrote:
> >
> >
> > On 03.04.2020 at 10:26, Dietmar Maurer wrote:
> > > > With the following patch, it seems to survive for now. I'll give it some
> > > > more testing tomorrow (also qemu-
> On April 3, 2020 10:47 AM Kevin Wolf wrote:
>
>
> On 03.04.2020 at 10:26, Dietmar Maurer wrote:
> > > With the following patch, it seems to survive for now. I'll give it some
> > > more testing tomorrow (also qemu-iotests to check that I didn't
> > > accidentally break something el
On 03.04.2020 at 10:26, Dietmar Maurer wrote:
> > With the following patch, it seems to survive for now. I'll give it some
> > more testing tomorrow (also qemu-iotests to check that I didn't
> > accidentally break something else.)
>
> Wow, that was fast! Seems your patch fixes the bug!
>
> With the following patch, it seems to survive for now. I'll give it some
> more testing tomorrow (also qemu-iotests to check that I didn't
> accidentally break something else.)
Wow, that was fast! Seems your patch fixes the bug!
I wonder what commit introduced that problem, maybe:
https://gith
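For anyone following along: the qemu-iotests mentioned above live in tests/qemu-iotests of the QEMU tree and are typically run per image format; a qcow2 run looks roughly like this (assuming a built tree, test selection to taste):
---
cd tests/qemu-iotests
./check -qcow2
---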
On 4/2/20 7:10 PM, Kevin Wolf wrote:
> On 02.04.2020 at 18:47, Kevin Wolf wrote:
>> So I think this is the bug: Calling blk_wait_while_drained() from
>> anywhere between blk_inc_in_flight() and blk_dec_in_flight() is wrong
>> because it will deadlock the drain operation.
>>
>> blk_aio_read
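For illustration, the problematic pattern described above boils down to roughly the following. This is a simplified sketch, not the literal blk_aio_*/blk_co_* code, and the wrapper name is made up; the blk_* helpers are the real ones from block/block-backend.c:
---
static void coroutine_fn blk_do_some_io(BlockBackend *blk)
{
    blk_inc_in_flight(blk);      /* drain waits for in_flight to drop to 0 */

    blk_wait_while_drained(blk); /* ...but this parks the coroutine until the
                                  * drained section ends, so in_flight never
                                  * reaches 0 and bdrv_drained_begin() polls
                                  * forever: a circular wait.               */

    /* the actual request would be submitted here */

    blk_dec_in_flight(blk);      /* never reached while draining */
}
---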
On 02.04.2020 at 18:47, Kevin Wolf wrote:
> On 02.04.2020 at 17:40, Dietmar Maurer wrote:
> > > Can you reproduce the problem with my script, but pointing it to your
> > > Debian image and running stress-ng instead of dd?
> >
> > yes
> >
> > > If so, how long does
> > > it take
On 02.04.2020 at 17:40, Dietmar Maurer wrote:
> > Can you reproduce the problem with my script, but pointing it to your
> > Debian image and running stress-ng instead of dd?
>
> yes
>
> > If so, how long does
> > it take to reproduce for you?
>
> I sometimes need up to 130 iterations ...
> It does look more like your case because I now have bs.in_flight == 0
> and the BlockBackend of the scsi-hd device has in_flight == 8.
yes, this looks very familiar.
> Of course, this still doesn't answer why it happens, and I'm not sure if we
> can tell without adding some debug code.
>
> I
> Can you reproduce the problem with my script, but pointing it to your
> Debian image and running stress-ng instead of dd?
yes
> If so, how long does
> it take to reproduce for you?
I sometimes need up to 130 iterations ...
Worse, several times I thought the bug was gone, but then it triggered
On 02.04.2020 at 14:14, Kevin Wolf wrote:
> On 02.04.2020 at 11:10, Dietmar Maurer wrote:
> > > It seems to fix it, yes. Now I don't get any hangs any more.
> >
> > I just tested using your configuration, and a recent centos8 image
> > running a dd loop inside it:
> >
> > # while
On 02.04.2020 at 11:10, Dietmar Maurer wrote:
> > It seems to fix it, yes. Now I don't get any hangs any more.
>
> I just tested using your configuration, and a recent centos8 image
> running a dd loop inside it:
>
> # while dd if=/dev/urandom of=testfile.raw bs=1M count=100; do sync; done
> > Do you also run "stress-ng -d 5" inside the VM?
>
> I'm not using the exact same test case, but something that I thought
> would be similar enough. Specifically, I run the script below, which
> boots from a RHEL 8 CD and in the rescue shell, I'll do 'dd if=/dev/zero
> of=/dev/sda'
This test
> It seems to fix it, yes. Now I don't get any hangs any more.
I just tested using your configuration, and a recent centos8 image
running a dd loop inside it:
# while dd if=/dev/urandom of=testfile.raw bs=1M count=100; do sync; done
With that, I am unable to trigger the bug.
Would you mind runni
> > But, IMHO the commit is not the reason for (my) bug - It just makes
> > it easier to trigger... I can see (my) bug sometimes with 4.1.1, although
> > I have no easy way to reproduce it reliably.
> >
> > Also, Stefan sent some patches to the list to fix some of the problems.
> >
> > https://li
On 01.04.2020 at 20:28, Dietmar Maurer wrote:
> > That's a pretty big change, and I'm not sure how it's related to
> > completed requests hanging in the thread pool instead of reentering the
> > file-posix coroutine. But I also tested it enough that I'm confident
> > it's really the first
On 01.04.2020 at 20:12, Kevin Wolf wrote:
> On 01.04.2020 at 17:37, Dietmar Maurer wrote:
> > > Is really nobody else able to reproduce this (somebody already tried to
> > > > reproduce)?
> > >
> > > I can get hangs, but that's for job_completed(), not for starting the
> > > jo
> That's a pretty big change, and I'm not sure how it's related to
> completed requests hanging in the thread pool instead of reentering the
> file-posix coroutine. But I also tested it enough that I'm confident
> it's really the first bad commit.
>
> Maybe you want to try if your problem starts a
On 01.04.2020 at 17:37, Dietmar Maurer wrote:
> > > Is really nobody else able to reproduce this (somebody already tried to
> > > reproduce)?
> >
> > I can get hangs, but that's for job_completed(), not for starting the
> > job. Also, my hangs have a non-empty bs->tracked_requests, so it
> > Is really nobody else able to reproduce this (somebody already tried to
> > reproduce)?
>
> I can get hangs, but that's for job_completed(), not for starting the
> job. Also, my hangs have a non-empty bs->tracked_requests, so it looks
> like a different case to me.
Please can you post the com
> On April 1, 2020 5:37 PM Dietmar Maurer wrote:
>
>
> > > Is really nobody else able to reproduce this (somebody already tried to
> > > reproduce)?
> >
> > I can get hangs, but that's for job_completed(), not for starting the
> > job. Also, my hangs have a non-empty bs->tracked_requests, so
On 31.03.2020 at 18:18, Dietmar Maurer wrote:
> > > Looks like bdrv_parent_drained_poll_single() calls
> > > blk_root_drained_poll(), which returns true in my case (in_flight > 5).
> >
> > Can you identify which BlockBackend is this? Specifically if it's the
> > one attached to a guest device o
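In gdb this can be checked directly on the BlockBackend in question (the address is made up; name, dev and in_flight are fields of struct BlockBackend in block/block-backend.c). A non-NULL dev means it is the BlockBackend attached to a guest device, here the scsi-hd:
---
(gdb) p ((BlockBackend *) 0x555556a3c210)->name
(gdb) p ((BlockBackend *) 0x555556a3c210)->dev
(gdb) p ((BlockBackend *) 0x555556a3c210)->in_flight
---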
> On March 31, 2020 5:37 PM Kevin Wolf wrote:
>
>
> On 31.03.2020 at 17:24, Dietmar Maurer wrote:
> >
> > > > How can I see/debug those waiting requests?
> > >
> > > Examine bs->tracked_requests list.
> > >
> > > BdrvTrackedRequest has "Coroutine *co" field. It's a pointer of corou
On 31.03.2020 at 17:24, Dietmar Maurer wrote:
>
> > > How can I see/debug those waiting requests?
> >
> > Examine bs->tracked_requests list.
> >
> > BdrvTrackedRequest has "Coroutine *co" field. It's a pointer to the coroutine
> > of this request. You may use the qemu-gdb script to print reques
> > How can I see/debug those waiting requests?
>
> Examine the bs->tracked_requests list.
>
> BdrvTrackedRequest has "Coroutine *co" field. It's a pointer to the coroutine of
> this request. You may use the qemu-gdb script to print the request's coroutine
> back-trace:
I would, but there are no tracked requ
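For reference, the qemu-gdb approach suggested above looks roughly like this once scripts/qemu-gdb.py is loaded; the pointer values are made up and would come from walking the bs->tracked_requests list:
---
(gdb) source scripts/qemu-gdb.py
(gdb) p ((BlockDriverState *) 0x555556789a00)->tracked_requests
(gdb) p ((BdrvTrackedRequest *) 0x7fff6401c5b0)->co
(gdb) qemu coroutine 0x7fffe40123f0
---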
31.03.2020 17:32, Dietmar Maurer wrote:
After a few iterations the VM freezes inside bdrv_drained_begin():
Thread 1 (Thread 0x7fffe9291080 (LWP 30949)):
#0 0x75cb3916 in __GI_ppoll (fds=0x7fff63d30c40, nfds=2,
timeout=<optimized out>, timeout@entry=0x0, sigmask=sigmask@entry=0x0) at
../sysdeps/unix/sy
> > After a few iterations the VM freezes inside bdrv_drained_begin():
> >
> > Thread 1 (Thread 0x7fffe9291080 (LWP 30949)):
> > #0 0x75cb3916 in __GI_ppoll (fds=0x7fff63d30c40, nfds=2,
> > timeout=<optimized out>, timeout@entry=0x0, sigmask=sigmask@entry=0x0) at
> > ../sysdeps/unix/sysv/linux/ppoll.c:3
[ CC qemu-block ]
On 31.03.2020 at 10:46, Dietmar Maurer wrote:
> I can see and reproduce this error with the latest code from today.
> But I also see it on stable 4.1.1 (sometimes).
>
> I guess this is a similar problem as reported earlier:
> https://lists.gnu.org/archive/html/qemu-devel/2
> Inside exec.c, there is a race:
>
> ---
> static bool prepare_mmio_access(MemoryRegion *mr)
> {
>     bool unlocked = !qemu_mutex_iothread_locked();
>     bool release_lock = false;
>
>     if (unlocked && mr->global_locking) {
>         qemu_mutex_lock_iothread();
> --
>
> IMHO, check
Inside exec.c, there is a race:
---
static bool prepare_mmio_access(MemoryRegion *mr)
{
    bool unlocked = !qemu_mutex_iothread_locked();
    bool release_lock = false;

    if (unlocked && mr->global_locking) {
        qemu_mutex_lock_iothread();
--
IMHO, checking for unlocked that way
I can see and reproduce this error with the latest code from today.
But I also see it on stable 4.1.1 (sometimes).
I guess this is a similar problem as reported earlier:
https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg07363.html
To reproduce, you need a VM using virtio-scsi-single drive usi
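The exact drive options are cut off above; a comparable plain-QEMU setup would be a virtio-scsi controller with a dedicated iothread and a scsi-hd disk on it, along these lines (paths and IDs are made up, and the iothread is an assumption rather than something stated in the truncated line):
---
qemu-system-x86_64 -accel kvm -m 2048 \
  -object iothread,id=iothread0 \
  -device virtio-scsi-pci,id=scsihw0,iothread=iothread0 \
  -drive file=test.qcow2,if=none,id=drive-scsi0,format=qcow2,cache=none \
  -device scsi-hd,bus=scsihw0.0,drive=drive-scsi0
---
The guest then keeps that disk busy (dd or stress-ng -d 5) while something on the host side triggers bdrv_drained_begin() on the drive.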