On 03/09/2016 01:55 PM, Paolo Bonzini wrote:
>
>
> On 09/03/2016 13:21, Christian Borntraeger wrote:
>> I have some random crashes at startup
>>
>> Stack trace of thread 48326:
>> #0 0x000002aa2e0cce46 bdrv_co_do_rw (qemu-system-s390x)
>> #1 0x000002aa2e159e8e coroutine_trampoline (qemu-system-s390x)
>> #2 0x000003ffbc35150a __makecontext_ret (libc.so.6)
>>
>>
>> that I was able to bisect.
>> commit 2906cddfecff21af20eedab43288b485a679f9ac does crash regularly,
>> 2906cddfecff21af20eedab43288b485a679f9ac^ does not.
>>
>> I will try to find somebody that looks into that - unless you have an idea.
>
> The only random idea is to move
>
> vblk->dataplane_started = true
>
> to the beginning of virtio_blk_data_plane_start rather than the end.
>
> Paolo
FWIW, this patch seems to trigger three different failures at random: this
crash, the "tracked_request_begin" one that I reported yesterday, and/or
some early read issues from the bootloader.
2906cddfecff21af20eedab43288b485a679f9ac^ seems to work all the time.
Moving vblk->dataplane_started = true around does not help; it still triggers
all 3 types of bugs, e.g.
Thread 1 (Thread 0x3ffaabff910 (LWP 32782)):
#0 0x0000000010329a70 in bdrv_co_do_rw (opaque=0x0) at /home/cborntra/REPOS/qemu/block/io.c:2170
#1 0x00000000103b2e7a in coroutine_trampoline (i0=1023, i1=-2147470992) at /home/cborntra/REPOS/qemu/util/coroutine-ucontext.c:79
#2 0x000003ffac85150a in __makecontext_ret () from /lib64/libc.so.6
(gdb) list
2165
2166    /* Invoke bdrv_co_do_readv/bdrv_co_do_writev */
2167    static void coroutine_fn bdrv_co_do_rw(void *opaque)
2168    {
2169        BlockAIOCBCoroutine *acb = opaque;
2170        BlockDriverState *bs = acb->common.bs;
2171
2172        if (!acb->is_write) {
2173            acb->req.error = bdrv_co_do_readv(bs, acb->req.sector,
2174                acb->req.nb_sectors, acb->req.qiov, acb->req.flags);
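
Note that frame #0 shows opaque=0x0, so line 2170 dereferences a NULL acb.
To illustrate just that failure mode, here is a minimal standalone sketch.
It is not QEMU code: FakeBDS, FakeACB and fake_co_do_rw are hypothetical
stand-ins that only mimic the shape of the listing above. The NULL check
would not fix the root cause (why the coroutine is entered without its
argument); it would only turn the segfault into a diagnosable error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the QEMU types; NOT the real definitions. */
typedef struct FakeBDS {
    const char *name;
} FakeBDS;

typedef struct FakeACB {
    struct {
        FakeBDS *bs;
    } common;
    int is_write;
} FakeACB;

/*
 * Same shape as bdrv_co_do_rw(): the coroutine entry point receives its
 * state through an untyped opaque pointer.
 * Returns 0 on success, -1 if opaque is NULL (the case seen in frame #0).
 */
static int fake_co_do_rw(void *opaque)
{
    FakeACB *acb = opaque;

    if (acb == NULL) {
        /* The listing at io.c:2170 reads acb->common.bs here and faults. */
        fprintf(stderr, "coroutine entered with opaque=NULL\n");
        return -1;
    }

    assert(acb->common.bs != NULL);
    FakeBDS *bs = acb->common.bs;
    printf("%s request on %s\n", acb->is_write ? "write" : "read", bs->name);
    return 0;
}
```

Calling fake_co_do_rw() with a populated FakeACB returns 0, while calling
it with NULL returns -1 instead of crashing.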
I will try to find somebody to work on this.