On 2011-03-09 09:50, Corentin Chary wrote: > On Wed, Mar 9, 2011 at 7:37 AM, Jan Kiszka <[email protected]> wrote: >> On 2011-03-08 23:53, Peter Lieven wrote: >>> Hi, >>> >>> during testing of qemu-kvm-0.14.0 i can reproduce the following segfault. i >>> have seen similar crash already in 0.13.0, but had no time to debug. >>> my guess is that this segfault is related to the threaded vnc server which >>> was introduced in qemu 0.13.0. the bug is only triggerable if a vnc >>> client is attached. it might also be connected to a resolution change in >>> the guest. i have a backtrace attached. the debugger is still running if >>> someone >>> needs more output >>> >> >> ... >> >>> Thread 1 (Thread 0x7ffff7ff0700 (LWP 29038)): >>> #0 0x0000000000000000 in ?? () >>> No symbol table info available. >>> #1 0x000000000041d669 in main_loop_wait (nonblocking=0) >>> at /usr/src/qemu-kvm-0.14.0/vl.c:1388 >> >> So we are calling a IOHandlerRecord::fd_write handler that is NULL. >> Looking at qemu_set_fd_handler2, this may happen if that function is >> called for an existing io-handler entry with non-NULL write handler, >> passing a NULL write and a non-NULL read handler. And all this without >> the global mutex held. >> >> And there are actually calls in vnc_client_write_plain and >> vnc_client_write_locked (in contrast to vnc_write) that may generate >> this pattern. It's probably worth validating that the iothread lock is >> always held when qemu_set_fd_handler2 is invoked to confirm this race >> theory, adding something like >> >> assert(pthread_mutex_trylock(&qemu_mutex) != 0); >> (that's for qemu-kvm only) >> >> BTW, qemu with just --enable-vnc-thread, ie. without io-thread support, >> should always run into this race as it then definitely lacks a global mutex. > > I'm not sure what mutex should be locked here (qemu_global_mutex, > qemu_fair_mutex, lock_iothread).
In upstream qemu, the latter - if it exists (which is not the case in
non-io-thread mode).
In qemu-kvm, those locks are yet unused. Rather the code in qemu-kvm.c
implements the global lock.
> But here is where is should be locked (other vnc_write calls in this
> thread should never trigger qemu_set_fd_handler):
>
> diff --git a/ui/vnc-jobs-async.c b/ui/vnc-jobs-async.c
> index 1d4c5e7..e02d891 100644
> --- a/ui/vnc-jobs-async.c
> +++ b/ui/vnc-jobs-async.c
> @@ -258,7 +258,9 @@ static int vnc_worker_thread_loop(VncJobQueue *queue)
> goto disconnected;
> }
>
> + /* lock */
> vnc_write(job->vs, vs.output.buffer, vs.output.offset);
> + /* unlock */
>
> disconnected:
> /* Copy persistent encoding data */
> @@ -267,7 +269,9 @@ disconnected:
> vnc_unlock_output(job->vs);
>
> if (flush) {
> + /* lock */
> vnc_flush(job->vs);
> + /* unlock */
> }
Particularly this call is critical as it will trigger
vnc_client_write_locked which may NULL'ify the write handler.
But I'm also not sure about vnc_send_framebuffer_update. Someone has to
go through the threaded vnc code again, very thoroughly. It looks fragile.
Jan
signature.asc
Description: OpenPGP digital signature
