On Fri, Mar 06, 2026 at 01:57:45PM +0000, Peter Maydell wrote:
> On Wed, 4 Mar 2026 at 10:34, Daniel P. Berrangé <[email protected]> wrote:
> > On Sat, Feb 28, 2026 at 03:16:11PM +0000, Peter Maydell wrote:
> > > I had a look at this, and although I couldn't get a specific
> > > backtrace, I got enough information from asan leak backtraces
> > > and looking at the code that I think I understand where the race
> > > is here:
> > >
> > > vhost-user-test.c:test_server_create_chr() sets up a chardev with:
> > >  qemu_chr_new()
> > >  qemu_chr_fe_init()
> > >  qemu_chr_fe_set_handlers()
> > >
> > > For the "connect-fail" and "flags-mismatch" tests, it includes in
> > > the options to the chardev "reconnect-ms=1000".
> > >
> > > qemu_chr_new() ends up in tcp_chr_open(), which calls
> > > qmp_chardev_open_socket_client().
> > >
> > > If reconnect_ms is not 0, qmp_chardev_open_socket_client() calls
> > > tcp_chr_connect_client_async() instead of doing the connect synchronously.
> > > That kicks off a qio_task_new() to do the connecting, and immediately
> > > returns, so we return successfully from qemu_chr_new(). So
> > > test_server_create_chr() will continue execution into qemu_chr_fe_init()
> > > and qemu_chr_fe_set_handlers().
> > >
> > > qemu_chr_fe_set_handlers() calls qemu_chr_be_update_read_handlers(), which
> > > calls the chr_update_read_handler method for the ChardevClass, which here
> > > is tcp_chr_update_read_handler(). That function calls 
> > > update_ioc_handlers().
> > >
> > > Meanwhile, the thread that got kicked off to do the connect is
> > > running in parallel. If the connect succeeds then 
> > > qemu_chr_socket_conneted()
> > > will call tcp_chr_connect(), which will also call
> > > update_ioc_handlers(). (If the connect fails then presumably we end
> > > up in the tcp_char_disconnect or otherwise destroying the chardev,
> > > though I haven't figured out exactly what happens in this case.)
> > >
> > > None of this appears to be taking any kind of lock while it modifies
> > > the SocketChardev fields, so we could have two threads in
> > > update_ioc_handlers() simultaneously.
> >
> > It is subtle, as the QIOTask object deals with 2 separate callbacks.
> >
> > The callback that is provided to qio_task_new() is what is invoked when
> > the task is marked as completed. This is guaranteed to always be
> > invoked in main event loop context.
> >
> > The callback that is provided to qio_task_run_in_thread() is what runs
> > in the throwaway background thread.  When this callback returns, the
> > QIOTask creates a glib idle source in the main event loop to run the
> > completion callback.
> >
> >
> > So wrt char-socket.c, tcp_chr_connect_client_task will run in the
> > background thread, but it only calls QIOChannel APIs so is safe.
> >
> > The qemu_chr_socket_connected method will run in main event loop
> > context, so it does not need locking wrt other chardev I/O
> > callbacks.
> 
> This is starting to sound like the problem is the test code. The
> test code is calling qemu_chr_new() and qemu_chr_fe_set_handlers()
> from a random thread it created by directly calling g_thread_new().
> So these calls from this thread do not have any kind of lock and
> can run in parallel with the qemu_chr_socket_connected method from
> the main event loop.
> 
> I guess we need to take a lock here, but which one?
> 
> > >  How is this intended to work ? Is the test case code incorrect to assume
> > > it can work with the chardev it got back from qemu_chr_new(), or should
> > > the chardev be handling this case?
> >
> > The test is really rather had to understand, but I feel like something
> > in the test ought to be looking at the CHR_EVENT_OPENED event to
> > identify when the connection is actually established ?
> 
> The way to get a CHR_EVENT_OPENED would be to set the frontend
> handlers with qemu_chr_fe_set_handlers(). It's the call to set
> the frontend handlers that is already going wrong. (But the test
> doesn't really need to care about the EVENT_OPENED: it can
> just wait til it gets the "you have some data to read" callback.)

I'm rather feeling that the test program would be better off not
using chardevs at all, and instead using QIOChannelSocket directly.
The chardevs are horribly complex when trying to deal with
connection oriented semantics, as they are really not a connection
based API; the event stuff is a hacky afterthought that barely
gets us by.

With regards,
Daniel
-- 
|: https://berrange.com       ~~        https://hachyderm.io/@berrange :|
|: https://libvirt.org          ~~          https://entangle-photo.org :|
|: https://pixelfed.art/berrange   ~~    https://fstop138.berrange.com :|


Reply via email to