On Mon, 19 May 2025 at 12:00, Daniel P. Berrangé <[email protected]> wrote: > > On Thu, May 15, 2025 at 07:20:14PM -0300, Fabiano Rosas wrote: > > It's possible for the hup_source to have its reference decremented by > > remove_hup_source() while it's still being added to the context, > > leading to asserts in glib: > > IIUC this must mean that > > tcp_chr_free_connection > > is being called concurrently with > > update_ioc_handlers > > I'm wondering if that is really intended, or a sign of a deeper > bug that we'll just paper over if we add the mutex proposed here. > > Are you able to provide stack traces showing the 2 concurrent > operations that are triggering this problem ?
(Pulling up this thread from last year since I ran into something similar running the sanitizers, and Fabiano pointed me at this series he'd sent that fixes most/all of the problems.) I had a look at this, and although I couldn't get a specific backtrace, I got enough information from asan leak backtraces and looking at the code that I think I understand where the race is here: vhost-user-test.c:test_server_create_chr() sets up a chardev with: qemu_chr_new() qemu_chr_fe_init() qemu_chr_fe_set_handlers() For the "connect-fail" and "flags-mismatch" tests, it includes in the options to the chardev "reconnect-ms=1000". qemu_chr_new() ends up in tcp_chr_open(), which calls qmp_chardev_open_socket_client(). If reconnect_ms is not 0, qmp_chardev_open_socket_client() calls tcp_chr_connect_client_async() instead of doing the connect synchronously. That kicks off a qio_task_new() to do the connecting, and immediately returns, so we return successfully from qemu_chr_new(). So test_server_create_chr() will continue execution into qemu_chr_fe_init() and qemu_chr_fe_set_handlers(). qemu_chr_fe_set_handlers() calls qemu_chr_be_update_read_handlers(), which calls the chr_update_read_handler method for the ChardevClass, which here is tcp_chr_update_read_handler(). That function calls update_ioc_handlers(). Meanwhile, the thread that got kicked off to do the connect is running in parallel. If the connect succeeds then qemu_chr_socket_conneted() will call tcp_chr_connect(), which will also call update_ioc_handlers(). (If the connect fails then presumably we end up in the tcp_char_disconnect or otherwise destroying the chardev, though I haven't figured out exactly what happens in this case.) None of this appears to be taking any kind of lock while it modifies the SocketChardev fields, so we could have two threads in update_ioc_handlers() simultaneously. How is this intended to work ? Is the test case code incorrect to assume it can work with the chardev it got back from qemu_chr_new(), or should the chardev be handling this case? thanks -- PMM
