On 4/28/25 9:46 PM, Eric Blake wrote:
> From: "Richard W.M. Jones" <[email protected]>
>
> Enable NBD multi-conn by spreading operations across multiple
> connections.
>
> (XXX) This uses a naive round-robin approach which could be improved.
> For example we could look at how many requests are in flight and
> assign operations to the connections with fewest. Or we could try to
> estimate (based on size of requests outstanding) the load on each
> connection. But this implementation doesn't do any of that.
>
> Signed-off-by: Richard W.M. Jones <[email protected]>
> Message-ID: <[email protected]>
> ---
> block/nbd.c | 67 +++++++++++++++++++++++++++++++++++++++--------------
> 1 file changed, 49 insertions(+), 18 deletions(-)
>
> diff --git a/block/nbd.c b/block/nbd.c
> index 19da1a7a1fe..bf5bc57569c 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
>
> [...]
> @@ -2207,24 +2233,29 @@ static const char *const nbd_strong_runtime_opts[] = {
> static void nbd_cancel_in_flight(BlockDriverState *bs)
> {
> BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
> - NBDConnState * const cs = s->conns[0];
> + size_t i;
> + NBDConnState *cs;
>
> - reconnect_delay_timer_del(cs);
> + for (i = 0; i < MAX_MULTI_CONN; ++i) {
> + cs = s->conns[i];
>
> - qemu_mutex_lock(&cs->requests_lock);
> - if (cs->state == NBD_CLIENT_CONNECTING_WAIT) {
> - cs->state = NBD_CLIENT_CONNECTING_NOWAIT;
> + reconnect_delay_timer_del(cs);
> +
This code is causing iotests/{185,264,281} to segfault. E.g.:
> (gdb) bt
> #0 0x000055bbaec58119 in reconnect_delay_timer_del (cs=0x0) at
> ../block/nbd.c:205
> #1 0x000055bbaec5d8e4 in nbd_cancel_in_flight (bs=0x55bbb1458020) at
> ../block/nbd.c:2242
> #2 0x000055bbaec4ff16 in bdrv_cancel_in_flight (bs=0x55bbb1458020) at
> ../block/io.c:3737
> #3 0x000055bbaec54ec1 in mirror_cancel (job=0x55bbb21ce800, force=true) at
> ../block/mirror.c:1335
> #4 0x000055bbaec18278 in job_cancel_async_locked (job=0x55bbb21ce800,
> force=true) at ../job.c:893
> #5 0x000055bbaec18df2 in job_cancel_locked (job=0x55bbb21ce800, force=true)
> at ../job.c:1143
> #6 0x000055bbaec18ef3 in job_force_cancel_err_locked (job=0x55bbb21ce800,
> errp=0x7fff44f247a0) at ../job.c:1190
> #7 0x000055bbaec192a4 in job_finish_sync_locked (job=0x55bbb21ce800,
> finish=0x55bbaec18ed2 <job_force_cancel_err_locked>, errp=0x0) at
> ../job.c:1253
> #8 0x000055bbaec18f2e in job_cancel_sync_locked (job=0x55bbb21ce800,
> force=true) at ../job.c:1196
> #9 0x000055bbaec19086 in job_cancel_sync_all () at ../job.c:1214
> #10 0x000055bbaed55177 in qemu_cleanup (status=0) at ../system/runstate.c:949
> #11 0x000055bbaedd0aad in qemu_default_main (opaque=0x0) at
> ../system/main.c:51
> #12 0x000055bbaedd0b4f in main (argc=21, argv=0x7fff44f249d8) at
> ../system/main.c:80
We're dereferencing a pointer to NBDConnState that was never
initialized. So it should either be
> - for (i = 0; i < MAX_MULTI_CONN; ++i) {
> + for (i = 0; i < s->multi_conn; ++i) {
or
> - cs = s->conns[i];
> + if (s->conns[i]) {
> + cs = s->conns[i];
> + ...
[...]
Andrey