On Tue, Jun 16, 2026 at 06:58:40PM +0300, Andrey Drobyshev wrote:
On 6/16/26 5:18 PM, Stefano Garzarella wrote:
On Fri, Jun 12, 2026 at 07:57:17PM +0300, Andrey Drobyshev wrote:
[...]
static u32 vhost_transport_get_local_cid(void)
@@ -311,11 +312,17 @@ vhost_transport_send_pkt(struct sk_buff *skb, struct net
*net)
* the mutex would be too expensive in this hot path, and we already
have
* all the outcomes covered: if the backend becomes NULL right after
the check,
* vhost_transport_do_send_pkt() will check it under the mutex anyway.
+ *
+ * Don't fast-fail if cpr_paused is set, keep queueing skbs instead.
+ * The kick in vhost_vsock_start() will drain them on resume.
*/
if
(unlikely(!data_race(vhost_vq_get_backend(&vsock->vqs[VSOCK_VQ_RX])))) {
- rcu_read_unlock();
- kfree_skb(skb);
] return -EHOSTUNREACH;
+ smp_rmb(); /* pairs with smp_wmb() in start/drop_backends
*/
+ if (!READ_ONCE(vsock->cpr_paused)) {
Can we avoid this which is not really readable and maybe add a single
variable to control the fast-fail at all?
I mean replacing both cpr_paused + backend-pointer with a single
`started` flag: set it to false at open, true on start via
smp_store_release(), back to false on normal stop, and leave it true
during CPR pause.
The reader in send_pkt can do just:
if (!smp_load_acquire(&vsock->started))
return -EHOSTUNREACH;
WDYT?
I don't think it's gonna work as suggested. As I understand, the order
during CPR migration is:
1) SET_RUNNING(0)
-> vhost_vsock_stop()
-> vhost_vsock_drop_backends()
2) RESET_OWNER
-> vhost_vsock_drop_backends()
3) SET_OWNER
4) SET_RUNNING(1)
-> vhost_vsock_start
-> for (...) vhost_vq_set_backend()
(Btw I just noticed backends are already NULL at step 2), but that's
just our CPR case, for any potential RESET_OWNER users it might not be
the case).
So the race windows starts from 1) (not from 2)). We have no way of
differentiating whether device is actually being stopped for good, or
we're in the middle of CPR. If we set the flag to false on stop as you
suggested, we'll still hit the -EHOSTUNREACH case eventually, and
avoiding it is the whole purpose of this patch.
The fast-fail with -EHOSTUNREACH relies on the presence of backends.
IIUC the backend will only become set after initial SET_RUNNING(1),
which will only happen once the guest driver writes smth to virtio
config register, QEMU catches it and calls SET_RUNNING(1). So we have
ordering with the guest's actions here, which is logical. But for our
issue that means that the only true marker of paused/not paused is the
presence of backends - and that's why the flag is set in
vhost_vsock_drop_backends().
Okay, so what about avoiding to set `started` to false in
SET_RUNNING(0)? I mean use it just to track the first SET_RUNNING(1).
(And maybe changing the name to that variable).
Apart from CPR, when can SET_RUNNING(0) occur?
At the end that was just an optimization, if we queue the packet is not
a big issue IMO.
+ rcu_read_unlock();
+ kfree_skb(skb);
+ return -EHOSTUNREACH;
+ }
That said claude here is reporting a potential issue that I think we
should consider:
After VHOST_RESET_OWNER, the guest CID stays in the hash, so
vhost_transport_send_pkt() can still find the vsock, skip the
fast-fail (cpr_paused=true), and call vhost_vq_work_queue() while
vhost_workers_free() is freeing workers without a synchronize_rcu()
— risking a use-after-free. Also, any send_pkt_work queued between
the last flush and worker teardown gets its VHOST_WORK_QUEUED
bit
stuck (the vhost task exits without draining), deadlocking
host→guest traffic after restart.
A synchronize_rcu() in vhost_workers_free() between the
rcu_assign_pointer(NULL) loop and the destroy loop would close the
use-after-free, and reinitializing send_pkt_work via
vhost_work_init() after vhost_dev_reset_owner() returns would clear
the stuck QUEUED bit.
Yes, this looks real indeed. Though I couldn't hit the UAF issue while
testing host->guest transfer under KASAN.
}
if (virtio_vsock_skb_reply(skb))
@@ -640,6 +647,9 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
mutex_unlock(&vq->mutex);
}
+ smp_wmb(); /* pairs with smp_rmb() in send_pkt */
+ WRITE_ONCE(vsock->cpr_paused, false);
+
/* Some packets may have been queued before the device was started,
* let's kick the send worker to send them.
*/
@@ -671,6 +681,11 @@ static void vhost_vsock_drop_backends(struct vhost_vsock
*vsock)
lockdep_assert_held(&vsock->dev.mutex);
+ if (vhost_vq_get_backend(&vsock->vqs[VSOCK_VQ_RX])) {
+ WRITE_ONCE(vsock->cpr_paused, true);
+ smp_wmb(); /* pairs with smp_rmb() in send_pkt */
+ }
Why here and not in vhost_vsock_reset_owner()?
Also having this here will set it to true also with
VHOST_VSOCK_SET_RUNNING(0), is that right?
That was added here precisely to cover the vhost_vsock_stop() case (see
above).
I see now, a comment or something in the commit would have helped.
Thanks,
Stefano