On Tue, 22 Jul 2025 23:05:06 +0400 (+04)
Ivan Malov <ivan.ma...@arknetworks.am> wrote:

> There is a difference between control path and data path. Always has been.
> Yes, on data path, DPDK has historically sought better performance, but on
> the slow path, checks have typically been implemented, even in the flow API,
> with the only exception being "asynchronous flow" APIs, which are meant to
> be fast-path.
> 
> Yes, the idea to have a "secondary process reference counter" in 'rte_device'
> to be either guarded with its own lock or accessed atomically by
> 'rte_dev_probe' and 'rte_dev_remove' (to increment and decrement/check
> respectively) as well as by 'rte_eth_dev_close' and 'rte_eth_dev_reset' (to
> decrement/check) may not be a hill to die on, to be honest, and might be
> wrong, so I have no strong opinion.
> 
> What scares me most in this idea is that, one may still end up with certain
> entry points overlooked, rendering the whole effort worthless.
> 

Please don't top post.

The DPDK control path has (up to now) assumed that control operations are
only done from a single thread on each port. There is also the issue of
hotplug, but that is separate. For example, if two threads start and stop
the same port, bad things happen and NIC drivers break.
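
To make that assumption concrete, here is a minimal sketch of an
application funnelling start/stop through one lock per port. Everything in
it is hypothetical: 'struct port_ctl', 'port_ctl_start', 'port_ctl_stop' and
the 'fake_dev_*' stand-ins are invented for illustration and are not DPDK
APIs. A real application would call rte_eth_dev_start()/rte_eth_dev_stop()
where the stand-ins are.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical application-side wrapper; not a DPDK structure. */
struct port_ctl {
	pthread_mutex_t lock; /* serializes all control operations on the port */
	int started;          /* state tracked by the application */
	int transitions;      /* calls that actually reached the "driver" */
};

#define PORT_CTL_INIT { PTHREAD_MUTEX_INITIALIZER, 0, 0 }

/* Stand-ins for the real driver entry points; always succeed here. */
static int fake_dev_start(void) { return 0; }
static int fake_dev_stop(void) { return 0; }

/* Start the port at most once, no matter how many threads ask. */
static int port_ctl_start(struct port_ctl *p)
{
	int ret = 0;

	assert(p != NULL);
	pthread_mutex_lock(&p->lock);
	if (!p->started) {
		ret = fake_dev_start();
		if (ret == 0) {
			p->started = 1;
			p->transitions++;
		}
	}
	pthread_mutex_unlock(&p->lock);
	return ret;
}

/* Stop the port only if it is currently started. */
static int port_ctl_stop(struct port_ctl *p)
{
	int ret = 0;

	assert(p != NULL);
	pthread_mutex_lock(&p->lock);
	if (p->started) {
		ret = fake_dev_stop();
		if (ret == 0) {
			p->started = 0;
			p->transitions++;
		}
	}
	pthread_mutex_unlock(&p->lock);
	return ret;
}
```

This only protects callers inside one process that all go through the
wrapper. It does nothing for hotplug or for secondary processes.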

This is not well documented, and a section on it needs to go into the thread
safety chapter of the programmer's guide. The whole thread safety chapter is
out of date, and doesn't reference RCU when it should. It also doesn't cover
hotplug or weird secondary processes that fork.

There is also the issue of how primary/secondary monitoring works.
Right now the secondary monitors the primary by periodically polling a lock
file. This is an inherently racy method and leads to problems. It needs to be
redesigned to use a blocking method, something like spawning a thread in the
secondary that uses some part of the existing Unix domain IPC to get a
notification when the primary crashes or wants to exit. Ideally it would
support a synchronous handshake with all primaries and an asynchronous case
for when the primary crashes.
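
A rough sketch of the blocking idea, under loud assumptions: it uses a plain
socketpair() and fork() rather than the existing DPDK IPC channel, and the
names 'wait_for_peer', 'demo_primary_exit' and 'enum peer_event' are made up
for illustration. In a real design the blocking wait would run in a
dedicated thread in the secondary.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical outcome codes; not part of any DPDK API. */
enum peer_event { PEER_ERROR = -1, PEER_BYE = 0, PEER_GONE = 1 };

/*
 * Block until the peer sends a one-byte "bye" (orderly exit) or its end
 * of the stream socket is closed (crash, or exit without a handshake).
 * No polling interval is involved; the kernel wakes us up.
 */
static enum peer_event wait_for_peer(int fd)
{
	char msg;
	ssize_t n;

	assert(fd >= 0);
	do {
		n = recv(fd, &msg, 1, 0);
	} while (n < 0 && errno == EINTR);

	if (n == 1)
		return PEER_BYE;
	if (n == 0)
		return PEER_GONE; /* EOF: peer closed or died */
	return PEER_ERROR;
}

/*
 * Demo: the child plays "primary", the parent plays "secondary".
 * orderly != 0: primary says bye before exiting.
 * orderly == 0: primary exits without a word, as a crash would.
 */
static enum peer_event demo_primary_exit(int orderly)
{
	int sv[2];
	pid_t pid;
	enum peer_event ev;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return PEER_ERROR;

	pid = fork();
	if (pid < 0) {
		close(sv[0]);
		close(sv[1]);
		return PEER_ERROR;
	}
	if (pid == 0) {
		close(sv[0]);
		if (orderly && write(sv[1], "B", 1) != 1)
			_exit(1);
		_exit(0); /* kernel closes sv[1] on exit */
	}

	close(sv[1]); /* keep only our end, so EOF is seen when the child exits */
	ev = wait_for_peer(sv[0]);
	close(sv[0]);
	waitpid(pid, NULL, 0);
	return ev;
}
```

The orderly case corresponds to the synchronous handshake and the EOF case
to the asynchronous crash notification. Neither depends on when a lock file
happens to be checked.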

The point is that band-aids in the ethdev layer won't fix it well
enough.


