On 2/26/26 5:24 PM, David Marchand wrote:
> By default, DPDK probes all available resources (like PCI devices) and
> partially initialises them (i.e. takes them over).
> This behavior has been relied on by OVS, since netdev-dpdk introduction.
> It is not needed since DPDK device hotplug has been supported and used
> for some time now.
>
> Besides, this initial probing may not be desirable:
> - for PCI devices bound to vfio-pci, the first application taking over
>   them "wins", meaning that OVS would prevent qemu from using some VF
>   devices,
> - for mlx5 devices:
>   - the driver maintains link status and liveness of all ports
>     (taking some kernel lock) even when OVS only uses a subset of them,
>   - if some driver feature needs to be enabled for one port via devargs,
>     it would have to be set in dpdk-extra.
>
> Change this behavior and disable the initial PCI probing by passing
> a specially crafted allow list: this implementation is not elegant
> but it has been successfully used (for the PCI part) in a number of
> setups I know, and there is no better DPDK API to achieve the same
> at the moment.
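For anyone reading along, here is how I understand the dummy allow list to work, as a simplified C model. It is not DPDK's code: should_probe() and count_probed() are hypothetical names, and real buses parse and compare device addresses rather than raw strings. The assumption is that once an allow entry exists for a bus, that bus only probes devices matching an entry, so an address no real device uses leaves nothing probed at init (which would also explain why the patch adds one dummy entry per bus).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Simplified model of one bus scan: with no allow entries every device is
 * probed; otherwise only listed devices are. Hypothetical helper; real
 * DPDK compares parsed addresses, not strings. */
static bool
should_probe(const char *dev_addr, const char *const allow[], size_t n_allow)
{
    assert(dev_addr != NULL);

    if (n_allow == 0) {
        return true;            /* Default scan mode: probe everything. */
    }
    for (size_t i = 0; i < n_allow; i++) {
        if (!strcmp(dev_addr, allow[i])) {
            return true;        /* Device explicitly allowed. */
        }
    }
    return false;               /* Allowlist mode and device not listed. */
}

/* Counts how many of 'devs' would be probed at init (hypothetical). */
static size_t
count_probed(const char *const devs[], size_t n_devs,
             const char *const allow[], size_t n_allow)
{
    size_t n = 0;

    for (size_t i = 0; i < n_devs; i++) {
        if (should_probe(devs[i], allow, n_allow)) {
            n++;
        }
    }
    return n;
}
```

With two devices present, an empty allow list probes both, while a single dummy entry such as 0000:00:00.0 probes neither.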
>
> This behavior change breaks setups that were using the
> class=eth,mac=XX:XX:XX:XX:XX:XX syntax because OVS was relying on the
> (fragile) assumption that all DPDK ports were probed at init once and
> for all.
> Add a warning for users of this syntax, update the documentation and
> add an option to restore the original behavior via
> 'dpdk-probe-at-init=true'.
>
> This option also helps with unexpected cases like https://xkcd.com/1172/.
>
> Signed-off-by: David Marchand <[email protected]>
> Acked-by: Eli Britstein <[email protected]>
> Acked-by: Eelco Chaudron <[email protected]>
Hi David. Thanks for this. A few comments below.
thanks,
Kevin.
> ---
> Changes since RFC v2:
> - updated descriptions and comments,
>
> Changes since RFC v1:
> - updated commitlog (mentioning devargs),
> - handled other DPDK buses,
>
> ---
> Documentation/howto/dpdk.rst | 6 +++++
> Documentation/intro/install/dpdk.rst | 8 ++++++
> NEWS | 5 ++++
> lib/dpdk.c | 28 +++++++++++++++++++
> lib/netdev-dpdk.c | 2 +-
> tests/system-dpdk-macros.at | 2 +-
> tests/system-dpdk.at | 40 ++++++++++++++--------------
> vswitchd/vswitch.xml | 15 +++++++++++
> 8 files changed, 84 insertions(+), 22 deletions(-)
>
> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> index 73e630b07f..5d6bf94cdb 100644
> --- a/Documentation/howto/dpdk.rst
> +++ b/Documentation/howto/dpdk.rst
> @@ -62,6 +62,12 @@ is suggested::
>
> .. important::
>
> + Using this syntax requires that DPDK probes the device that owns those
> + multiple ports. This can be achieved by either setting an allowlist
> + of PCI devices in the ``dpdk-extra`` configuration, or by requesting that
> + all available devices (including PCI devices) be probed at initialization
> + (setting ``dpdk-probe-at-init`` to true).
> +
> Hotplugging physical interfaces is not supported using the above syntax.
> This is expected to change with the release of DPDK v18.05. For information
> on hotplugging physical interfaces, you should instead refer to
> diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst
> index 6f4687bdea..eabca63a83 100644
> --- a/Documentation/intro/install/dpdk.rst
> +++ b/Documentation/intro/install/dpdk.rst
> @@ -297,6 +297,14 @@ listed below. Defaults will be provided for all values not explicitly set.
> sockets. If not specified, this option will not be set by default. DPDK
> default will be used instead.
>
> +``dpdk-probe-at-init``
> + Let DPDK EAL probe all available devices at initialization.
> + This consumes more resources, as OVS may not use all probed devices, and it
> + may cause undesired side effects (such as taking the RTNL lock frequently for
> + maintaining link status (and other states, etc.) of mlx5 netdevs that OVS
> + does not care about). However, this option is needed when using the
> + ``class=eth,mac=XX:XX:XX:XX:XX:XX`` syntax for DPDK ports.
> +
> ``dpdk-hugepage-dir``
> Directory where hugetlbfs is mounted
>
> diff --git a/NEWS b/NEWS
> index d5642f9857..392e3ed1f0 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -1,5 +1,10 @@
> Post-v3.7.0
> --------------------
> + - DPDK:
> + * Probing of devices at DPDK initialization has been disabled to avoid
> + wasting resources on unused devices. This breaks DPDK netdev ports
> + using the "class=eth,mac=" syntax (though it can be restored, see
> + Documentation/howto/dpdk.rst).
>
>
> v3.7.0 - 16 Feb 2026
> diff --git a/lib/dpdk.c b/lib/dpdk.c
> index d27b95cd9a..794ffbe599 100644
> --- a/lib/dpdk.c
> +++ b/lib/dpdk.c
> @@ -430,6 +430,34 @@ dpdk_init__(const struct smap *ovs_other_config)
> svec_add_nocopy(&args, xasprintf("0@%d", cpu));
> }
>
> + if (!args_contains(&args, "-a") && !args_contains(&args, "-b")
This should also check for the long forms, --allow and --block.
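A minimal sketch of what I mean, using a plain array instead of the svec so it stands alone. arg_is_option() and args_have_device_list() are hypothetical names. If args_contains() does an exact string match, the "--allow=<addr>" form would also slip through, so the sketch accepts that too; attached short forms such as "-a0000:01:00.0" are not handled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if 'arg' is exactly 'opt' or, for a long option, 'opt' followed by
 * "=value" (e.g. "--allow=0000:01:00.0"). */
static bool
arg_is_option(const char *arg, const char *opt)
{
    size_t len = strlen(opt);

    if (strncmp(arg, opt, len)) {
        return false;
    }
    return arg[len] == '\0' || (opt[1] == '-' && arg[len] == '=');
}

/* Hypothetical helper: true if any argument sets an allow or block list,
 * in either short (-a/-b) or long (--allow/--block) form. */
static bool
args_have_device_list(const char *const argv[], size_t argc)
{
    static const char *const opts[] = { "-a", "-b", "--allow", "--block" };

    for (size_t i = 0; i < argc; i++) {
        assert(argv[i] != NULL);
        for (size_t j = 0; j < sizeof opts / sizeof opts[0]; j++) {
            if (arg_is_option(argv[i], opts[j])) {
                return true;
            }
        }
    }
    return false;
}
```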
> + && !smap_get_bool(ovs_other_config, "dpdk-probe-at-init", false)) {
> +#ifdef RTE_BUS_AUXILIARY
> + svec_add(&args, "-a");
> + svec_add(&args, "auxiliary:");
> +#endif
> +#ifdef RTE_BUS_CDX
> + svec_add(&args, "-a");
> + svec_add(&args, "cdx:cdx-");
> +#endif
> +#ifdef RTE_BUS_FSLMC
> + svec_add(&args, "-a");
> + svec_add(&args, "fslmc:dpni.65535");
> +#endif
> +#ifdef RTE_BUS_PCI
> + svec_add(&args, "-a");
> + svec_add(&args, "pci:0000:00:00.0");
I tried adding dpdk-extra="--no-pci". It is handled correctly in
rte_eal_init(), but could you consider adding a check here to avoid having
both "no pci" and "allow dummy pci device" in the args?
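Something along these lines, again a hypothetical, self-contained sketch: has_arg() and add_dummy_pci_allow() are made-up names, and the output array stands in for the svec. It only emits the dummy PCI allow entry when "--no-pci" is absent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if 'arg' appears verbatim in 'argv'. */
static bool
has_arg(const char *const argv[], size_t argc, const char *arg)
{
    for (size_t i = 0; i < argc; i++) {
        if (!strcmp(argv[i], arg)) {
            return true;
        }
    }
    return false;
}

/* Writes the dummy PCI allow entry into 'out' (capacity 'cap') unless the
 * user already disabled PCI; returns the number of strings written. */
static size_t
add_dummy_pci_allow(const char *const argv[], size_t argc,
                    const char *out[], size_t cap)
{
    size_t n = 0;

    if (!has_arg(argv, argc, "--no-pci")) {
        assert(cap >= 2);
        out[n++] = "-a";
        out[n++] = "pci:0000:00:00.0";
    }
    return n;
}
```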
> +#endif
> +#ifdef RTE_BUS_UACCE
> + svec_add(&args, "-a");
> + svec_add(&args, "uacce:");
> +#endif
> +#ifdef RTE_BUS_VMBUS
> + svec_add(&args, "-a");
> + svec_add(&args, "vmbus:00000000-0000-0000-0000-000000000000");
> +#endif
> + }
> +
> svec_terminate(&args);
>
> optind = 1;
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index c51fe7c258..8115223277 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -2050,7 +2050,7 @@ netdev_dpdk_get_port_by_mac(const char *mac_str, char **extra_err)
> }
> }
>
> - *extra_err = xstrdup("unknown mac");
> + *extra_err = xstrdup("unknown mac (use dpdk-probe-at-init=true?)");
nit: I really hope no one would take this literally, but just in case, it
might be worth putting a space between "true" and "?".
> return DPDK_ETH_PORT_ID_INVALID;
> }
>
> diff --git a/tests/system-dpdk-macros.at b/tests/system-dpdk-macros.at
> index f8ba766739..716d8a357d 100644
> --- a/tests/system-dpdk-macros.at
> +++ b/tests/system-dpdk-macros.at
> @@ -139,7 +139,7 @@ m4_define([OVS_TRAFFIC_VSWITCHD_START],
> OVS_DPDK_PRE_CHECK()
> OVS_WAIT_WHILE([ip link show ovs-netdev])
> dnl For functional tests, no need for DPDK PCI probing.
> - OVS_DPDK_START([--no-pci], [--disable-system], [$3])
> + OVS_DPDK_START([], [--disable-system], [$3])
> dnl Add bridges, ports, etc.
> OVS_WAIT_WHILE([ip link show br0])
> AT_CHECK([ovs-vsctl -- _ADD_BR([br0]) -- $1 m4_if([$2], [], [], [| uuidfilt])], [0], [$2])
> diff --git a/tests/system-dpdk.at b/tests/system-dpdk.at
> index 17d3d25955..bd1bead661 100644
> --- a/tests/system-dpdk.at
> +++ b/tests/system-dpdk.at
> @@ -43,7 +43,7 @@ dnl Check if EAL init is successful
> AT_SETUP([OVS-DPDK - EAL init])
> AT_KEYWORDS([dpdk])
> OVS_DPDK_PRE_CHECK()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
> AT_CHECK([grep "DPDK Enabled - initializing..." ovs-vswitchd.log], [], [stdout])
> AT_CHECK([grep "EAL" ovs-vswitchd.log], [], [stdout])
> AT_CHECK([grep "DPDK Enabled - initialized" ovs-vswitchd.log], [], [stdout])
> @@ -59,7 +59,7 @@ AT_SETUP([OVS-DPDK - dpdk-lcore-mask conversion - single])
> AT_KEYWORDS([dpdk])
> OVS_DPDK_PRE_CHECK()
> OVS_DPDK_START_OVSDB()
> -OVS_DPDK_START_VSWITCHD([--no-pci])
> +OVS_DPDK_START_VSWITCHD()
> CHECK_CPU_DISCOVERED()
> AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0x1])
> AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true])
> @@ -77,7 +77,7 @@ AT_SETUP([OVS-DPDK - dpdk-lcore-mask conversion - multi])
> AT_KEYWORDS([dpdk])
> OVS_DPDK_PRE_CHECK()
> OVS_DPDK_START_OVSDB()
> -OVS_DPDK_START_VSWITCHD([--no-pci])
> +OVS_DPDK_START_VSWITCHD()
> CHECK_CPU_DISCOVERED(4)
> AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0xf])
> AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true])
> @@ -95,7 +95,7 @@ AT_SETUP([OVS-DPDK - dpdk-lcore-mask conversion - non-contig])
> AT_KEYWORDS([dpdk])
> OVS_DPDK_PRE_CHECK()
> OVS_DPDK_START_OVSDB()
> -OVS_DPDK_START_VSWITCHD([--no-pci])
> +OVS_DPDK_START_VSWITCHD()
> CHECK_CPU_DISCOVERED(8)
> AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0xca])
> AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true])
> @@ -113,7 +113,7 @@ AT_SETUP([OVS-DPDK - dpdk-lcore-mask conversion - zeromask])
> AT_KEYWORDS([dpdk])
> OVS_DPDK_PRE_CHECK()
> OVS_DPDK_START_OVSDB()
> -OVS_DPDK_START_VSWITCHD([--no-pci])
> +OVS_DPDK_START_VSWITCHD()
> AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0x0])
> AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true])
> OVS_WAIT_UNTIL([grep "Ignoring database defined option 'dpdk-lcore-mask' due to invalid value '0x0'" ovs-vswitchd.log])
> @@ -152,7 +152,7 @@ dnl Add vhost-user-client port
> AT_SETUP([OVS-DPDK - add vhost-user-client port])
> AT_KEYWORDS([dpdk])
> OVS_DPDK_PRE_CHECK()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>
> dnl Add userspace bridge and attach it to OVS
> AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
> @@ -181,7 +181,7 @@ AT_SETUP([OVS-DPDK - ping vhost-user ports])
> AT_KEYWORDS([dpdk])
> OVS_DPDK_PRE_CHECK()
> OVS_DPDK_CHECK_TESTPMD()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>
> dnl Add userspace bridge and attach it to OVS
> AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
> @@ -237,7 +237,7 @@ AT_SETUP([OVS-DPDK - ping vhost-user-client ports])
> AT_KEYWORDS([dpdk])
> OVS_DPDK_PRE_CHECK()
> OVS_DPDK_CHECK_TESTPMD()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>
> dnl Add userspace bridge and attach it to OVS
> AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
> @@ -350,7 +350,7 @@ AT_SETUP([OVS-DPDK - Ingress policing create delete vport port])
> AT_KEYWORDS([dpdk])
>
> OVS_DPDK_PRE_CHECK()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>
> dnl Add userspace bridge and attach it to OVS and add ingress policer
> AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
> @@ -387,7 +387,7 @@ AT_SETUP([OVS-DPDK - Ingress policing no policing rate])
> AT_KEYWORDS([dpdk])
>
> OVS_DPDK_PRE_CHECK()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>
> dnl Add userspace bridge and attach it to OVS and add ingress policer
> AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
> @@ -421,7 +421,7 @@ AT_SETUP([OVS-DPDK - Ingress policing no policing burst])
> AT_KEYWORDS([dpdk])
>
> OVS_DPDK_PRE_CHECK()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>
> dnl Add userspace bridge and attach it to OVS and add ingress policer
> AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
> @@ -487,7 +487,7 @@ AT_SETUP([OVS-DPDK - QoS create delete vport port])
> AT_KEYWORDS([dpdk])
>
> OVS_DPDK_PRE_CHECK()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>
> dnl Add userspace bridge and attach it to OVS and add egress policer
> AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
> @@ -522,7 +522,7 @@ AT_SETUP([OVS-DPDK - QoS no cir])
> AT_KEYWORDS([dpdk])
>
> OVS_DPDK_PRE_CHECK()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>
> dnl Add userspace bridge and attach it to OVS and add egress policer
> AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
> @@ -551,7 +551,7 @@ AT_SETUP([OVS-DPDK - QoS no cbs])
> AT_KEYWORDS([dpdk])
>
> OVS_DPDK_PRE_CHECK()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>
> dnl Add userspace bridge and attach it to OVS and add egress policer
> AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
> @@ -657,7 +657,7 @@ AT_KEYWORDS([dpdk])
>
> OVS_DPDK_CHECK_TESTPMD()
> OVS_DPDK_PRE_CHECK()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>
> dnl Add userspace bridge and attach it to OVS with default MTU value
> AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
> @@ -698,7 +698,7 @@ AT_KEYWORDS([dpdk])
>
> OVS_DPDK_CHECK_TESTPMD()
> OVS_DPDK_PRE_CHECK()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>
> dnl Add userspace bridge and attach it to OVS and modify MTU value
> AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
> @@ -816,7 +816,7 @@ AT_KEYWORDS([dpdk])
>
> OVS_DPDK_CHECK_TESTPMD()
> OVS_DPDK_PRE_CHECK()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>
> dnl Add userspace bridge and attach it to OVS and set MTU value to max upper bound
> AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
> @@ -858,7 +858,7 @@ AT_KEYWORDS([dpdk])
>
> OVS_DPDK_CHECK_TESTPMD()
> OVS_DPDK_PRE_CHECK()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>
> dnl Add userspace bridge and attach it to OVS and set MTU value to min lower bound
> AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
> @@ -897,7 +897,7 @@ AT_SETUP([OVS-DPDK - user configured mempool])
> AT_KEYWORDS([dpdk])
> OVS_DPDK_PRE_CHECK()
> OVS_DPDK_START_OVSDB()
> -OVS_DPDK_START_VSWITCHD([--no-pci])
> +OVS_DPDK_START_VSWITCHD()
>
> AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:shared-mempool-config=8000,6000,1500])
> AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true])
> @@ -946,7 +946,7 @@ dnl --------------------------------------------------------------------------
> AT_SETUP([OVS-DPDK - ovs-appctl dpif/offload/show])
> AT_KEYWORDS([dpdk dpif-offload])
> OVS_DPDK_PRE_CHECK()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
> AT_CHECK([ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev])
> AT_CHECK([ovs-vsctl add-port br0 p1 \
> -- set Interface p1 type=dpdk options:dpdk-devargs=net_null0,no-rx=1],
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index b7a5afc0a5..458b88870c 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -453,6 +453,21 @@
> </p>
> </column>
>
> + <column name="other_config" key="dpdk-probe-at-init"
> + type='{"type": "boolean"}'>
> + <p>
> + Specifies whether DPDK should probe all devices available at the
> + time DPDK is initialized. This is required when declaring DPDK ports
> + using the "class=eth,mac=XX:XX:XX:XX:XX:XX" syntax, but beware that
> + it implies higher resource consumption and may cause undesired side
> + effects with some devices (such as mlx5).
Ilya already commented, but +1 for expanding on the undesired side
effects, e.g. "undesired side effects, such as additional interrupt
handling and link status checks for unused devices".
> + </p>
> + <p>
> + If not specified, DPDK will not probe any devices at initialization,
> + which should be fine in most cases.
> + </p>
> + </column>
> +
> <column name="other_config" key="dpdk-extra"
> type='{"type": "string"}'>
> <p>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev