On Wed, 23 Jul 2025 18:34:04 +0500 Khadem Ullah <14pwcse1...@uetpeshawar.edu.pk> wrote:

> Hi Ivan, agree. I think we can atleast currently guard all the known
> crashes.
>
> Sure, I will check the macro and get back to you.
>
> Thank you!
>
> On Wed, Jul 23, 2025, 18:19 Ivan Malov <ivan.ma...@arknetworks.am> wrote:
>
> > Hi Khadem,
> >
> > On Wed, 23 Jul 2025, Khadem Ullah wrote:
> >
> > > In secondary processes, directly accessing 'dev->data->dev_private' can
> > > cause a segmentation fault if the primary process has exited or if the
> > > shared memory is no longer accessible.
> > >
> > > Secondary application not only breaking on device closing,
> > > but also getting segfault when we do "show device info all" from secondary
> > > after primary closes.
> > >
> > > This patch adds safety checks while using rte_mem_virt2phy(), with an
> > > unlikely() branch hint to minimize performance impact in the fast path.
> > > This ensures 'dev_private' is still valid before accessing it.
> > >
> > > Fixes: bdad90d12ec8 ("ethdev: change device info get callback to return int")
> > > Cc: sta...@dpdk.org
> > >
> > > Signed-off-by: Khadem Ullah <14pwcse1...@uetpeshawar.edu.pk>
> > > ---
> > >  lib/ethdev/rte_ethdev.c | 15 ++++++++++++++-
> > >  1 file changed, 14 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> > > index dd7c00bc94..343e156a4f 100644
> > > --- a/lib/ethdev/rte_ethdev.c
> > > +++ b/lib/ethdev/rte_ethdev.c
> > > @@ -4079,6 +4079,13 @@ rte_eth_dev_info_get(uint16_t port_id, struct rte_eth_dev_info *dev_info)
> > >
> > >      if (dev->dev_ops->dev_infos_get == NULL)
> > >          return -ENOTSUP;
> > > +    if (rte_eal_process_type() == RTE_PROC_SECONDARY &&
> > > +        unlikely(rte_mem_virt2phy(dev->data->dev_private) == RTE_BAD_PHYS_ADDR)) {
> > > +        RTE_ETHDEV_LOG_LINE(ERR,
> > > +            "Secondary: dev_private not accessible (primary exited?)");
> > > +        rte_errno = ENODEV;
> > > +        return -rte_errno;
> > > +    }
> > >      diag = dev->dev_ops->dev_infos_get(dev, dev_info);
> > >      if (diag != 0) {
> > >          /* Cleanup already filled in device information */
> > > @@ -4307,7 +4314,13 @@ rte_eth_macaddr_get(uint16_t port_id, struct rte_ether_addr *mac_addr)
> > >              port_id);
> > >          return -EINVAL;
> > >      }
> > > -
> > > +    if (rte_eal_process_type() == RTE_PROC_SECONDARY &&
> > > +        (dev->data->mac_addrs == NULL)) {
> > > +        RTE_ETHDEV_LOG_LINE(ERR,
> > > +            "Secondary: dev_private not accessible (primary exited?)");
> > > +        rte_errno = ENODEV;
> > > +        return -rte_errno;
> > > +    }
> > >      rte_ether_addr_copy(&dev->data->mac_addrs[0], mac_addr);
> > >
> > >      rte_eth_trace_macaddr_get(port_id, mac_addr);
> >
> > I see one more API has been augmented with the check. But community members may
> > still argue this is not robust, as many other APIs will also fail. So, even if
> > the task was to augment as many APIs as possible with the check, then the check
> > would still be required to be factorised/generalised somehow. What do you think?
> >
> > Please also note that there are already macro invocations in many of these APIs,
> > for example, RTE_ETH_VALID_PORTID_OR_ERR_RET. Could be convenient.
> >
> > Thank you.
> >
> > > --
> > > 2.43.0
> > >
> >

No top posting.
How are you monitoring the primary? Let's fix that.
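
For illustration only, the factorised check suggested in the quoted review could look roughly like the sketch below. It is not DPDK code: every mock_*/MOCK_* name is a hypothetical stand-in so that the example builds without DPDK. mock_process_type() stands in for rte_eal_process_type(), and mock_addr_is_mapped() stands in for comparing rte_mem_virt2phy() against RTE_BAD_PHYS_ADDR. The macro follows the "check or return" shape that the thread attributes to RTE_ETH_VALID_PORTID_OR_ERR_RET.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-ins for the DPDK pieces named in the patch; NOT the real API. */
enum mock_proc_type { MOCK_PROC_PRIMARY, MOCK_PROC_SECONDARY };

struct mock_dev_data {
    void *dev_private; /* driver-private area, normally in shared memory */
};

struct mock_eth_dev {
    struct mock_dev_data *data;
};

/* Knobs that simulate the process role and whether shared memory is mapped. */
static enum mock_proc_type mock_proc = MOCK_PROC_PRIMARY;
static int mock_shared_mem_ok = 1;

static enum mock_proc_type mock_process_type(void)
{
    return mock_proc;
}

/* Stands in for: rte_mem_virt2phy(ptr) != RTE_BAD_PHYS_ADDR. */
static int mock_addr_is_mapped(const void *ptr)
{
    return ptr != NULL && mock_shared_mem_ok;
}

/*
 * Hypothetical guard macro: in a secondary process, return retval from the
 * calling function if dev_private no longer looks accessible.
 */
#define MOCK_ETH_PRIVATE_OR_ERR_RET(dev, retval) do { \
    if (mock_process_type() == MOCK_PROC_SECONDARY && \
        !mock_addr_is_mapped((dev)->data->dev_private)) { \
        fprintf(stderr, \
            "Secondary: dev_private not accessible (primary exited?)\n"); \
        return retval; \
    } \
} while (0)

/* Example entry point: run the guard once, then touch dev_private. */
static int mock_dev_info_get(struct mock_eth_dev *dev, int *info_out)
{
    MOCK_ETH_PRIVATE_OR_ERR_RET(dev, -ENODEV);
    *info_out = *(int *)dev->data->dev_private;
    return 0;
}
```

With a macro of this shape, each API would need one extra line rather than a copy of the whole if-block. It only narrows the window, though: the primary can still exit between the check and the access, so it does not replace monitoring the primary.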