On Tue, May 19, 2026 at 03:55:56PM -0700, Jakub Kicinski wrote:
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> ---
> net: mana: Add NULL guards in teardown path to prevent panic on attach failure
>     
> When queue allocation fails partway through, the error cleanup frees
> and NULLs apc->tx_qp and apc->rxqs. The patch adds NULL guards for these
> pointers in teardown paths such as mana_destroy_vport(), mana_fence_rqs(),
> and mana_dealloc_queues() to prevent NULL pointer dereferences.
> 
> > diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c 
> > b/drivers/net/ethernet/microsoft/mana/mana_en.c
> > index 9afc786b297a8..9e7e4bf526bf6 100644
> > --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> > +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> [ ... ]
> > @@ -2858,13 +2861,16 @@ static void mana_destroy_vport(struct 
> > mana_port_context *apc)
> >     struct mana_rxq *rxq;
> >     u32 rxq_idx;
> >  
> > -   for (rxq_idx = 0; rxq_idx < apc->num_queues; rxq_idx++) {
> > -           rxq = apc->rxqs[rxq_idx];
> > -           if (!rxq)
> > -                   continue;
> > +   if (apc->rxqs) {
> >  
> > -           mana_destroy_rxq(apc, rxq, true);
> > -           apc->rxqs[rxq_idx] = NULL;
> > +           for (rxq_idx = 0; rxq_idx < apc->num_queues; rxq_idx++) {
> > +                   rxq = apc->rxqs[rxq_idx];
> > +                   if (!rxq)
> > +                           continue;
> > +
> > +                   mana_destroy_rxq(apc, rxq, true);
> > +                   apc->rxqs[rxq_idx] = NULL;
> > +           }
> >     }
> 
> If the teardown path frees apc->rxqs and sets it to NULL on attach failure,
> how does the interface recover if the user subsequently brings it back up?
> 
> When mana_open() is invoked, it directly calls mana_alloc_queues(). Inside
> mana_alloc_queues() -> mana_add_rx_queues(), the code does:
> 
>     apc->rxqs[i] = rxq;
> 
> If apc->rxqs was left as NULL during the teardown, will this cause a NULL
> pointer dereference on the next interface bring-up?
> 
> [ ... ]
Hi Jakub,

The only path that recovers from this state is mana_attach(), which
calls mana_init_port() -> mana_init_port_context() to re-allocate
apc->rxqs before calling mana_alloc_queues().

When mana_open is invoked prior to it the rxqs would be already setup by
mana_probe_port -> mana_init_port -> mana_init_port_context.

> > @@ -3287,33 +3294,38 @@ static int mana_dealloc_queues(struct net_device 
> > *ndev)
> >      * number of queues.
> >      */
> >  
> > -   for (i = 0; i < apc->num_queues; i++) {
> > -           txq = &apc->tx_qp[i].txq;
> > -           tsleep = 1000;
> > -           while (atomic_read(&txq->pending_sends) > 0 &&
> > -                  time_before(jiffies, timeout)) {
> > -                   usleep_range(tsleep, tsleep + 1000);
> > -                   tsleep <<= 1;
> > -           }
> > -           if (atomic_read(&txq->pending_sends)) {
> > -                   err = pcie_flr(to_pci_dev(gd->gdma_context->dev));
> > -                   if (err) {
> > -                           netdev_err(ndev, "flr failed %d with %d pkts 
> > pending in txq %u\n",
> > -                                      err, 
> > atomic_read(&txq->pending_sends),
> > -                                      txq->gdma_txq_id);
> > +   if (apc->tx_qp) {
> 
> In the error path where queue creation fails partway through, will
> mana_destroy_txq() be called?
> 
> If mana_destroy_txq() unconditionally iterates over all apc->num_queues:
> 
>     for (i = 0; i < apc->num_queues; i++) {
>         ...
>         mana_destroy_wq_obj(apc, GDMA_SQ, apc->tx_qp[i].tx_object);
>         ...
> 
> Will it pass 0 or INVALID_MANA_HANDLE to mana_destroy_wq_obj() for queues
> that were never fully initialized?
> 
> Does mana_destroy_wq_obj() send a MANA_DESTROY_WQ_OBJ command to the hardware
> with these uninitialized handles, and could handle 0 corrupt the firmware
> state?
>
This is a known  exisiting issue in the cleaup path for partial tx
setups and was also mentioned in the a recent patch where the rx partial
init clean-up path was fixed. My colleague Aditya is already working on a
patch to fix this and all other issues in the tx cleanup path.
 
> > +           for (i = 0; i < apc->num_queues; i++) {
> > +                   txq = &apc->tx_qp[i].txq;
> > +                   tsleep = 1000;
> > +                   while (atomic_read(&txq->pending_sends) > 0 &&
> > +                          time_before(jiffies, timeout)) {
> > +                           usleep_range(tsleep, tsleep + 1000);
> > +                           tsleep <<= 1;
> > +                   }
> > +                   if (atomic_read(&txq->pending_sends)) {
> > +                           err =
> > +                               pcie_flr(to_pci_dev(gd->gdma_context->dev));
> 
> If pcie_flr() is triggered here, the PCI Function Level Reset wipes the
> internal GDMA state.
> 
> Further down in this function, mana_disable_vport_rx(), mana_fence_rqs(),
> and mana_destroy_vport() are called. Will these functions attempt to send
> commands to the now-destroyed GDMA admin queue via mana_send_request() and
> wait for completions?
> 
> Could this lead to cascading timeouts since the hardware was just reset and
> cannot process admin commands?
> 
This is pre-existing behavior/issue the pcie_flr() path and the
subsequent calls to 
mana_disable_vport_rx()/mana_fence_rqs()/mana_destroy_vport()
are unchanged by this patch. My patch only adds a NULL guard for
apc->tx_qp to skip this entire TX drain block when tx_qp is already
freed. But this point seems to be valid issue, I will investigate this
further and I will send a different patch set for this as follow up work.


> > +                           if (err) {
> > +                                   netdev_err(ndev, "flr failed %d with %d 
> > pkts pending in txq %u\n",
> > +                                              err,
> > +                                       atomic_read(&txq->pending_sends),
> > +                                       txq->gdma_txq_id);
> > +                           }
> > +                           break;
> >                     }
> > -                   break;
> >             }
> -- 
> pw-bot: cr

Regards
Dipayaan Roy


Reply via email to