On Tue, Sep 02, 2025 at 06:26:59PM +0100, Anatoly Burakov wrote: > From: Jacob Keller <jacob.e.kel...@intel.com> > > The ice_cfg_tx_topo function attempts to apply Tx scheduler topology > configuration based on NVM parameters, selecting either a 5 or 9 layer > topology. > > As part of this flow, the driver acquires the "Global Configuration > Lock", which is a hardware resource associated with programming the DDP > package to the device. This "lock" is implemented by firmware as a way to > guarantee that only one PF can program the DDP for a device. Unlike a > traditional lock, once a PF has acquired this lock, no other PF will be > able to acquire it again (including that PF) until a core reset of the > device. Future requests to acquire the lock report that global > configuration has already completed. > > The following flow is used to program the Tx topology: > > * Read the DDP package for scheduler configuration data > * Acquire the global configuration lock > * Program Tx scheduler topology according to DDP package data > * Trigger a core reset which clears the global configuration lock > > This is followed by the flow for programming the DDP package: > > * Acquire the global configuration lock (again) > * Download the DDP package to the device > * Release the global configuration lock. > > However, if configuration of the Tx topology fails, (i.e. > ice_get_set_tx_topo() returns an error code), the driver exits > ice_cfg_tx_topo() immediately, and fails to trigger core reset. > > While the global configuration lock is held, the firmware rejects most > AdminQ commands, as it is waiting for the DDP package download (or Tx > scheduler topology programming) to occur. > > The current driver flows assume that the global configuration lock has > been reset after programming the Tx topology. Thus, the same PF attempts > to acquire the global lock again, and fails. This results in the driver > reporting "an unknown error occurred when loading the DDP package". It > then attempts to enter safe mode, but ultimately fails to finish > ice_probe() since nearly all AdminQ command report error codes, and the > driver stops loading the device at some point during its initialization. > > We cannot simply release the global lock after a failed call to > ice_get_set_tx_topo(). Releasing the lock indicates to firmware that > global configuration (downloading of the DDP) has completed. Future > attempts by this or other PFs to load the DDP will fail with a report > that the DDP package has already been downloaded. Then, PFs will enter > safe mode as they realize that the package on the device does not meet > the minimum version requirement to load. The reported error messages are > confusing, as they indicate the version of the default "safe mode" > package in the NVM, rather than the version of the DDP package loaded > from the filesystem. > > Instead, we need to trigger core reset to clear global configuration. > This is the lowest level of hardware reset which clears the global > configuration lock and related state. It also clears any already > downloaded DDP. Crucially, it does *not* clear the Tx scheduler topology > configuration. > > Refactor ice_cfg_tx_topo() to always trigger a core reset after acquiring > the global lock, regardless of success or failure of the topology > configuration. > > We need to re-initialize the HW structure when we trigger the core reset. > Previously, this was the responsibility of the core driver to cleanup > after the core reset. Instead, make it the responsibility of this > function. This avoids needless re-initialization for the cases where no > reset occurred. > > Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com> > Signed-off-by: Anatoly Burakov <anatoly.bura...@intel.com> > --- > drivers/net/intel/ice/base/ice_ddp.c | 34 ++++++++++++++++++---------- > 1 file changed, 22 insertions(+), 12 deletions(-) >
Acked-by: Bruce Richardson <bruce.richard...@intel.com> See one comment inline below. > diff --git a/drivers/net/intel/ice/base/ice_ddp.c > b/drivers/net/intel/ice/base/ice_ddp.c > index 850c722a3f..68e75be4d2 100644 > --- a/drivers/net/intel/ice/base/ice_ddp.c > +++ b/drivers/net/intel/ice/base/ice_ddp.c > @@ -2370,7 +2370,7 @@ int ice_cfg_tx_topo(struct ice_hw *hw, u8 *buf, u32 len) > struct ice_buf_hdr *section; > struct ice_pkg_hdr *pkg_hdr; > enum ice_ddp_state state; > - u16 i, size = 0, offset; > + u16 size = 0, offset; > u32 reg = 0; > int status; > u8 flags; > @@ -2457,25 +2457,35 @@ int ice_cfg_tx_topo(struct ice_hw *hw, u8 *buf, u32 > len) > /* check reset was triggered already or not */ > reg = rd32(hw, GLGEN_RSTAT); > if (reg & GLGEN_RSTAT_DEVSTATE_M) { > - /* Reset is in progress, re-init the hw again */ > ice_debug(hw, ICE_DBG_INIT, "Reset is in progress. layer > topology might be applied already\n"); > ice_check_reset(hw); > - return 0; > + /* Reset is in progress, re-init the hw again */ > + goto reinit_hw; > } > > /* set new topology */ > status = ice_get_set_tx_topo(hw, new_topo, size, NULL, NULL, true); > if (status) { > - ice_debug(hw, ICE_DBG_INIT, "Set tx topology is failed\n"); > - return status; > + ice_debug(hw, ICE_DBG_INIT, "Failed setting Tx topology, status > %d\n", > + status); > + status = ICE_ERR_CFG; > } > > - /* new topology is updated, delay 1 second before issuing the CORRER */ > - for (i = 0; i < 10; i++) > - ice_msec_delay(100, true); > + /* Even if Tx topology config failed, we need to CORE reset here to > + * clear the global configuration lock. Delay 1 second to allow > + * hardware to settle then issue a CORER > + */ > + ice_msec_delay(1000, true); > ice_reset(hw, ICE_RESET_CORER); > - /* CORER will clear the global lock, so no explicit call > - * required for release > - */ > - return 0; > + ice_check_reset(hw); > + > +reinit_hw: > + /* Since we triggered a CORER, re-initialize hardware */ > + ice_deinit_hw(hw); > + if (ice_init_hw(hw)) { > + ice_debug(hw, ICE_DBG_INIT, "Failed to re-init hardware after > setting Tx topology\n"); > + return ICE_ERR_RESET_FAILED; > + } > + > + return status; > } There is a similer deinit + init combo in ice_init_pkg in the same file, which is the only use of this function (in DPDK anyway). Therefore, I believe that that code can be removed now that the reinit is handled by the cfg_tx_topo() function.