On Thu, Apr 23, 2026 at 09:16:28AM -0700, Dipayaan Roy wrote:
> During Function Level Reset recovery, the MANA driver reads
> hardware BAR0 registers that may temporarily contain garbage values.
> The SHM (Shared Memory) offset read from GDMA_REG_SHM_OFFSET is used
> to compute gc->shm_base, which is later dereferenced via readl() in
> mana_smc_poll_register(). If the hardware returns an unaligned or
> out-of-range value, the driver must not blindly use it, as this would
> propagate the hardware error into a kernel crash.
It is not what we are calling "hardening" if you are hitting actual
crashes in actual real systems.
"hardening" is the driver defending against actively malicious
hardware, operating in ways that will never be seen in real systems,
attempting to compromise the kernel.
Drivers working around real world broken/buggy/malfunctioning HW is
just entirely normal stuff.
> @@ -73,10 +74,25 @@ static int mana_gd_init_pf_regs(struct pci_dev *pdev)
> gc->phys_db_page_base = gc->bar0_pa + gc->db_page_off;
>
> sriov_base_off = mana_gd_r64(gc, GDMA_SRIOV_REG_CFG_BASE_OFF);
> + if (sriov_base_off >= gc->bar0_size ||
> + !IS_ALIGNED(sriov_base_off, sizeof(u32))) {
> + dev_err(gc->dev,
> + "SRIOV base offset 0x%llx out of range or unaligned
> (BAR0 size 0x%llx)\n",
> + sriov_base_off, (u64)gc->bar0_size);
> + return -EPROTO;
> + }
.. and if it is entirely normal and something that happens is EPROTO
really the right way to deal with this race, or should the driver be
looping somehow until the device stabilizes??
Jason