Control: tag -1 moreinfo

You wrote:
> I've encountered an issue booting up Debian Testing with the latest
> kernel. The system reports something about kernel panic caused by
> aacraid module, then hangs completely during fsck step. Information
> about the installed system is in the attached .txt file. Note that
> hardware is different, because the SSD with Debian Testing has been
> extracted from the server for investigation.
> 
> Using the kernel parameters "pci=nocrs single" I was able to boot into
> emergency mode and see the error messages related to kernel panic.
> Photos attached.
> 
> Pulling the RAID controller out of the PCI-E slot restores normal boot.
[...]

There are (at least) 2 bugs here:

1. Something is preventing aacraid and ehci-hcd from allocating DMA
   buffers (-12 is -ENOMEM):
   "ehci-pci 0000:00:1a.0: init 0000:00:1a.0 fail, -12"
   "aacraid: unable to create mapping."

2. aacraid then double-frees a chunk of memory while handling the
   failure, causing the panic:
   "kernel BUG at mm/slub.c:448!"

I'm attaching a patch that should fix bug #2.  With the panic out of
the way, it may be possible to get more information about bug #1.
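
To illustrate bug #2, here is a minimal userspace sketch of the
pattern, not the driver code itself.  The names fake_dev,
fake_init_adapter() and fake_probe_one() are hypothetical stand-ins
for aac_dev, aac_init_adapter() and aac_probe_one(), and malloc()/
free() stand in for the kernel allocator.  The marked line is the
equivalent of what the patch adds; without it the caller's error path
would free the same pointer a second time.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct aac_dev: only the relevant member. */
struct fake_dev {
	void *queues;
};

/*
 * Stand-in for aac_init_adapter(): allocates dev->queues, then fails
 * if 'fail' is non-zero.  Returns 0 on success, -1 on failure.
 */
static int fake_init_adapter(struct fake_dev *dev, int fail)
{
	dev->queues = malloc(64);
	if (!dev->queues)
		return -1;

	if (fail) {
		free(dev->queues);
		dev->queues = NULL;	/* the fix: leave no dangling pointer */
		return -1;
	}
	return 0;
}

/*
 * Stand-in for aac_probe_one(): on failure its error path frees
 * dev->queues unconditionally.  That is only safe if the callee
 * cleared the pointer, since free(NULL) is a no-op.
 */
static int fake_probe_one(struct fake_dev *dev, int fail)
{
	if (fake_init_adapter(dev, fail) < 0) {
		free(dev->queues);
		dev->queues = NULL;
		assert(dev->queues == NULL);
		return -1;
	}
	return 0;
}
```

Calling fake_probe_one() with fail set to 1 exercises the error path.
If the marked assignment is removed, the second free() operates on
already-freed memory, which is the userspace analogue of the panic.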

In principle you should be able to test this by following the
instructions at
<https://kernel-team.pages.debian.net/kernel-handbook/ch-common-tasks.html#id-1.6.6.4>
but currently the test-patches script has not been updated along with
the package and will take a lot more time and space than it should.

So instead I would suggest building a custom kernel based on the Debian
configuration, following the instructions at
<https://kernel-team.pages.debian.net/kernel-handbook/ch-common-tasks.html#s-common-building>
and applying this patch before you run "make clean".

Let us know if you have any difficulty with this.  If the system is
able to boot with a patched kernel, please send the full kernel log.

Ben.

-- 
Ben Hutchings
Knowledge is power.  France is bacon.

From 6ef2851f75411b379868119e693ce63440dde869 Mon Sep 17 00:00:00 2001
From: Ben Hutchings <b...@debian.org>
Date: Wed, 10 Jul 2024 18:41:07 +0200
Subject: [PATCH] aacraid: Fix double-free on probe failure

aac_probe_one() calls hardware-specific init functions through the
aac_driver_ident::init pointer, all of which eventually call down to
aac_init_adapter().

If aac_init_adapter() fails after allocating memory for
aac_dev::queues, it frees the memory but does not clear that member.

After the hardware-specific init function returns an error,
aac_probe_one() goes down an error path that frees the memory pointed
to by aac_dev::queues, resulting in a double-free.

Reported-by: Michael Gordon <m.gordon.zelenobor...@gmail.com>
References: https://bugs.debian.org/1075855
Fixes: 8e0c5ebde82b ("[SCSI] aacraid: Newer adapter communication iterface support")
Signed-off-by: Ben Hutchings <b...@debian.org>
---
 drivers/scsi/aacraid/comminit.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/aacraid/comminit.c b/drivers/scsi/aacraid/comminit.c
index bd99c5492b7d..0f64b0244303 100644
--- a/drivers/scsi/aacraid/comminit.c
+++ b/drivers/scsi/aacraid/comminit.c
@@ -642,6 +642,7 @@ struct aac_dev *aac_init_adapter(struct aac_dev *dev)
 
 	if (aac_comm_init(dev)<0){
 		kfree(dev->queues);
+		dev->queues = NULL;
 		return NULL;
 	}
 	/*
@@ -649,6 +650,7 @@ struct aac_dev *aac_init_adapter(struct aac_dev *dev)
 	 */
 	if (aac_fib_setup(dev) < 0) {
 		kfree(dev->queues);
+		dev->queues = NULL;
 		return NULL;
 	}
 		
