-----------------------------------------------------------------
1. Summary
-----------------------------------------------------------------
My 2.0.0 kernel, using the aic7xxx driver, corrupts the superblock of
my root filesystem (it panics at boot, thinking it is an MS-DOS fs).
My 1.2.13 kernel, using aha274x, doesn't corrupt it. I'd like to know
whether this corresponds to a problem fixed in 2.0.n (for n>0), or
what I should do now.

-----------------------------------------------------------------
Contents of this post:
1. Summary (above)
2. Hardware/software setup
3. Fuller description of the problem
4. Extracts from /var/adm/messages
-----------------------------------------------------------------

-----------------------------------------------------------------
2. Hardware/software setup
-----------------------------------------------------------------
486DX-2/66, ASUS VLB
Adaptec 2842 SCSI controller
Conner CFP1080S SCSI disk
Toshiba SCSI CDROM, model: XM-3501TA
HP SCSI tape, model: HP35470A
Teac 1.44 floppy

Two kernels:
  1.2.13, built using aha274x (aha274x.h v1.11, aha274x.c v1.29),
          custom-built from Debian 0.93R6
  2.0.0,  built using aic7xxx (aic7xxx.h v3.1, aic7xxx.c v3.2),
          custom-built from Debian 1.1

The kernel is configured for SCSI, SCSI disk, SCSI tape, and SCSI
cdrom support, as well as the appropriate driver. ISO9660 fs support
is loaded as a module.

-----------------------------------------------------------------
3. Fuller description of the problem
-----------------------------------------------------------------
I upgraded my system from Debian 0.93R6 to Debian 1.1 (except for the
distribution kernel, which wouldn't boot). I then built a custom
kernel for my machine, including support for my controller (aic7xxx).
I ran this several times over about four days, and then began to make
installation disks (for another machine), using "dd" to write files
from my cdrom to the floppies.
While "dd" was writing the fourth disk, I had a kernel panic, which I
wrote down (figuring it wouldn't necessarily get written to
/var/adm/messages):

  aic7xxx (aic7xxx_isr) BRKADRINT error (0xff):
  Illegal Host Access
  Illegal Sequencer Address referenced
  Illegal Opcode in sequencer program
  Sequencer RAM parity error
  Kernel panic aic7xxx: (aic7xxx_isr)

Sure enough, it wasn't written to /var/adm/messages, although there is
a message (also echoed to the terminal) each time the cdrom is mounted
(this didn't happen under the old kernel):

  ISO9660 Extensions: RRIP_1991A

I had to do a hard reboot and let fsck fix any damage. I poked around
for clues, and then wrote the remaining installation disks. I then got
another message:

  ISO9660 Extensions: RRIP_1991A
  scsi0: MEDIUM ERROR on channel 0, id 0, lun 0, CDB: 0x08 01 37 3a 08 0
  Current error sr0b:00: sns = f0 3
  ASC=15 ASCQ= 0
  Raw sense data:0xf0 0x00 0x03 0x17 0x46 0x01 0x00 0x0a 0x00 0x00 0x00 0x00 0x15 0x00 0x00 0x00
  CD-ROM I/O error: dev 0b:00, sector 318696

This time I rebooted with "shutdown ..." and found that my root
filesystem could not be mounted. I used my emergency disks to run
fsck, and found that the superblock was corrupted. I used an alternate
superblock, and the problem (bad free block count) was fixed.

After that I found I could boot repeatedly with my old (1.2.13,
aha274x) kernel, with no apparent ill effects. However, if I booted
with my new (2.0.0, aic7xxx) kernel, I would boot successfully, but
the *next* boot would stop at the partition check, finding the root
filesystem unmountable (and a message suggesting that it is an MS-DOS
fs --- I have no MS-DOS fs!). The message begins:

  [MS-DOS FS Rel.12, ...

I tried passing "aic7xxx=extended,no_reset" at boot time. Although
this skipped the SCSI bus reset, it didn't seem to solve the problem.

-----------------------------------------------------------------
4. Extracts from /var/adm/messages
-----------------------------------------------------------------
Here's what the aic7xxx kernel boot sequence looks like. The "cannot
find map" message seems to be because the custom kernel System.map is
not in /boot (I just noticed that it *is* in the /usr/src/kernel...
directory).

++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Aug 3 13:54:06 riel syslogd 1.3-0#6: restart.
Aug 3 13:54:07 riel kernel: klogd 1.3-0, log source = /proc/kmsg started.
Aug 3 13:54:07 riel syslogd 1.3-0#6: restart.
Aug 3 13:54:07 riel kernel: Cannot find map file.
Aug 3 13:54:07 riel kernel: Console: 16 point font, 400 scans
Aug 3 13:54:07 riel kernel: Console: colour VGA+ 80x25, 1 virtual console (max 63)
Aug 3 13:54:07 riel kernel: Calibrating delay loop.. ok - 33.18 BogoMIPS
Aug 3 13:54:07 riel kernel: Memory: 14876k/16384k available (656k kernel code, 384k reserved, 468k data)
Aug 3 13:54:07 riel kernel: This processor honours the WP bit even when in supervisor mode. Good.
Aug 3 13:54:07 riel kernel: Swansea University Computer Society NET3.035 for Linux 2.0
Aug 3 13:54:07 riel kernel: NET3: Unix domain sockets 0.12 for Linux NET3.035.
Aug 3 13:54:07 riel kernel: Swansea University Computer Society TCP/IP for NET3.034
Aug 3 13:54:07 riel kernel: IP Protocols: ICMP, UDP, TCP
Aug 3 13:54:07 riel kernel: Checking 386/387 coupling... Ok, fpu using exception 16 error reporting.
Aug 3 13:54:07 riel kernel: Checking 'hlt' instruction... Ok.
Aug 3 13:54:07 riel kernel: Linux version 2.0.0 ([EMAIL PROTECTED]) (gcc version 2.7.2) #1 Tue Jul 30 19:35:51 PDT 1996
Aug 3 13:54:07 riel kernel: Serial driver version 4.13 with no serial options enabled
Aug 3 13:54:07 riel kernel: tty00 at 0x03f8 (irq = 4) is a 16550A
Aug 3 13:54:07 riel kernel: tty01 at 0x02f8 (irq = 3) is a 16550A
Aug 3 13:54:07 riel kernel: Ramdisk driver initialized : 16 ramdisks of 4096K size
Aug 3 13:54:07 riel kernel: Floppy drive(s): fd0 is 1.44M
Aug 3 13:54:07 riel kernel: Started kswapd v 1.4.2.2
Aug 3 13:54:07 riel kernel: FDC 0 is a post-1991 82077
Aug 3 13:54:07 riel kernel: aic7xxx: Reading SEEPROM...done.
Aug 3 13:54:07 riel kernel: aic7xxx: Extended translation disabled.
Aug 3 13:54:07 riel kernel: aic7xxx: AHA-2840 Rev E and subsequent.
Aug 3 13:54:07 riel kernel: aic7xxx: Using 4 SCB's after checking for SCB memory.
Aug 3 13:54:07 riel kernel: aic7xxx: Using level sensitive interrupts.
Aug 3 13:54:07 riel kernel: AHA-2840 AT VLB SLOT 1:
Aug 3 13:54:07 riel kernel: irq 11
Aug 3 13:54:07 riel kernel: bus release time 40 bclks
Aug 3 13:54:07 riel kernel: data fifo threshold 100%
Aug 3 13:54:07 riel kernel: SCSI CHANNEL A:
Aug 3 13:54:07 riel kernel: scsi id 7
Aug 3 13:54:07 riel kernel: scsi selection timeout 256 ms
Aug 3 13:54:07 riel kernel: scsi bus reset at power-on enabled
Aug 3 13:54:07 riel kernel: scsi bus parity enabled
Aug 3 13:54:07 riel kernel: scsi bus termination (low byte) enabled
Aug 3 13:54:07 riel kernel: aic7xxx: Downloading sequencer code...done.
Aug 3 13:54:07 riel kernel: aic7xxx: Resetting the SCSI bus...done.
Aug 3 13:54:07 riel kernel: scsi0 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 3.2/3.1/3.0
Aug 3 13:54:07 riel kernel: scsi : 1 host.
Aug 3 13:54:07 riel kernel: aic7xxx: Scanning channel A for devices.
Aug 3 13:54:07 riel kernel: aic7xxx: Target 0, channel A, now synchronous at 4.0MHz, offset(0xf).
Aug 3 13:54:07 riel kernel: Vendor: TOSHIBA Model: CD-ROM XM-3501TA Rev: 3384
Aug 3 13:54:07 riel kernel: Type: CD-ROM ANSI SCSI revision: 02
Aug 3 13:54:07 riel kernel: Detected scsi CD-ROM sr0 at scsi0, channel 0, id 0, lun 0
Aug 3 13:54:07 riel kernel: aic7xxx: Target 3, channel A, now synchronous at 5.0MHz, offset(0xf).
Aug 3 13:54:07 riel kernel: Vendor: HP Model: HP35470A Rev: T503
Aug 3 13:54:07 riel kernel: Type: Sequential-Access ANSI SCSI revision: 02
Aug 3 13:54:07 riel kernel: Detected scsi tape st0 at scsi0, channel 0, id 3, lun 0
Aug 3 13:54:07 riel kernel: aic7xxx: Target 6, channel A, now synchronous at 10.0MHz, offset(0xf).
Aug 3 13:54:07 riel kernel: Vendor: CONNER Model: CFP1080S Rev: 3939
Aug 3 13:54:07 riel kernel: Type: Direct-Access ANSI SCSI revision: 02
Aug 3 13:54:07 riel kernel: Detected scsi disk sda at scsi0, channel 0, id 6, lun 0
Aug 3 13:54:07 riel kernel: scsi : detected 1 SCSI tape 1 SCSI cdrom 1 SCSI disk total.
Aug 3 13:54:07 riel kernel: SCSI device sda: hdwr sector= 512 bytes. Sectors= 2110812 [1030 MB] [1.0 GB]
Aug 3 13:54:07 riel kernel: Partition check:
Aug 3 13:54:07 riel kernel: sda: sda1 sda2 sda3 < sda5 sda6 sda7 > sda4
Aug 3 13:54:07 riel kernel: VFS: Mounted root (ext2 filesystem) readonly.
Aug 3 13:54:07 riel kernel: Adding Swap: 32764k swap-space
Aug 3 13:54:07 riel kernel: CSLIP: code copyright 1989 Regents of the University of California
Aug 3 13:54:07 riel kernel: PPP: version 2.2.0 (dynamic channel allocation)
Aug 3 13:54:07 riel kernel: PPP Dynamic channel allocation code copyright 1995 Caldera, Inc.
Aug 3 13:54:07 riel kernel: PPP line discipline registered.
Aug 3 13:54:07 riel kernel: lp1 at 0x0378, (polling)
Aug 3 14:03:04 riel syslogd: exiting on signal 15
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Here's what the boot for the aha274x looks like.
There's also a "cannot find map" message here, but I can't fix this
one as easily: I attempted to rebuild this kernel with my new elf-ized
setup, the build failed (maybe 1.2.13 can't be elf-ish), and I had
already cleaned up the build area (make mrproper).

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Aug 3 14:56:05 riel syslogd 1.3-0#6: restart.
Aug 3 14:56:05 riel kernel: klogd 1.3-0, log source = /proc/kmsg started.
Aug 3 14:56:06 riel syslogd 1.3-0#6: restart.
Aug 3 14:56:06 riel kernel: Cannot find map file.
Aug 3 14:56:06 riel kernel: Console: colour EGA+ 80x25, 1 virtual console (max 63)
Aug 3 14:56:06 riel kernel: Calibrating delay loop.. ok - 33.55 BogoMips
Aug 3 14:56:06 riel kernel: Serial driver version 4.11 with no serial options enabled
Aug 3 14:56:06 riel kernel: tty00 at 0x03f8 (irq = 4) is a 16550A
Aug 3 14:56:06 riel kernel: tty01 at 0x02f8 (irq = 3) is a 16550A
Aug 3 14:56:06 riel kernel: Floppy drive(s): fd0 is 1.44M
Aug 3 14:56:06 riel kernel: FDC 0 is a post-1991 82077
Aug 3 14:56:06 riel kernel: aha274x: extended translation disabled
Aug 3 14:56:06 riel kernel: AHA284X AT SLOT 1:
Aug 3 14:56:06 riel kernel: irq 11
Aug 3 14:56:06 riel kernel: bus release time 40 bclks
Aug 3 14:56:06 riel kernel: data fifo threshold 100%
Aug 3 14:56:06 riel kernel: SCSI CHANNEL A:
Aug 3 14:56:06 riel kernel: scsi id 7
Aug 3 14:56:06 riel kernel: scsi bus parity check enabled
Aug 3 14:56:06 riel kernel: scsi selection timeout 256 ms
Aug 3 14:56:06 riel kernel: scsi bus reset at power-on enabled
Aug 3 14:56:06 riel kernel: scsi0 : Adaptec AHA274x/284x (EISA/VL-bus -> Fast SCSI) 1.28/1.11/1.29
Aug 3 14:56:06 riel kernel: scsi : 1 host.
Aug 3 14:56:06 riel kernel: aha274x: target 0 now synchronous at 4.0Mb/s
Aug 3 14:56:06 riel kernel: Vendor: TOSHIBA Model: CD-ROM XM-3501TA Rev: 3384
Aug 3 14:56:06 riel kernel: Type: CD-ROM ANSI SCSI revision: 02
Aug 3 14:56:06 riel kernel: Detected scsi CD-ROM sr0 at scsi0, id 0, lun 0
Aug 3 14:56:06 riel kernel: aha274x: target 3 now synchronous at 5.0Mb/s
Aug 3 14:56:06 riel kernel: Vendor: HP Model: HP35470A Rev: T503
Aug 3 14:56:06 riel kernel: Type: Sequential-Access ANSI SCSI revision: 02
Aug 3 14:56:06 riel kernel: Detected scsi tape st0 at scsi0, id 3, lun 0
Aug 3 14:56:06 riel kernel: aha274x: target 6 now synchronous at 10.0Mb/s
Aug 3 14:56:06 riel kernel: Vendor: CONNER Model: CFP1080S Rev: 3939
Aug 3 14:56:06 riel kernel: Type: Direct-Access ANSI SCSI revision: 02
Aug 3 14:56:06 riel kernel: Detected scsi disk sda at scsi0, id 6, lun 0
Aug 3 14:56:06 riel kernel: scsi : detected 1 SCSI tape 1 SCSI cdrom 1 SCSI disk total.
Aug 3 14:56:06 riel kernel: SCSI Hardware sector size is 512 bytes on device sda
Aug 3 14:56:06 riel kernel: Memory: 15072k/16384k available (604k kernel code, 384k reserved, 324k data)
Aug 3 14:56:06 riel kernel: This processor honours the WP bit even when in supervisor mode. Good.
Aug 3 14:56:06 riel kernel: Swansea University Computer Society NET3.019
Aug 3 14:56:06 riel kernel: Swansea University Computer Society TCP/IP for NET3.019
Aug 3 14:56:06 riel kernel: IP Protocols: ICMP, UDP, TCP
Aug 3 14:56:06 riel kernel: Checking 386/387 coupling... Ok, fpu using exception 16 error reporting.
Aug 3 14:56:06 riel kernel: Checking 'hlt' instruction... Ok.
Aug 3 14:56:06 riel kernel: Linux version 1.2.13 ([EMAIL PROTECTED]) (gcc version 2.6.3) #2 Thu Jul 11 20:26:36 PDT 1996
Aug 3 14:56:06 riel kernel: Partition check:
Aug 3 14:56:06 riel kernel: sda: sda1 sda2 sda3 < sda5 sda6 sda7 > sda4
Aug 3 14:56:06 riel kernel: VFS: Mounted root (ext2 filesystem) readonly.
Aug 3 14:56:06 riel kernel: Adding Swap: 32764k swap-space
Aug 3 14:56:06 riel kernel: CSLIP: code copyright 1989 Regents of the University of California
Aug 3 14:56:06 riel kernel: PPP: version 2.2.0 (dynamic channel allocation)
Aug 3 14:56:06 riel kernel: PPP Dynamic channel allocation code copyright 1995 Caldera, Inc.
Aug 3 14:56:06 riel kernel: PPP line discipline registered.
Aug 3 14:56:06 riel kernel: lp1 at 0x0378, using polling driver
Aug 3 15:00:15 riel syslogd: exiting on signal 15
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Any ideas on how to fix this? Thanks in advance.

--------------------------------------------------------------------
Danny Heap, UCSF, 3333 California St., Room 102, SF CA, 94122
[EMAIL PROTECTED], voice: (415) 476-8910, fax: (415) 476-1508
--------------------------------------------------------------------
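
P.S. As a cross-check on the CD-ROM MEDIUM ERROR quoted in section 3,
here is a minimal Python sketch that decodes the CDB and the raw sense
bytes from that message. The function names are hypothetical, the
field layouts are the standard SCSI-2 READ(6) and fixed-format sense
layouts as I understand them, and the 2048-byte CD-ROM block size is
an assumption on my part, not something the log states.

```python
# Sketch only: decode the READ(6) CDB and the sense bytes quoted in section 3.
# Function names are hypothetical; layouts follow the SCSI-2 fixed formats.

def decode_read6(cdb):
    """Return (lba, blocks) for a 6-byte READ(6) command descriptor block."""
    if len(cdb) != 6 or cdb[0] != 0x08:
        raise ValueError("not a READ(6) CDB")
    # Low 5 bits of byte 1 plus bytes 2 and 3 form the 21-bit block address.
    lba = ((cdb[1] & 0x1F) << 16) | (cdb[2] << 8) | cdb[3]
    blocks = cdb[4] or 256  # a transfer length of 0 means 256 blocks
    return lba, blocks

def decode_fixed_sense(sense):
    """Return (sense_key, asc, ascq) from fixed-format sense data."""
    if (sense[0] & 0x7F) not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    return sense[2] & 0x0F, sense[12], sense[13]

# Values copied from the kernel message in section 3.
cdb = [0x08, 0x01, 0x37, 0x3A, 0x08, 0x00]
sense = [0xF0, 0x00, 0x03, 0x17, 0x46, 0x01, 0x00, 0x0A,
         0x00, 0x00, 0x00, 0x00, 0x15, 0x00, 0x00, 0x00]

lba, blocks = decode_read6(cdb)
key, asc, ascq = decode_fixed_sense(sense)

# Assumption: 2048-byte CD-ROM blocks, reported by the kernel in 512-byte sectors.
sector = lba * (2048 // 512)

print("lba", lba, "blocks", blocks, "sector", sector)
print("sense key", key, "asc", hex(asc), "ascq", ascq)
```

Under those assumptions the block address 0x01373a (79674) times 4
gives 318696, which matches the sector in the "CD-ROM I/O error" line,
and sense key 3 with ASC 0x15 agrees with the "MEDIUM ERROR" and
"ASC=15" fields. So that particular message looks like a read error
reported by the CD-ROM drive itself (id 0), and may be unrelated to
the superblock corruption on the disk (id 6).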