I have been trying to solve this problem without success for about a year
now. 

My hardware is:

- Dell NetPlex 486SX/25
- 32MB RAM
- AHA1542C SCSI Controller
- 3COM 509B Ethernet Controller
- 2-port 16550A Serial Card
- 3 x SCSI Hard Drives External
- 1 x SCSI Jaz Drive External
- 1 x SCSI Sony DDS-3 DAT Drive External

My software is:

- Debian GNU/Linux 2.1 (slink)
- Glibc 2.1 added from potato
- Linux 2.2.11 kernel

My problem is:

After the system has been up for a random length of time (usually about a
week or so) it will crash in the middle of the night during a full backup
to the DAT drive using cpio. The machine hangs in either an infinite loop
or a kernel panic. I originally was running Debian 2.1 with a 2.0.36
kernel, and I would see the following scrolling endlessly off the screen
after a crash:

Sending SCSI DID_RESET...
Sending SCSI DID_RESET...
Sending SCSI DID_RESET...
Sending SCSI DID_RESET...
Sending SCSI DID_RESET...
other scsi messages, etc...

Since installing the 2.2 kernel and associated upgraded packages as
detailed in the errata for slink, the crashes *seem* to occur less often,
but this morning I saw:

aha1542_out failed...
aha1542_out failed... failed to reset target...
...
Kernel panic: unable to find empty mailbox for aha1542...

and the system was locked up. Since upgrading to the 2.2 kernel, I also
notice periodic messages in the syslog (about one per day) like this:

aha1542.c: interrupt received but no mail

The system will run perfectly for a week or so, doing this same backup
routine every night, and then it will just pull this trick on some random
night.
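
To see whether the "no mail" messages become more frequent in the days before a crash, they could be tallied per day from the syslog. The following is only a minimal sketch: the message text is the one quoted above, but the function name and the assumption that each syslog line begins with a "Mon DD" timestamp are mine.

```python
from collections import Counter

# Message text as it appears in the syslog excerpt quoted above.
NO_MAIL = "aha1542.c: interrupt received but no mail"


def count_no_mail_per_day(lines):
    """Return a Counter mapping 'Mon DD' to the number of matching lines.

    Assumes the traditional syslog layout, where the first two
    whitespace-separated fields of a line are the month and the day.
    """
    counts = Counter()
    for line in lines:
        if NO_MAIL not in line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # too short to carry a timestamp
        counts[fields[0] + " " + fields[1]] += 1
    return counts
```

Passing an open log file (for example /var/log/syslog, if that is where kernel messages end up on the system) to count_no_mail_per_day() gives the per-day totals, which can then be compared against the dates of the crashes.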

I have tried:

- disconnecting all devices except the tape drive and hard drives
- installing the highest quality cables I can find for the external
  devices (this machine currently has about $400 US worth of Granite
  Digital cables hanging off of it).
- installing a Granite Digital active terminator on the end of the SCSI
  chain
- verifying that there are no interrupt or IO port conflicts both in the
  device jumper configurations and from the /proc filesystem
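
For anyone who wants to repeat the /proc part of that check, here is a rough sketch of how it could be scripted. It assumes the single-CPU /proc/interrupts layout of "IRQ: count controller device[, device...]"; other kernels format the file differently, and the function name is hypothetical.

```python
def shared_irqs(text):
    """Map IRQ number -> device names for IRQ lines listing several devices.

    Assumes each IRQ line looks like 'IRQ: count controller dev[, dev...]',
    which is an assumption about the single-CPU /proc/interrupts layout.
    """
    shared = {}
    for line in text.splitlines():
        head, sep, rest = line.partition(":")
        if not sep or not head.strip().isdigit():
            continue  # skip the CPU header and non-numeric lines such as NMI
        fields = rest.split(None, 2)  # count, controller, device list
        if len(fields) < 3:
            continue
        devices = [d.strip() for d in fields[2].split(",") if d.strip()]
        if len(devices) > 1:
            shared[int(head)] = devices
    return shared
```

Feeding it the contents of /proc/interrupts returns an empty dictionary when no IRQ line lists more than one device. It says nothing about IO port overlaps, which would need a similar pass over /proc/ioports.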

I am completely at my wits' end with this. I have searched DejaNews
repeatedly for any discussions of kernel panics and crashes with Adaptec
cards, Linux, SCSI in general, etc., and all I can find is one thread
from about a year ago mentioning the same sorts of problems but no
solution.

Is this a problem that anyone else has ever had with Linux and an
AHA1542C in particular or SCSI in general? Can anyone recommend which
part of the setup I should change or eliminate? Is it a bad card? Are
Adaptec cards bad in general? Is the aha1542 SCSI driver problematic? Is
Linux SCSI in general problematic?
