When running rsnapshot backups from an IBM fibre channel disk system using
LVM2 snapshots to a Promise fibre channel disk system, the qla2xxx driver
causes a system crash and reboot. I'm running Lenny with kernel
2.6.22--3-vserver-amd64 and stock Debian qla2xxx module. I've already
replaced the Qlogic HBA and the Qlogic switch connecting to the storage.
Three other servers with similar hardware running the same Debian version
don't have this problem. These events were logged with the
ql2xextended_error_logging parameter enabled:

Feb  6 13:40:28 hqhost kernel: qla2xxx_eh_abort(0): aborting sp
ffff8101d01aa7c0 from RISC. pid=111928.
Feb  6 13:40:58 hqhost kernel: qla2x00_mailbox_command(0): timeout calling
abort_isp
Feb  6 13:40:58 hqhost kernel: qla2x00_mailbox_command(0): timeout calling
abort_isp
Feb  6 13:40:58 hqhost kernel: qla2xxx 0000:08:01.0: Mailbox command timeout
occured. Issuing ISP abort.
Feb  6 13:40:58 hqhost kernel: qla2xxx 0000:08:01.0: Performing ISP error
recovery - ha= ffff810225a84530.
Feb  6 13:40:58 hqhost kernel: scsi(0): **** Load RISC code ****
Feb  6 13:40:58 hqhost kernel: scsi(0): Verifying Checksum of loaded RISC
code.
Feb  6 13:40:58 hqhost kernel: scsi(0): Checksum OK, start firmware.
Feb  6 13:40:58 hqhost kernel: scsi(0): Issue init firmware.
Feb  6 13:40:59 hqhost kernel: scsi(0): Asynchronous P2P MODE received.
Feb  6 13:40:59 hqhost kernel: scsi(0): Asynchronous LOOP UP (4 Gbps).
Feb  6 13:40:59 hqhost kernel: qla2xxx 0000:08:01.0: LOOP UP detected (4
Gbps).
Feb  6 13:40:59 hqhost kernel: scsi(0): Asynchronous PORT UPDATE.
Feb  6 13:40:59 hqhost kernel: scsi(0): Port database changed ffff 0006
0000.
Feb  6 13:40:59 hqhost kernel: scsi(0): Asynchronous PORT UPDATE ignored
0000/0004/0600.
Feb  6 13:40:59 hqhost kernel: scsi(0): Asynchronous PORT UPDATE ignored
0000/0007/0b00.
Feb  6 13:40:59 hqhost kernel: scsi(0): F/W Ready - OK
Feb  6 13:40:59 hqhost kernel: scsi(0): fw_state=3 curr time=1001756ca.
Feb  6 13:40:59 hqhost kernel: qla2x00_restart_isp(): Start configure loop,
status = 0
Feb  6 13:40:59 hqhost kernel: scsi(0): Configure loop -- dpc flags
=0x4080048
Feb  6 13:40:59 hqhost kernel: scsi(0): RSCN queue entry[0] = [00/000000].
Feb  6 13:40:59 hqhost kernel: scsi(0): device_resync: rscn overflow.
Feb  6 13:40:59 hqhost kernel: scsi(0): RFT_ID failed, completion status
(280).
Feb  6 13:40:59 hqhost kernel: scsi(0): Register FC-4 TYPE failed.
Feb  6 13:40:59 hqhost kernel: scsi(0): RFF_ID failed, completion status
(280).
Feb  6 13:40:59 hqhost kernel: scsi(0): Register FC-4 Features failed.
Feb  6 13:40:59 hqhost kernel: scsi(0): RNN_ID failed, completion status
(280).
Feb  6 13:40:59 hqhost kernel: scsi(0): Register Node Name failed.
Feb  6 13:40:59 hqhost kernel: scsi(0): GID_PT failed, completion status
(180).
Feb  6 13:40:59 hqhost kernel: scsi(0): GA_NXT failed, rejected request:
Feb  6 13:40:59 hqhost kernel:  0   1   2   3   4   5   6   7   8   9  Ah
Bh  Ch  Dh  Eh  Fh
Feb  6 13:40:59 hqhost kernel:
--------------------------------------------------------------
Feb  6 13:40:59 hqhost kernel: 14  00  00  00  00  10  97  23  02  00  00
00  10  08  00  00
Feb  6 13:40:59 hqhost kernel: qla2xxx 0000:08:01.0: SNS scan failed --
assuming zero-entry result...
Feb  6 13:40:59 hqhost kernel: scsi(0): fcport-0 - port retry count: 29
remaining
Feb  6 13:40:59 hqhost kernel: scsi(0): fcport-1 - port retry count: 29
remaining
Feb  6 13:40:59 hqhost kernel: scsi(0): fcport-2 - port retry count: 29
remaining
Feb  6 13:40:59 hqhost kernel: qla24xx_fabric_logout(0): failed to complete
IOCB -- completion status (2)  ioparam=0/810031.
Feb  6 13:40:59 hqhost kernel: scsi(0): LOOP READY
Feb  6 13:40:59 hqhost kernel: qla2x00_restart_isp(): Configure loop done,
status = 0x0
Feb  6 13:40:59 hqhost kernel: APIC error on CPU5: 00(40)
Feb  6 13:40:59 hqhost kernel: qla2x00_abort_isp(0): exiting.
Feb  6 13:40:59 hqhost kernel: qla2x00_mailbox_command(0): finished
abort_isp
Feb  6 13:40:59 hqhost kernel: qla2x00_mailbox_command(0): finished
abort_isp
Feb  6 13:40:59 hqhost kernel: qla2x00_mailbox_command(0): **** FAILED.
mbx0=54, mbx1=0, mbx2=2397, cmd=54 ****
Feb  6 13:40:59 hqhost kernel: qla2x00_issue_iocb(0): failed rval 0x100
Feb  6 13:40:59 hqhost kernel: qla2x00_issue_iocb(0): failed rval 0x100
Feb  6 13:40:59 hqhost kernel: qla24xx_abort_command(0): failed to issue
IOCB (100).
Feb  6 13:40:59 hqhost kernel: qla2xxx_eh_abort(0): abort_command mbx
failed.
Feb  6 13:40:59 hqhost kernel: qla2xxx 0000:08:01.0: scsi(0:0:0): Abort
command issued -- 0 1b538 2002.
Feb  6 13:41:00 hqhost kernel: scsi(0): fcport-0 - port retry count: 28
remaining
Feb  6 13:41:00 hqhost kernel: scsi(0): fcport-1 - port retry count: 28
remaining
Feb  6 13:41:00 hqhost kernel: scsi(0): fcport-2 - port retry count: 28
remaining
Feb  6 13:41:01 hqhost kernel: scsi(0): fcport-0 - port retry count: 27
remaining
Feb  6 13:41:01 hqhost kernel: scsi(0): fcport-1 - port retry count: 27
remaining
Feb  6 13:41:01 hqhost kernel: scsi(0): fcport-2 - port retry count: 27
remaining
Feb  6 13:41:02 hqhost kernel: scsi(0): fcport-0 - port retry count: 26
remaining
Feb  6 13:41:02 hqhost kernel: scsi(0): fcport-1 - port retry count: 26
remaining
Feb  6 13:41:02 hqhost kernel: scsi(0): fcport-2 - port retry count: 26
remaining
...(25 more port retries)...
Feb  6 13:41:33 hqhost kernel:  rport-0:0-0: blocked FC remote port time
out: removing target and saving binding
Feb  6 13:41:33 hqhost kernel:  rport-0:0-4: blocked FC remote port time
out: removing target and saving binding
Feb  6 13:41:33 hqhost kernel:  rport-0:0-5: blocked FC remote port time
out: removing target and saving binding
Feb  6 13:41:33 hqhost kernel: qla2xxx 0000:08:01.0: scsi(0:0:0): DEVICE
RESET ISSUED.
Feb  6 13:41:33 hqhost kernel: qla2x00_wait_for_hba_online return_status=0


Is this a hardware problem, a kernel problem, or a qlogic driver problem--
or perhaps all three at once? Thanks in advance,
-- 
Daniel Bakken
Systems Administrator

Economic Modeling Specialists Inc
Moscow, Idaho

Reply via email to