Hi,
I am seeing the same behaviour on 8 servers, but with different hardware:
* NEC Express5800/R120a-2 [N8100-1501F]
* 8 or 16 GB of registered ECC RAM
* Processor: Intel Xeon E5503 or E5504, single or dual processor
* Network controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
* RAID: 3ware Inc 9650SE SATA-II RAID PCIe, or software RAID (md)
* FS: ext3
* Disks: mostly WD Caviar Blue, Caviar Black or VelociRaptor series
* Virtual machine disks are stored on LVM logical volumes.
Upgraded along one of these paths:
* lenny -> squeeze -> wheezy -> jessie
* squeeze -> wheezy -> jessie
Each upgrade was done in a single pass, and the old Xen packages were not
cleaned up before the virtual machines were started. Cleaning up the old
Xen packages afterwards does not seem to help.
The firmware-linux-free package is installed on all servers.
To see if it helps, I reinstalled one server from scratch with wheezy and
another with jessie, keeping the VM configurations and disks (LVM).
We also have Dell servers (R420 and R620) upgraded from wheezy to jessie,
but they do not seem to be affected.
Here is the error as captured with netconsole (the SCSI errors do not seem relevant):
[60083.367483] 3w-9xxx: scsi0: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x85.
[60083.368945] 3w-9xxx: scsi0: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x85.
[60083.409478] 3w-9xxx: scsi0: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x85.
[62299.128350] ------------[ cut here ]------------
[62299.128760] WARNING: CPU: 0 PID: 0 at /build/linux-QZaPpC/linux-3.16.7-ckt11/net/sched/sch_generic.c:264 dev_watchdog+0x236/0x240()
[62299.128810] NETDEV WATCHDOG: eth0 (igb): transmit queue 2 timed out
[62299.128839] Modules linked in: netconsole xt_tcpudp xt_physdev iptable_filter ip_tables x_tables xen_netback xen_blkback xen_gntdev binfmt_misc xen_evtchn xenfs xen_privcmd nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc bridge 8021q garp stp mrp llc bonding psmouse ttm drm_kms_helper drm coretemp evdev pcspkr serio_raw i2c_i801 ipmi_si lpc_ich mfd_core ipmi_msghandler tpm_tis tpm button ioatdma processor i7core_edac edac_core thermal_sys shpchp configfs loop autofs4 ext4 crc16 mbcache jbd2 dm_mod sg sd_mod crc_t10dif crct10dif_generic crct10dif_common hid_generic usbhid hid crc32c_intel ahci uhci_hcd ehci_pci libahci ehci_hcd libata igb usbcore i2c_algo_bit 3w_9xxx usb_common i2c_core dca ptp scsi_mod pps_core [last unloaded: netconsole]
[62299.129637] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-4-amd64 #1 Debian 3.16.7-ckt11-1
[62299.129703] Hardware name: NEC NEC Express5800/R120a-2 [N8100-1501F]/MS-9197-01S, BIOS 1.0.1C10 03/24/2009
[62299.129799] 0000000000000009 ffffffff8150b405 ffff88001f203e28 ffffffff81067797
[62299.129906] 0000000000000002 ffff88001f203e78 0000000000000008 0000000000000000
[62299.130014] ffff88000226e000 ffffffff810677fc ffffffff81777fb8 0000000000000030
[62299.130121] Call Trace:
[62299.130166] <IRQ> [<ffffffff8150b405>] ? dump_stack+0x41/0x51
[62299.130244] [<ffffffff81067797>] ? warn_slowpath_common+0x77/0x90
[62299.130302] [<ffffffff810677fc>] ? warn_slowpath_fmt+0x4c/0x50
[62299.130364] [<ffffffff8100a0c1>] ? xen_timer_interrupt+0x111/0x150
[62299.130425] [<ffffffff8143eb96>] ? dev_watchdog+0x236/0x240
[62299.130482] [<ffffffff8143e960>] ? dev_graft_qdisc+0x70/0x70
[62299.130541] [<ffffffff81072ae1>] ? call_timer_fn+0x31/0x100
[62299.130597] [<ffffffff8143e960>] ? dev_graft_qdisc+0x70/0x70
[62299.130654] [<ffffffff81074119>] ? run_timer_softirq+0x209/0x2f0
[62299.130712] [<ffffffff8106c641>] ? __do_softirq+0xf1/0x290
[62299.130768] [<ffffffff8106ca15>] ? irq_exit+0x95/0xa0
[62299.130901] [<ffffffff81358495>] ? xen_evtchn_do_upcall+0x35/0x50
[62299.130964] [<ffffffff8151325e>] ? xen_do_hypervisor_callback+0x1e/0x30
[62299.131022] <EOI> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
[62299.131094] [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
[62299.131153] [<ffffffff81009e0c>] ? xen_safe_halt+0xc/0x20
[62299.131211] [<ffffffff8101c999>] ? default_idle+0x19/0xb0
[62299.131270] [<ffffffff810a7ff0>] ? cpu_startup_entry+0x340/0x400
[62299.131329] [<ffffffff81903071>] ? start_kernel+0x492/0x49d
[62299.131385] [<ffffffff81902a04>] ? set_init_arg+0x4e/0x4e
[62299.131442] [<ffffffff81904f64>] ? xen_start_kernel+0x569/0x573
[62299.131498] ---[ end trace 93cc57d7dca442f8 ]---
[62299.133025] igb 0000:01:00.0 eth0: Reset adapter
[62299.176937] bonding: bond0: making interface eth1 the new active one
[62299.181054] device eth0 left promiscuous mode
[62299.181320] device eth1 entered promiscuous mode
[62302.120815] igb 0000:01:00.0 eth0: Reset adapter
[62305.261445] igb 0000:01:00.0 eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[62305.280904] bonding: bond0: link status definitely up for interface eth0, 1000 Mbps full duplex
[62310.121470] igb 0000:01:00.1 eth1: Reset adapter
[62310.185794] bonding: bond0: making interface eth0 the new active one
[62310.189887] device eth1 left promiscuous mode
[62310.190138] device eth0 entered promiscuous mode
[62313.121750] igb 0000:01:00.1 eth1: Reset adapter
[62316.342434] igb 0000:01:00.1 eth1: igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[62316.389986] bonding: bond0: link status definitely up for interface eth1, 1000 Mbps full duplex
[62340.124133] igb 0000:01:00.0 eth0: Reset adapter
[62340.124295] igb 0000:01:00.0: Detected Tx Unit Hang
[62340.124295] Tx Queue <2>
[62340.124295] TDH <eb>
[62340.124295] TDT <eb>
[62340.124295] next_to_use <eb>
[62340.124295] next_to_clean <e2>
[62340.124295] buffer_info[next_to_clean]
[62340.124295] time_stamp <100ec9490>
[62340.124295] next_to_watch <ffff8800108c7e20>
[62340.124295] jiffies <100ec9e0c>
[62340.124295] desc.status <d8001>
[62340.201532] bonding: bond0: making interface eth1 the new active one
[62340.205626] device eth0 left promiscuous mode
[62340.205875] device eth1 entered promiscuous mode
[62343.124467] igb 0000:01:00.0 eth0: Reset adapter
[62346.245186] igb 0000:01:00.0 eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[62346.304754] bonding: bond0: link status definitely up for interface eth0, 1000 Mbps full duplex
[62356.125561] igb 0000:01:00.1 eth1: Reset adapter
[62356.210009] bonding: bond0: making interface eth0 the new active one
[62356.214118] device eth1 left promiscuous mode
[62356.214361] device eth0 entered promiscuous mode
[62359.125973] igb 0000:01:00.1 eth1: Reset adapter
[62360.117969] sd 0:0:0:0: WARNING: (0x06:0x002C): Command (0x28) timed out, resetting card.
[62362.414672] bonding: bond0: link status definitely up for interface eth1, 1000 Mbps full duplex
[62364.122257] igb 0000:01:00.1 eth1: igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[62381.087874] igb 0000:01:00.0 eth0: Reset adapter
[62381.220302] bonding: bond0: making interface eth1 the new active one
[62381.224405] device eth0 left promiscuous mode
[62381.224645] device eth1 entered promiscuous mode
[62384.080265] igb 0000:01:00.0 eth0: Reset adapter
[62387.324875] bonding: bond0: link status definitely up for interface eth0, 1000 Mbps full duplex
[62389.076598] igb 0000:01:00.0 eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[62397.861349] sd 0:0:0:0: WARNING: (0x06:0x002C): Command (0x0) timed out, resetting card.
[62414.130842] igb 0000:01:00.1 eth1: Reset adapter
[62414.131017] igb 0000:01:00.1: Detected Tx Unit Hang
[62414.131017] Tx Queue <2>
[62414.131017] TDH <ec>
[62414.131017] TDT <ec>
[62414.131017] next_to_use <ec>
[62414.131017] next_to_clean <d8>
[62414.131017] buffer_info[next_to_clean]
[62414.131017] time_stamp <100ecdd8c>
[62414.131017] next_to_watch <ffff8800108d9da0>
[62414.131017] jiffies <100ece650>
[62414.131017] desc.status <1>
[62414.235238] bonding: bond0: making interface eth0 the new active one
[62414.239283] device eth1 left promiscuous mode
[62414.239526] device eth0 entered promiscuous mode
[62417.131239] igb 0000:01:00.1 eth1: Reset adapter
[62420.439830] bonding: bond0: link status definitely up for interface eth1, 1000 Mbps full duplex
[62422.127613] igb 0000:01:00.1 eth1: igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[62425.491896] sd 0:0:0:0: Device offlined - not ready after error recovery
[62425.491980] sd 0:0:0:0: Device offlined - not ready after error recovery
[62425.492039] sd 0:0:0:0: Device offlined - not ready after error recovery
[62425.492096] sd 0:0:0:0: Device offlined - not ready after error recovery
[62425.495879] sd 0:0:0:0: rejecting I/O to offline device
[62425.495946] sd 0:0:0:0: [sda] killing request
[62425.496004] sd 0:0:0:0: rejecting I/O to offline device
[62425.496057] sd 0:0:0:0: [sda] killing request
[62425.496107] sd 0:0:0:0: rejecting I/O to offline device
[62425.496159] sd 0:0:0:0: [sda] killing request
[62425.496209] sd 0:0:0:0: rejecting I/O to offline device
[62425.496261] sd 0:0:0:0: [sda] killing request
[62425.496315] sd 0:0:0:0: rejecting I/O to offline device
[62425.496429] sd 0:0:0:0: [sda] killing request
[62425.496482] sd 0:0:0:0: rejecting I/O to offline device
[62425.496486] sd 0:0:0:0: [sda] Unhandled error code
[62425.496488] sd 0:0:0:0: [sda]
[62425.496490] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[62425.496491] sd 0:0:0:0: [sda] CDB:
[62425.496495] Write(10): 2a 00 2b ad d0 60 00 00 08 00
[62425.496497] end_request: I/O error, dev sda, sector 732811360
[62425.496831] sd 0:0:0:0: [sda] killing request
[...]
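Both "Detected Tx Unit Hang" dumps above report Tx Queue 2 with TDH equal to TDT while next_to_clean lags behind next_to_use. To compare such dumps across servers, a small Python sketch like the one below could pull out the hex fields and derive how many descriptors are still pending and how long the buffer at next_to_clean has been outstanding. The function names are hypothetical, and two values are assumptions not taken from this report: HZ=250 (used only to turn jiffies into seconds) and a 256-descriptor TX ring (the igb default, used only to handle ring wrap-around).

```python
import re

# Assumed values, NOT stated in the report: kernel built with HZ=250 and
# igb using its default 256-descriptor TX ring.
HZ = 250
TX_RING_SIZE = 256

# Matches "<field> <hex>" lines, with or without a "[seconds]" prefix,
# e.g. "[62340.124295] TDH <eb>" or "desc.status <d8001>".
FIELD_RE = re.compile(r"^\s*(?:\[[\d.]+\]\s*)?([A-Za-z_ .]+?)\s*<([0-9a-fA-F]+)>\s*$")


def parse_hang(block: str) -> dict:
    """Collect the hex-valued fields of one 'Detected Tx Unit Hang' dump."""
    fields = {}
    for line in block.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields[match.group(1).strip()] = int(match.group(2), 16)
    return fields


def summarize(fields: dict) -> dict:
    """Derive a few numbers that are easier to compare across dumps."""
    # Descriptors queued by the driver but not yet cleaned up.
    pending = (fields["next_to_use"] - fields["next_to_clean"]) % TX_RING_SIZE
    # Jiffies elapsed since the buffer at next_to_clean was queued.
    stall = fields["jiffies"] - fields["time_stamp"]
    return {
        "queue": fields["Tx Queue"],
        "pending_descriptors": pending,
        "stall_jiffies": stall,
        "stall_seconds": stall / HZ,
        "head_equals_tail": fields["TDH"] == fields["TDT"],
    }


# First dump from the log above (eth0 at 62340.124295).
HANG_EXAMPLE = """\
[62340.124295] igb 0000:01:00.0: Detected Tx Unit Hang
[62340.124295] Tx Queue <2>
[62340.124295] TDH <eb>
[62340.124295] TDT <eb>
[62340.124295] next_to_use <eb>
[62340.124295] next_to_clean <e2>
[62340.124295] buffer_info[next_to_clean]
[62340.124295] time_stamp <100ec9490>
[62340.124295] next_to_watch <ffff8800108c7e20>
[62340.124295] jiffies <100ec9e0c>
[62340.124295] desc.status <d8001>
"""

print(summarize(parse_hang(HANG_EXAMPLE)))
# {'queue': 2, 'pending_descriptors': 9, 'stall_jiffies': 2428, 'stall_seconds': 9.712, 'head_equals_tail': True}
```

Under those assumptions, the eth1 dump at 62414.131017 works out the same way to 20 pending descriptors outstanding for 2244 jiffies (about 9 s).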
Loaded modules:
Module Size Used by
netconsole 13318 0
configfs 31664 2 netconsole
xt_tcpudp 12527 6
xt_physdev 12468 26
iptable_filter 12536 1
ip_tables 26011 1 iptable_filter
x_tables 27111 4 xt_physdev,ip_tables,xt_tcpudp,iptable_filter
xen_netback 43986 7
xen_blkback 34328 0
binfmt_misc 16949 1
xen_gntdev 17032 2
xen_evtchn 12783 8
xenfs 12687 1
xen_privcmd 12868 17 xenfs
bridge 106102 0
8021q 27844 0
garp 13117 1 8021q
stp 12437 2 garp,bridge
mrp 17343 1 8021q
llc 12745 3 stp,garp,bridge
bonding 124989 0
psmouse 99249 0
serio_raw 12849 0
ttm 77862 0
drm_kms_helper 49210 0
drm 249955 2 ttm,drm_kms_helper
coretemp 12820 0
pcspkr 12595 0
evdev 17445 6
lpc_ich 20768 0
mfd_core 12601 1 lpc_ich
i2c_i801 16965 0
ipmi_si 48709 0
ipmi_msghandler 39917 1 ipmi_si
tpm_tis 17231 0
tpm 31511 1 tpm_tis
ioatdma 57654 0
button 12944 0
shpchp 31121 0
i7core_edac 22278 0
edac_core 51465 2 i7core_edac
processor 28221 0
thermal_sys 27642 1 processor
loop 26605 0
autofs4 35529 2
hid_generic 12393 0
usbhid 44460 0
hid 102264 2 hid_generic,usbhid
ext4 473802 1
crc16 12343 1 ext4
mbcache 17171 1 ext4
jbd2 82413 1 ext4
dm_mod 89405 35
raid1 34596 1
md_mod 107672 2 raid1
sg 29973 0
sd_mod 44356 8
crc_t10dif 12431 1 sd_mod
crct10dif_generic 12581 1
crct10dif_common 12356 2 crct10dif_generic,crc_t10dif
crc32c_intel 21809 0
ahci 33291 5
libahci 27158 1 ahci
libata 177457 2 ahci,libahci
scsi_mod 191405 3 sg,libata,sd_mod
ehci_pci 12512 0
uhci_hcd 43499 0
ehci_hcd 69837 1 ehci_pci
usbcore 195340 4 uhci_hcd,ehci_hcd,ehci_pci,usbhid
igb 171872 0
usb_common 12440 1 usbcore
i2c_algo_bit 12751 1 igb
i2c_core 46012 5 drm,igb,i2c_i801,drm_kms_helper,i2c_algo_bit
dca 13168 2 igb,ioatdma
ptp 17692 1 igb
pps_core 17225 1 ptp
-- Package-specific info:
** Version:
Linux servername 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1 (2015-05-24) x86_64 GNU/Linux
** Command line:
placeholder root=UUID=b1b3521d-d4a0-4fc6-abd7-cd85edf64758 ro quiet
{,splash}
--
Tristan Charbonneau
Domisys