Package: linux-2.6
Version: 2.6.32-21
Severity: normal

Hello,

I have been experiencing random bridge failures with Xen domUs.
My environment is all Debian squeeze: Xen 4.0.1~rc6, PV (not HVM), with the generic network setup (network-bridge/vif-bridge scripts). Randomly, after roughly 10 to 60 minutes of uptime, one or two domUs fall victim to bridge failure. There is no syslog/dmesg output. The only signs of the problem are in the network statistics on dom0 (the domUs' vifX.X interfaces show huge TX drops) and in the 'brctl showmacs' output, which is missing the MAC addresses of the domUs that have failed.

The issue has been identified and fixed in this xen-devel mailing list thread:

http://thread.gmane.org/gmane.comp.emulators.xen.devel/88590

I applied Dongxiao Xu's changes to drivers/net/xen-netfront.c, taken from Jeremy Fitzhardinge's git repository, to the linux-2.6 package and tested the result; it has been stable for the last few days. I have attached this patch to this bug report.

BTW, the following data that reportbug collected about the kernel probably isn't very interesting, though it does come from a domU running the original, unpatched Debian kernel.

-- Package-specific info:
** Version:
Linux version 2.6.32-5-xen-amd64 (Debian 2.6.32-21) (b...@decadent.org.uk) (gcc version 4.3.5 (Debian 4.3.5-2) ) #1 SMP Wed Aug 25 16:02:22 UTC 2010

** Command line:
root=/dev/xvda1 ro

** Not tainted

** Kernel log:
[ 0.024015] alloc kstat_irqs on node 0
[ 0.024019] alloc irq_desc for 534 on node 0
[ 0.024020] alloc kstat_irqs on node 0
[ 0.004000] Initializing CPU#1
[ 0.004000] CPU: L1 I cache: 32K, L1 D cache: 32K
[ 0.004000] CPU: L2 cache: 256K
[ 0.004000] CPU: L3 cache: 8192K
[ 0.004000] CPU 1/0x6 -> Node 0
[ 0.004000] CPU: Unsupported number of siblings 16
[ 0.024252] Brought up 2 CPUs
[ 0.024328] CPU0 attaching sched-domain:
[ 0.024333] domain 0: span 0-1 level CPU
[ 0.024337] groups: 0 1
[ 0.024346] CPU1 attaching sched-domain:
[ 0.024349] domain 0: span 0-1 level CPU
[ 0.024352] groups: 1 0
[ 0.024495] devtmpfs: initialized
[ 0.028697] Grant table initialized
[ 0.028697] regulator: core version 0.5
[ 0.028697] NET: Registered protocol family 16
[ 0.028697] alloc irq_desc for 533 on node 0
[ 0.028697] alloc kstat_irqs on node 0
[ 0.028717] PCI: setting up Xen PCI frontend stub
[ 0.029288] bio: create slab <bio-0> at 0
[ 0.029288] ACPI: Interpreter disabled.
[ 0.029288] xen_balloon: Initialising balloon driver with page order 0.
[ 0.029288] vgaarb: loaded
[ 0.029288] PCI: System does not support PCI
[ 0.029288] PCI: System does not support PCI
[ 0.029288] Switching to clocksource xen
[ 0.029538] pnp: PnP ACPI: disabled
[ 0.030125] NET: Registered protocol family 2
[ 0.030243] IP route cache hash table entries: 2048 (order: 2, 16384 bytes)
[ 0.030601] TCP established hash table entries: 8192 (order: 5, 131072 bytes)
[ 0.030672] TCP bind hash table entries: 8192 (order: 5, 131072 bytes)
[ 0.030704] TCP: Hash tables configured (established 8192 bind 8192)
[ 0.030711] TCP reno registered
[ 0.030781] NET: Registered protocol family 1
[ 0.030840] Unpacking initramfs...
[ 0.034064] Freeing initrd memory: 4904k freed
[ 0.037392] platform rtc_cmos: registered platform RTC device (no PNP device found)
[ 0.037743] audit: initializing netlink socket (disabled)
[ 0.037762] type=2000 audit(1284332221.698:1): initialized
[ 0.042365] HugeTLB registered 2 MB page size, pre-allocated 0 pages
[ 0.044493] VFS: Disk quotas dquot_6.5.2
[ 0.044566] Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[ 0.044665] msgmni has been set to 488
[ 0.045230] alg: No test for stdrng (krng)
[ 0.045363] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
[ 0.045378] io scheduler noop registered
[ 0.045388] io scheduler anticipatory registered
[ 0.045397] io scheduler deadline registered
[ 0.045447] io scheduler cfq registered (default)
[ 0.055871] registering netback
[ 0.057776] alloc irq_desc for 532 on node 0
[ 0.057781] alloc kstat_irqs on node 0
[ 0.058167] Linux agpgart interface v0.103
[ 0.058211] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[ 0.058466] input: Macintosh mouse button emulation as /devices/virtual/input/input0
[ 0.058528] PNP: No PS/2 controller found. Probing ports directly.
[ 0.059385] i8042.c: No controller found.
[ 0.059509] mice: PS/2 mouse device common for all mice
[ 0.059672] rtc_cmos rtc_cmos: rtc core: registered rtc_cmos as rtc0
[ 0.059744] cpuidle: using governor ladder
[ 0.059750] cpuidle: using governor menu
[ 0.059759] No iBFT detected.
[ 0.060177] TCP cubic registered
[ 0.060389] NET: Registered protocol family 10
[ 0.061099] lo: Disabled Privacy Extensions
[ 0.061473] Mobile IPv6
[ 0.061480] NET: Registered protocol family 17
[ 0.061622] PM: Resume from disk failed.
[ 0.061637] registered taskstats version 1
[ 0.064007] XENBUS: Device with no driver: device/vbd/51713
[ 0.064007] XENBUS: Device with no driver: device/vbd/51714
[ 0.064007] XENBUS: Device with no driver: device/vif/0
[ 0.064007] XENBUS: Device with no driver: device/console/0
[ 0.064007] /build/buildd-linux-2.6_2.6.32-21-amd64-bEMv9E/linux-2.6-2.6.32/debian/build/source_amd64_xen/drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
[ 0.064007] Initalizing network drop monitor service
[ 0.153824] Freeing unused kernel memory: 592k freed
[ 0.153971] Write protecting the kernel read-only data: 4320k
[ 0.200844] udev: starting version 160
[ 0.242532] alloc irq_desc for 531 on node 0
[ 0.242535] alloc kstat_irqs on node 0
[ 0.249938] alloc irq_desc for 530 on node 0
[ 0.249941] alloc kstat_irqs on node 0
[ 0.262922] blkfront: xvda1: barriers enabled
[ 0.282053] blkfront: xvda2: barriers enabled
[ 0.531739] kjournald starting. Commit interval 5 seconds
[ 0.531769] EXT3-fs: mounted filesystem with ordered data mode.
[ 0.764289] udev: starting version 160
[ 0.855268] Initialising Xen virtual ethernet driver.
[ 0.856665] alloc irq_desc for 529 on node 0
[ 0.856667] alloc kstat_irqs on node 0
[ 0.881754] input: PC Speaker as /devices/platform/pcspkr/input/input1
[ 0.886677] Error: Driver 'pcspkr' is already registered, aborting...
[ 1.073785] Adding 1048568k swap on /dev/xvda2. Priority:-1 extents:1 across:1048568k SS
[ 1.129525] EXT3 FS on xvda1, internal journal
[ 2.229159] ip_tables: (C) 2000-2006 Netfilter Core Team
[ 12.048118] eth0: no IPv6 routers present

** Model information
not available

** Loaded modules:
Module                    Size  Used by
xt_multiport              2267  1
iptable_filter            2258  1
ip_tables                13899  1 iptable_filter
x_tables                 12845  2 xt_multiport,ip_tables
snd_pcsp                  6579  0
snd_pcm                  60519  1 snd_pcsp
snd_timer                15582  1 snd_pcm
evdev                     7352  0
xen_netfront             16073  0
snd                      46446  3 snd_pcsp,snd_pcm,snd_timer
soundcore                 4598  1 snd
snd_page_alloc            6249  1 snd_pcm
ext3                    106502  1
jbd                      37085  1 ext3
mbcache                   5050  1 ext3
xen_blkfront              9435  2

** Network interface configuration:

auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
    address 192.168.1.30
    netmask 255.255.255.0
    gateway 192.168.1.1

** Network status:
*** IP interfaces and addresses:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:16:3e:00:00:0d brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.30/24 brd 192.168.1.255 scope global eth0
    inet6 fe80::216:3eff:fe00:d/64 scope link
       valid_lft forever preferred_lft forever

*** Device statistics:
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
  eth0: 1286892     677    0    0    0     0          0         0    96982     560    0    0    0     0       0          0

*** Protocol statistics:
Ip:
    651 total packets received
    0 forwarded
    0 incoming packets discarded
    618 incoming packets delivered
    544 requests sent out
Icmp:
    0 ICMP messages received
    0 input ICMP message failed.
    ICMP input histogram:
    0 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
Tcp:
    13 active connections openings
    2 passive connection openings
    0 failed connection attempts
    0 connection resets received
    6 connections established
    587 segments received
    515 segments send out
    0 segments retransmited
    0 bad segments received.
    2 resets sent
Udp:
    29 packets received
    0 packets to unknown port received.
    0 packet receive errors
    29 packets sent
UdpLite:
TcpExt:
    4 TCP sockets finished time wait in fast timer
    21 delayed acks sent
    170 packets directly queued to recvmsg prequeue.
    899 bytes directly received in process context from prequeue
    233 packet headers predicted
    56 packets header predicted and directly queued to user
    77 acknowledgments not containing data payload received
    148 predicted acknowledgments
IpExt:
    InBcastPkts: 2
    InOctets: 1286054
    OutOctets: 88478
    InBcastOctets: 463

*** Device features:
eth0: 0x50003
lo: 0x13865

** PCI devices:
not available

** USB devices:
not available

-- System Information:
Debian Release: squeeze/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-5-xen-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8) (ignored: LC_ALL set to en_US.UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages linux-image-2.6.32-5-xen-amd64 depends on:
ii  debconf [debconf-2.0]       1.5.35     Debian configuration management sy
ii  initramfs-tools             0.98.2     tools for generating an initramfs
ii  linux-base                  2.6.32-21  Linux image base package
ii  module-init-tools           3.12-1     tools for managing Linux kernel mo

Versions of packages linux-image-2.6.32-5-xen-amd64 recommends:
pn  firmware-linux-free         <none>     (no description available)

Versions of packages linux-image-2.6.32-5-xen-amd64 suggests:
pn  grub                        <none>     (no description available)
pn  linux-doc-2.6.32            <none>     (no description available)

Versions of packages linux-image-2.6.32-5-xen-amd64 is related to:
pn  firmware-bnx2               <none>     (no description available)
pn  firmware-bnx2x              <none>     (no description available)
pn  firmware-ipw2x00            <none>     (no description available)
pn  firmware-ivtv               <none>     (no description available)
pn  firmware-iwlwifi            <none>     (no description available)
pn  firmware-linux              <none>     (no description available)
pn  firmware-linux-nonfree      <none>     (no description available)
pn  firmware-qlogic             <none>     (no description available)
pn  firmware-ralink             <none>     (no description available)
pn  xen-hypervisor              <none>     (no description available)

-- debconf information:
  linux-image-2.6.32-5-xen-amd64/postinst/depmod-error-initrd-2.6.32-5-xen-amd64: false
  linux-image-2.6.32-5-xen-amd64/postinst/ignoring-do-bootloader-2.6.32-5-xen-amd64:
  linux-image-2.6.32-5-xen-amd64/prerm/removing-running-kernel-2.6.32-5-xen-amd64: true
  linux-image-2.6.32-5-xen-amd64/postinst/missing-firmware-2.6.32-5-xen-amd64:

-- 
Gerald Turner   Email: gtur...@unzane.com   JID: gtur...@jabber.unzane.com
GPG: 0xFA8CD6D5  21D9 B2E8 7FE7 F19E 5F7D  4D0C 3FA0 810F FA8C D6D5
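
For anyone who wants to check for the symptom described in this report (large TX drop counts on the vifX.X interfaces in dom0), the following is a minimal sketch in C. It assumes the usual /proc/net/dev layout, the same one reportbug prints under "Device statistics": eight receive counters followed by eight transmit counters, with the transmit drop counter fourth in the transmit group. The function names parse_tx_drops and report_vif_tx_drops are made up for this illustration and are not part of any Xen or Debian tool.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one data line of /proc/net/dev and extract the interface name
 * and its transmit-drop counter. Returns 1 on success, 0 if the line
 * is a header line or is malformed.
 */
int parse_tx_drops(const char *line, char *name, size_t name_len,
                   unsigned long long *tx_drop)
{
    unsigned long long f[16];
    const char *colon = strchr(line, ':');
    const char *p;
    size_t n;
    int i, used;

    if (colon == NULL || name_len == 0)
        return 0;

    /* Interface name: the text before the colon, leading blanks skipped. */
    while (*line == ' ' || *line == '\t')
        line++;
    n = (size_t)(colon - line);
    if (n == 0 || n >= name_len)
        return 0;
    memcpy(name, line, n);
    name[n] = '\0';

    /* 8 receive counters, then 8 transmit counters. */
    p = colon + 1;
    for (i = 0; i < 16; i++) {
        if (sscanf(p, "%llu%n", &f[i], &used) != 1)
            return 0;
        p += used;
    }

    /* Transmit group: bytes(8) packets(9) errs(10) drop(11). */
    *tx_drop = f[11];
    return 1;
}

/*
 * Read /proc/net/dev-style text from 'in' and print every interface
 * whose name starts with "vif" and whose TX drop counter is non-zero.
 * Returns the number of such interfaces.
 */
int report_vif_tx_drops(FILE *in, FILE *out)
{
    char line[512], name[64];
    unsigned long long drops;
    int hits = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        if (!parse_tx_drops(line, name, sizeof name, &drops))
            continue;
        if (strncmp(name, "vif", 3) == 0 && drops > 0) {
            fprintf(out, "%s: %llu TX drops\n", name, drops);
            hits++;
        }
    }
    return hits;
}
```

On dom0 this could be driven by passing fopen("/proc/net/dev", "r") as the input and stdout as the output. A non-zero counter alone does not prove this particular bug; it only flags interfaces worth comparing against the 'brctl showmacs' output.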
diff -aNur linux-2.6-2.6.32.orig/debian/patches/features/all/xen/netfront-smartpoll-param.patch linux-2.6-2.6.32/debian/patches/features/all/xen/netfront-smartpoll-param.patch
--- linux-2.6-2.6.32.orig/debian/patches/features/all/xen/netfront-smartpoll-param.patch	1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6-2.6.32/debian/patches/features/all/xen/netfront-smartpoll-param.patch	2010-09-11 10:26:16.000000000 -0700
@@ -0,0 +1,101 @@
+$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git
+$ git checkout xen/netfront
+$ git diff 5473680bdedb7a62e641970119e6e9381a8d80f4..3b966565a89659f938a4fd662c8475f0c00e0606
+
+diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
+index e894dd2..23b9e4d 100644
+--- a/drivers/net/xen-netfront.c
++++ b/drivers/net/xen-netfront.c
+@@ -53,6 +53,10 @@
+ 
+ static const struct ethtool_ops xennet_ethtool_ops;
+ 
++static int use_smartpoll = 1;
++module_param(use_smartpoll, int, 0600);
++MODULE_PARM_DESC (use_smartpoll, "Use smartpoll mechanism if available");
++
+ struct netfront_cb {
+ 	struct page *page;
+ 	unsigned offset;
+@@ -77,8 +81,8 @@ struct netfront_smart_poll {
+ 
+ #define GRANT_INVALID_REF 0
+ 
+-#define NET_TX_RING_SIZE __RING_SIZE((struct xen_netif_tx_sring *)0, PAGE_SIZE)
+-#define NET_RX_RING_SIZE __RING_SIZE((struct xen_netif_rx_sring *)0, PAGE_SIZE)
++#define NET_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE)
++#define NET_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE)
+ #define TX_MAX_TARGET min_t(int, NET_RX_RING_SIZE, 256)
+ 
+ struct netfront_info {
+@@ -1397,10 +1401,15 @@ static irqreturn_t xennet_interrupt(int irq, void *dev_id)
+ 			napi_schedule(&np->napi);
+ 	}
+ 
+-	if (np->smart_poll.feature_smart_poll)
+-		hrtimer_start(&np->smart_poll.timer,
+-			ktime_set(0, NANO_SECOND/np->smart_poll.smart_poll_freq),
+-			HRTIMER_MODE_REL);
++	if (np->smart_poll.feature_smart_poll) {
++		if ( hrtimer_start(&np->smart_poll.timer,
++			ktime_set(0,NANO_SECOND/np->smart_poll.smart_poll_freq),
++			HRTIMER_MODE_REL) ) {
++			printk(KERN_DEBUG "Failed to start hrtimer,"
++					"use interrupt mode for this packet\n");
++			np->rx.sring->private.netif.smartpoll_active = 0;
++		}
++	}
+ 
+ 	spin_unlock_irqrestore(&np->tx_lock, flags);
+ 
+@@ -1538,7 +1547,7 @@ again:
+ 		goto abort_transaction;
+ 	}
+ 
+-	err = xenbus_printf(xbt, dev->nodename, "feature-smart-poll", "%d", 1);
++	err = xenbus_printf(xbt, dev->nodename, "feature-smart-poll", "%d", use_smartpoll);
+ 	if (err) {
+ 		message = "writing feature-smart-poll";
+ 		goto abort_transaction;
+@@ -1631,11 +1640,14 @@ static int xennet_connect(struct net_device *dev)
+ 		return -ENODEV;
+ 	}
+ 
+-	err = xenbus_scanf(XBT_NIL, np->xbdev->otherend,
+-			   "feature-smart-poll", "%u",
+-			   &np->smart_poll.feature_smart_poll);
+-	if (err != 1)
+-		np->smart_poll.feature_smart_poll = 0;
++	np->smart_poll.feature_smart_poll = 0;
++	if (use_smartpoll) {
++		err = xenbus_scanf(XBT_NIL, np->xbdev->otherend,
++				   "feature-smart-poll", "%u",
++				   &np->smart_poll.feature_smart_poll);
++		if (err != 1)
++			np->smart_poll.feature_smart_poll = 0;
++	}
+ 
+ 	if (np->smart_poll.feature_smart_poll) {
+ 		hrtimer_init(&np->smart_poll.timer, CLOCK_MONOTONIC,
+diff --git a/include/xen/interface/io/ring.h b/include/xen/interface/io/ring.h
+index 7b301fa..c9ba846 100644
+--- a/include/xen/interface/io/ring.h
++++ b/include/xen/interface/io/ring.h
+@@ -24,8 +24,15 @@ typedef unsigned int RING_IDX;
+  * A ring contains as many entries as will fit, rounded down to the nearest
+  * power of two (so we can mask with (size-1) to loop around).
+  */
+-#define __RING_SIZE(_s, _sz) \
+-	(__RD32(((_sz) - (long)&(_s)->ring + (long)(_s)) / sizeof((_s)->ring[0])))
++#define __CONST_RING_SIZE(_s, _sz) \
++	(__RD32(((_sz) - offsetof(struct _s##_sring, ring)) / \
++		sizeof(((struct _s##_sring *)0)->ring[0])))
++
++/*
++ * The same for passing in an actual pointer instead of a name tag.
++ */
++#define __RING_SIZE(_s, _sz) \
++	(__RD32(((_sz) - (long)&(_s)->ring + (long)(_s)) / sizeof((_s)->ring[0])))
+ 
+ /*
+  * Macros to make the correct C datatypes for a new kind of ring.
diff -aNur linux-2.6-2.6.32.orig/debian/patches/series/21-extra linux-2.6-2.6.32/debian/patches/series/21-extra
--- linux-2.6-2.6.32.orig/debian/patches/series/21-extra	2010-09-12 15:49:48.000000000 -0700
+++ linux-2.6-2.6.32/debian/patches/series/21-extra	2010-09-11 10:54:10.000000000 -0700
@@ -16,4 +16,5 @@
 + features/all/xen/pvhvm/0016-xen-pvhvm-rename-xen_emul_unplug-ignore-to-unnnec.patch featureset=xen
 + features/all/xen/pvhvm/0017-xen-pvhvm-make-it-clearer-that-XEN_UNPLUG_-define.patch featureset=xen
 + features/all/xen/pvops.patch featureset=xen
++ features/all/xen/netfront-smartpoll-param.patch featureset=xen
 + features/all/xen/revert-stack-guard.patch featureset=xen
diff -aNur linux-2.6-2.6.32.orig/debian/patches/series/21-extra~ linux-2.6-2.6.32/debian/patches/series/21-extra~
--- linux-2.6-2.6.32.orig/debian/patches/series/21-extra~	1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6-2.6.32/debian/patches/series/21-extra~	2010-09-11 10:30:27.000000000 -0700
@@ -0,0 +1,19 @@
++ features/all/xen/pvhvm/0001-xen-Add-support-for-HVM-hypercalls.patch featureset=xen
++ features/all/xen/pvhvm/0002-x86-early-PV-on-HVM-features-initialization.patch featureset=xen
++ features/all/xen/pvhvm/0003-x86-xen-event-channels-delivery-on-HVM.patch featureset=xen
++ features/all/xen/pvhvm/0004-xen-Xen-PCI-platform-device-driver.patch featureset=xen
++ features/all/xen/pvhvm/0005-xen-Add-suspend-resume-support-for-PV-on-HVM-guests.patch featureset=xen
++ features/all/xen/pvhvm/0006-xen-Fix-find_unbound_irq-in-presence-of-ioapic-irqs.patch featureset=xen
++ features/all/xen/pvhvm/0007-x86-Use-xen_vcpuop_clockevent-xen_clocksource-and.patch featureset=xen
++ features/all/xen/pvhvm/0008-x86-Unplug-emulated-disks-and-nics.patch featureset=xen
++ features/all/xen/pvhvm/0009-x86-Call-HVMOP_pagetable_dying-on-exit_mmap.patch featureset=xen
++ features/all/xen/pvhvm/0010-xenfs-enable-for-HVM-domains-too.patch featureset=xen
++ features/all/xen/pvhvm/0011-support-multiple-.discard.-sections-to-avoid-sectio.patch featureset=xen
++ features/all/xen/pvhvm/0012-blkfront-do-not-create-a-PV-cdrom-device-if-xen_hvm.patch featureset=xen
++ features/all/xen/pvhvm/0013-Introduce-CONFIG_XEN_PVHVM-compile-option.patch featureset=xen
++ features/all/xen/pvhvm/0014-pvops-do-not-notify-callers-from-register_xenstore_.patch featureset=xen
++ features/all/xen/pvhvm/0015-xen-pvhvm-allow-user-to-request-no-emulated-device.patch featureset=xen
++ features/all/xen/pvhvm/0016-xen-pvhvm-rename-xen_emul_unplug-ignore-to-unnnec.patch featureset=xen
++ features/all/xen/pvhvm/0017-xen-pvhvm-make-it-clearer-that-XEN_UNPLUG_-define.patch featureset=xen
++ features/all/xen/pvops.patch featureset=xen
++ features/all/xen/revert-stack-guard.patch featureset=xen
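
A note on the ring.h part of the patch: the new __CONST_RING_SIZE takes a name tag and uses offsetof() instead of pointer arithmetic on a null pointer, presumably so that NET_TX_RING_SIZE and NET_RX_RING_SIZE are plain integer constant expressions. The following user-space sketch illustrates the idea. Everything prefixed DEMO_ or demo_ is a stand-in invented for this example; in particular the 12-byte entry and the 64-byte ring header are assumptions, not the kernel's actual definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Round a 32-bit value down to the nearest power of two, usable in a
 * constant expression (same shape as the __RD32 chain in ring.h). */
#define DEMO_RD2(x)  (((x) & 0x00000002u) ? 0x2u : ((x) & 0x1u))
#define DEMO_RD4(x)  (((x) & 0x0000000cu) ? DEMO_RD2((x) >> 2) << 2 : DEMO_RD2(x))
#define DEMO_RD8(x)  (((x) & 0x000000f0u) ? DEMO_RD4((x) >> 4) << 4 : DEMO_RD4(x))
#define DEMO_RD16(x) (((x) & 0x0000ff00u) ? DEMO_RD8((x) >> 8) << 8 : DEMO_RD8(x))
#define DEMO_RD32(x) (((x) & 0xffff0000u) ? DEMO_RD16((x) >> 16) << 16 : DEMO_RD16(x))

/* Hypothetical 12-byte ring entry. */
struct demo_entry {
    uint32_t gref;
    uint16_t offset, flags, id, size;
};

/* Hypothetical shared ring: a 64-byte header followed by the entries. */
struct demo_sring {
    uint32_t req_prod, req_event;
    uint32_t rsp_prod, rsp_event;
    uint8_t  pad[48];
    struct demo_entry ring[1];  /* really as many as fit in the page */
};

#define DEMO_PAGE_SIZE 4096

/* Name-tag variant: built only from offsetof() and sizeof(), so the
 * result is an integer constant expression. */
#define DEMO_CONST_RING_SIZE(_s, _sz) \
    (DEMO_RD32(((_sz) - offsetof(struct _s##_sring, ring)) / \
               sizeof(((struct _s##_sring *)0)->ring[0])))

#define DEMO_RING_SIZE DEMO_CONST_RING_SIZE(demo, DEMO_PAGE_SIZE)

/* An array member sized by the macro, which requires a constant. */
struct demo_info {
    int slots[DEMO_RING_SIZE];
};

size_t demo_ring_size(void)
{
    return DEMO_RING_SIZE;
}
```

With these assumed sizes, (4096 - 64) / 12 = 336 entries would fit, and rounding down to a power of two gives 256 slots, so indices can be wrapped with a mask of (size - 1) as the ring.h comment describes.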