So...

After disabling hyperthreading in the server BIOS we can no longer reproduce the bug. This is an HP ProLiant DL360 G7. We've not yet tried to reproduce it consistently on the other hardware we have, but all production hosting servers we run now are of this G7 type, with Intel Xeon X5650 or X5675 CPUs.

The last test Frank reported, starting more than 20 VMs and rebooting them in a loop, consistently reproduced the broken behaviour on a random network interface of a random VM.

We're now at >150 reboot cycles of all test VMs on a single dom0 with HT disabled, without a failure. With HT enabled, this fails consistently somewhere between 10 and 50 VM reboots.
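
For illustration, a reboot loop of this kind could be sketched as follows. This is a hypothetical sketch, not the script used for the tests above: the VM names, the ping-based check (which only probes one address per VM, while the bug hits a random interface), the wait time, and the use of Python 3 on the dom0 are all assumptions. Only `xm reboot <domain>` is taken from the Xen 4.0 toolstack.

```python
import subprocess
import time


def reboot_cycle(vms, reboot, is_reachable, settle=lambda: None):
    """Reboot every VM once, wait, and return the VMs that fail the check."""
    for vm in vms:
        reboot(vm)
    settle()  # give the guests time to come back up
    return [vm for vm in vms if not is_reachable(vm)]


def run_until_failure(vms, reboot, is_reachable, max_cycles, settle=lambda: None):
    """Repeat reboot cycles until a VM fails the check.

    Returns (cycle_number, broken_vms). If no failure is seen,
    returns (max_cycles, []).
    """
    for cycle in range(1, max_cycles + 1):
        broken = reboot_cycle(vms, reboot, is_reachable, settle)
        if broken:
            return cycle, broken
    return max_cycles, []


def xm_reboot(vm):
    # Ask the Xen toolstack to reboot the domain.
    subprocess.run(["xm", "reboot", vm], check=True)


def ping_ok(host):
    # Assumption: the VM name resolves to an address on the interface under test.
    result = subprocess.run(
        ["ping", "-c", "3", "-W", "2", host],
        stdout=subprocess.DEVNULL,
    )
    return result.returncode == 0


def main():
    # Hypothetical VM names; adjust to the actual test domains.
    vms = ["testvm%02d" % i for i in range(1, 21)]
    cycle, broken = run_until_failure(
        vms, xm_reboot, ping_ok, max_cycles=150,
        settle=lambda: time.sleep(120),
    )
    if broken:
        print("cycle %d: unreachable after reboot: %s" % (cycle, ", ".join(broken)))
    else:
        print("no failure in %d cycles" % cycle)
```

Calling `main()` on the dom0 would run up to 150 cycles and stop at the first VM that no longer answers after a reboot.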

So this still looks like some kind of race condition. It's reproducible with either Open vSwitch or the Linux bridge, so it's only related to the actual dom0/domU passthrough of network traffic.

We're still going to try upgrading the test dom0 to wheezy with Xen 4.1 and Linux 3.2 to see what happens with HT enabled and disabled.

I'm very interested in feedback from Xen/Linux developers about this. Enabling HT with Xen 4.0 and Linux 2.6.32 on Debian is not safe right now. Been there, done that. :-)

Over the past months, it has seemed that we're the only ones affected by this bug. Or is no one else starting, stopping and live migrating Xen VMs as much as we do?

Thanks,
Hans

