Hello Dave,

It looks like we have identified the problem.
We are working on a fix and will send it as soon as it is ready.

~Dmitry.

Sent from my iPhone

> On 15 May 2017, at 12:22, Dr. David Alan Gilbert <[email protected]> wrote:
>
> * Dmitry Fleytman ([email protected]) wrote:
>> Hello Dave,
>
> Hi Dmitry,
> Thanks for the reply.
>
>> We are trying to reproduce this issue on our systems but with no luck so far…
>
> Note our QE hit this with both a Win8.1 and a win2012r2 guest - although
> the 2012r2 is reported to have recovered after a few minutes.
> 2016 apparently works OK.
>
>> From what you describe it looks like some bit in ICR is not being cleared by
>> the driver.
>> This usually means that this bit should never be set in that specific
>> interrupt mode.
>>
>> Could you please check which bit is not cleared and who sets it?
>
> The full set of e1000e_irq_pending_interrupts after migration is:
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x100000 (ICR: 0x80100082, IMS: 0x1f00004)
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x80100082, IMS: 0x1e00004)
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x80100082, IMS: 0x1e00004)
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x200000 (ICR: 0x80300082, IMS: 0x1e00004)
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x80100082, IMS: 0x1c00004)
> <repeated lots>
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x80300082, IMS: 0x1c00004)
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x1000000 (ICR: 0x813000c2, IMS: 0x1c00004)
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x1000000 (ICR: 0x813000c2, IMS: 0x1400004)
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x1000000 (ICR: 0x813000c2, IMS: 0x1000004)
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x813000c2, IMS: 0x4)
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x813000c2, IMS: 0x4)
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x813000c2, IMS: 0x4)
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x813000c2, IMS: 0x4)
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x813000c2, IMS: 0x4)
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x200000 (ICR: 0x813000c2, IMS: 0xa00004)
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x811000c2, IMS: 0x4)
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x811000c2, IMS: 0x4)
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x811000c2, IMS: 0xa00004)
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x811000c2, IMS: 0xa00004)
> <repeats>
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x811000c2, IMS: 0xa00004)
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x200000 (ICR: 0x813000c2, IMS: 0xa00004)
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x811000c2, IMS: 0x800004)
> <repeats>
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x813000c2, IMS: 0x800004)
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x813000c2, IMS: 0x4)
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x200000 (ICR: 0x813000c2, IMS: 0xa00004)
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x811000c2, IMS: 0x4)
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x811000c2, IMS: 0x4)
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x811000c2, IMS: 0xa00004)
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x811000c2, IMS: 0xa00004)
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa00004)
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa00004)
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa00004)
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x1000000 (ICR: 0x815000c2, IMS: 0x1a00004)
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa00004)
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa00004)
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa00004)
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa00004)
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa00004)
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa00004)
> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x1000000 (ICR: 0x815000c2, IMS: 0x1a00004)
>
> and then I think we get stuck in this cycle of this one always being the
> one that fires repeatedly. I think that's the 'other' firing, I think
> because of the receive-overrun. One thing I've not
> figured out is why the receive overrun happens - is that because we
> really have a very heavy packet rate or is it because something has
> stopped receiving them.
> The network I'm testing on does have a fair amount of broadcast traffic
> on.
>
> Dave
>
>> Regards,
>> Dmitry
>>
>>> On 11 May 2017, at 15:36 PM, Dr. David Alan Gilbert <[email protected]>
>>> wrote:
>>>
>>> Hi Dmitry,
>>> Have you seen any problems with e1000e migration under windows?
>>> I've got a repeatable case where after migration with e1000e windows
>>> hangs/almost hangs.
>>> I'm seeing the e1000e generate interrupts at a very very high
>>> rate (maybe ~1000 per second-ish?) after migration.
>>>
>>> Some versions of qemu do it and some don't, but my attempts
>>> at bisection lead me to code that should be irrelevant.
>>>
>>> Prior to migration I see:
>>>
>>> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x100000 (ICR: 0x80100082, IMS: 0x1f00004)
>>> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x80000082, IMS: 0x1a00004)
>>> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x80000082, IMS: 0x1f00004)
>>> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x80000082, IMS: 0x1a00004)
>>> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x80000082, IMS: 0x1f00004)
>>>
>>> which I think the ICR means:
>>> 31 - int asserted
>>> 20 - RxQ0 - receive queue 0 interrupt
>>> 7 - RXT0 - receiver timer interrupt
>>> 1 - TXQE - Transmit Queue empty
>>>
>>> after migration it varies more, I'm seeing mostly:
>>> [email protected]:e1000e_irq_pending_interrupts ICR PENDING: 0x1000000 (ICR: 0x815000c2, IMS: 0x1a00004)
>>> 31 - int asserted
>>> 24 - 'Other'
>>> 22 - TxQ0 interrupt
>>> 20 - RxQ0 interrupt
>>> 07 - RXT0 Receiver timer interrupt
>>> 06 - RXO - Receiver overrun
>>> 01 - TXQE - Transmit queue empty
>>>
>>> For reference this is https://bugzilla.redhat.com/show_bug.cgi?id=1447935
>>>
>>> Dave
>>> --
>>> Dr. David Alan Gilbert / [email protected] / Manchester, UK
>>
> --
> Dr. David Alan Gilbert / [email protected] / Manchester, UK
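
For anyone following the numbers in the traces quoted in this thread, here is a small Python sketch for decoding them. The bit names are only the ones listed in the thread (bit 6 taken to be the receiver overrun bit); any other bit is printed as a bare bit number. The helper names `decode` and `pending` are made up for illustration, and treating the PENDING value as ICR & IMS is an assumption that matches the trace lines quoted here, not something checked against the QEMU source.

```python
# Bit names as given in the thread; bits not named there stay unnamed.
ICR_BITS = {
    31: "INT_ASSERTED",
    24: "OTHER",
    22: "TXQ0",
    20: "RXQ0",
    7: "RXT0",
    6: "RXO",
    1: "TXQE",
}


def decode(value):
    """Return the names of the bits set in value, highest bit first."""
    names = []
    for bit in range(31, -1, -1):
        if value & (1 << bit):
            names.append(ICR_BITS.get(bit, "bit%d" % bit))
    return names


def pending(icr, ims):
    """Assumed relation: a cause is pending if it is set in ICR and unmasked in IMS."""
    return icr & ims


if __name__ == "__main__":
    # (ICR, IMS) pairs copied from the trace lines in the thread.
    samples = [
        (0x80100082, 0x1F00004),  # seen before migration
        (0x815000C2, 0x1A00004),  # the line that keeps repeating after migration
    ]
    for icr, ims in samples:
        print("ICR=%#x IMS=%#x" % (icr, ims))
        print("  ICR bits:", ", ".join(decode(icr)))
        print("  pending : %#x (%s)" % (
            pending(icr, ims),
            ", ".join(decode(pending(icr, ims))) or "none",
        ))
```

Run against the repeating post-migration line, this reports only the 'Other' cause as pending, which fits the description of that interrupt being the one that fires repeatedly.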
