Hi.

John Baldwin wrote:
> On Friday 04 June 2010 2:30:13 pm Alexander Motin wrote:
>> I am working on driver for HPET event timers. It works mostly fine,
>> except after some cases when ioapic_assign_cpu() called while timer is
>> active. Under interrupt rate of 10KHz it is enough a dozen cpuset runs
>> to break it (with 1KHz - few dozens). When it happens, I can see that
>> timer is still running, interrupt status register is changing, but no
>> interrupts received.
>>
>> Can somebody explain this behavior and propose some solution? Have
>> somebody seen it for regular PCI devices?
> 
> It probably would be best to disable the source, however, you can't just re-
> enable it as it might already be disabled when you move it.  It is also 
> probably a bug that io_masked can be read w/o holding the icu_lock in 
> ioapic_program_intpin().  I think the icu_lock should be pushed up to callers 
> of ioapic_program_intpin(), and that you should explicitly do a simple write 
> to mask the source a bit earlier.  Something like this perhaps:
> 
> Index: io_apic.c
> ===================================================================
> --- io_apic.c (revision 208714)
> +++ io_apic.c (working copy)
>
> If you find that you still need the DELAY(10), you could place it in the 
> conditional block that masks the interrupt perhaps.

Thank you!

This patch seems to reduce stuck probability for level-triggered
interrupts, especially when printf works as DELAY() on bootverbose=1.
Without DELAY() and bootverbose I can trigger the issue by few dozens of
spuset runs. With DELAY() I still was able to do it twice, but only
after several hundreds of tries.

Also I have found that unconditional pre-disabling interrupt hurts RTC
timer. As I understand, IRQ8 is edge-triggered, and it's loss during
switch makes RTC timer stuck. I've tried to not mask edge-triggered
interrupts - and it helped. But even so I am able to hang RTC timer with
hundred cpusets, so may be we have to just choose between bad and worse.

Updated patch attached.

PS: For the global timers CPU binding is probably not so important, as
interrupts are any way redistributed to other CPUs. So I have found that
this issue can be workarounded by binding interrupts to BSP during
attach to avoid migration on SMP start.

-- 
Alexander Motin
--- io_apic.c.prev      2010-06-05 00:53:55.000000000 +0300
+++ io_apic.c   2010-06-08 09:21:56.000000000 +0300
@@ -261,16 +261,15 @@ ioapic_program_intpin(struct ioapic_ints
         * If a pin is completely invalid or if it is valid but hasn't
         * been enabled yet, just ensure that the pin is masked.
         */
+       mtx_assert(&icu_lock, MA_OWNED);
        if (intpin->io_irq == IRQ_DISABLED || (intpin->io_irq < NUM_IO_INTS &&
            intpin->io_vector == 0)) {
-               mtx_lock_spin(&icu_lock);
                low = ioapic_read(io->io_addr,
                    IOAPIC_REDTBL_LO(intpin->io_intpin));
                if ((low & IOART_INTMASK) == IOART_INTMCLR)
                        ioapic_write(io->io_addr,
                            IOAPIC_REDTBL_LO(intpin->io_intpin),
                            low | IOART_INTMSET);
-               mtx_unlock_spin(&icu_lock);
                return;
        }
 
@@ -312,14 +311,12 @@ ioapic_program_intpin(struct ioapic_ints
        }
 
        /* Write the values to the APIC. */
-       mtx_lock_spin(&icu_lock);
        intpin->io_lowreg = low;
        ioapic_write(io->io_addr, IOAPIC_REDTBL_LO(intpin->io_intpin), low);
        value = ioapic_read(io->io_addr, IOAPIC_REDTBL_HI(intpin->io_intpin));
        value &= ~IOART_DEST;
        value |= high;
        ioapic_write(io->io_addr, IOAPIC_REDTBL_HI(intpin->io_intpin), value);
-       mtx_unlock_spin(&icu_lock);
 }
 
 static int
@@ -352,6 +349,14 @@ ioapic_assign_cpu(struct intsrc *isrc, u
        if (new_vector == 0)
                return (ENOSPC);
 
+       /* Mask the old intpin if it is enabled while it is migrated. */
+       mtx_lock_spin(&icu_lock);
+       if (!intpin->io_masked && !intpin->io_edgetrigger) {
+               ioapic_write(io->io_addr, IOAPIC_REDTBL_LO(intpin->io_intpin),
+                   intpin->io_lowreg | IOART_INTMSET);
+               DELAY(100);
+       }
+
        intpin->io_cpu = apic_id;
        intpin->io_vector = new_vector;
        if (isrc->is_handlers > 0)
@@ -364,6 +369,8 @@ ioapic_assign_cpu(struct intsrc *isrc, u
                    intpin->io_vector);
        }
        ioapic_program_intpin(intpin);
+       mtx_unlock_spin(&icu_lock);
+
        /*
         * Free the old vector after the new one is established.  This is done
         * to prevent races where we could miss an interrupt.
@@ -399,9 +406,11 @@ ioapic_disable_intr(struct intsrc *isrc)
                /* Mask this interrupt pin and free its APIC vector. */
                vector = intpin->io_vector;
                apic_disable_vector(intpin->io_cpu, vector);
+               mtx_lock_spin(&icu_lock);
                intpin->io_masked = 1;
                intpin->io_vector = 0;
                ioapic_program_intpin(intpin);
+               mtx_unlock_spin(&icu_lock);
                apic_free_vector(intpin->io_cpu, vector, intpin->io_irq);
        }
 }
@@ -443,6 +452,7 @@ ioapic_config_intr(struct intsrc *isrc, 
         * XXX: Should we write to the ELCR if the trigger mode changes for
         * an EISA IRQ or an ISA IRQ with the ELCR present?
         */
+       mtx_lock_spin(&icu_lock);
        if (intpin->io_bus == APIC_BUS_EISA)
                pol = INTR_POLARITY_HIGH;
        changed = 0;
@@ -464,6 +474,7 @@ ioapic_config_intr(struct intsrc *isrc, 
        }
        if (changed)
                ioapic_program_intpin(intpin);
+       mtx_unlock_spin(&icu_lock);
        return (0);
 }
 
@@ -473,8 +484,10 @@ ioapic_resume(struct pic *pic)
        struct ioapic *io = (struct ioapic *)pic;
        int i;
 
+       mtx_lock_spin(&icu_lock);
        for (i = 0; i < io->io_numintr; i++)
                ioapic_program_intpin(&io->io_pins[i]);
+       mtx_unlock_spin(&icu_lock);
 }
 
 /*
_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Reply via email to