> There are three spinning-state structures (per CPU) here, allowing for
> spinning in process context, in bottom-half (softirq) context, and in
> hard-interrupt context.

There is an interrupt flag (called IRQF_DISABLED in 2.6.26; it was 
SA_NODELAY at some point in the past, AFAIR) that controls whether the irq 
handler is called immediately with interrupts disabled, or a bit later 
with this irq masked and interrupts enabled.

Most interrupt handlers don't set this flag, so they are called with 
interrupts enabled.

If we are already spinning, softirq context can't be entered any way other 
than after hardirq processing. From this POV, a spinlock taken in an irq 
handler and a spinlock taken in a softirq handler are equivalent.

And I've just checked: in_interrupt() actually checks for both.

> Since interrupts are generally enabled during 
> interrupt handlers, there need to be multiple levels of hard-interrupt
> contexts (one per IRQ), if I'm not mistaken.  The code in SLE 11 and XCI
> appears to implement this.

I've just looked into the code.

In Xen, hard interrupts are delivered to the kernel not by hardware, but by 
the Xen hypervisor through Xen event channels. At the code level, that 
corresponds to an (async) call to evtchn_do_upcall(). But that routine has 
explicit protection against double entry on the same CPU; see the code for 
details.

That means do_IRQ() can't be called on the same CPU before the previous 
invocation returns. So multiple levels of "hard-interrupt contexts" simply 
can't exist.

So even 2 spinning-state structures per CPU should be enough. No need to 
have more.

And then my fix looks correct. It still leaves some overkill (the 3rd 
spinning-state structure), but it should reliably avoid the original crash.

Nikita

P.S.
Btw, server still works without crashes :)


