On Tuesday July 18, [EMAIL PROTECTED] wrote:
>
>
> On Tue, 18 Jul 2000, Neil Brown wrote:
> >
> > I managed to reproduce this and, at least for me, it is caused by a
> > deadlock: kflushd tries to write out data via raid1; raid1 tries
> > to allocate memory, which blocks waiting for kflushd to free up some
> > memory.
>
> Hmm.. This is actually what "GFP_BUFFER" was meant for: GFP_BUFFER is not
> atomic, but it will not block for IO.
>
> So for example, GFP_BUFFER can still walk the page tables and the LRU
> lists (because it's not called from an interrupt context), but it will
> drop only pages that don't need IO to be dropped.
>
> That is, of course, unless GFP_BUFFER has had bit-rot. It's simple enough
> that I don't think it has, and I'd love to hear if your deadlock goes away
> using GFP_BUFFER instead of GFP_ATOMIC, which would be the right thing to
> do..
>
> Linus
Yep. GFP_BUFFER seems to work fine.
I also traced through the code to find out exactly where __GFP_IO made
a difference, and it turned out to coincide exactly with the
stack-trace I had of a deadlocked kflushd:
Stack[0]: <__wait_on_buffer+210>
Stack[1]: <sync_page_buffers+48>
Stack[2]: <try_to_free_buffers+389>
Stack[3]: <shrink_mmap+190>
Stack[4]: <do_try_to_free_pages+142>
Stack[5]: <try_to_free_pages+38>
Stack[6]: <__alloc_pages+299>
Stack[7]: <kmem_cache_grow+241>
Stack[8]: <kmalloc+205>
You learn something new every day (if you try).
NeilBrown
--- drivers/block/raid1.c 2000/07/18 02:00:54 1.1
+++ drivers/block/raid1.c 2000/07/18 02:44:39 1.2
@@ -75,7 +75,7 @@
md_spin_unlock_irq(&conf->device_lock);
if (cnt == 0)
break;
-	t = (struct buffer_head *)kmalloc(sizeof(struct buffer_head), GFP_KERNEL);
+	t = (struct buffer_head *)kmalloc(sizeof(struct buffer_head), GFP_BUFFER);
if (t) {
memset(t, 0, sizeof(*t));
t->b_next = bh;
@@ -165,7 +165,7 @@
if (r1_bh)
return r1_bh;
r1_bh = (struct raid1_bh *) kmalloc(sizeof(struct raid1_bh),
- GFP_KERNEL);
+ GFP_BUFFER);
if (r1_bh) {
memset(r1_bh, 0, sizeof(*r1_bh));
return r1_bh;