Package: linux-image-2.6.32-5-amd64
Version: 2.6.32-18
Severity: important
Tags: patch
I'm running a two-node cluster that mounts an OCFS2 filesystem on both nodes. The disk containing the filesystem is an iSCSI volume hosted on a SAN device. During simultaneous use of the filesystem by both nodes I reproducibly encountered the following BUG():

kernel: [3401206.397280] lockres: O00000000000000000e69950000000, owner=1, state=0
kernel: [3401206.397280] last used: 5139177996, refcnt: 5, on purge list: yes
kernel: [3401206.397280] on dirty list: no, on reco list: no, migrating pending: no
kernel: [3401206.397280] inflight locks: 0, asts reserved: 1
kernel: [3401206.397280] refmap nodes: [ ], inflight=0
kernel: [3401206.397280] granted queue:
kernel: [3401206.397280] type=3, conv=-1, node=1, cookie=1:275866887, ref=3, ast=(empty=n,pend=y), bast=(empty=y,pend=n), pending=(conv=n,lock=n,cancel=n,unlock=n)
kernel: [3401206.397280] converting queue:
kernel: [3401206.397280] blocked queue:
kernel: [3401206.397280] ------------[ cut here ]------------
kernel: [3401206.397280] kernel BUG at fs/ocfs2/dlm/dlmthread.c:169!
kernel: [3401206.397280] invalid opcode: 0000 [1] SMP
kernel: [3401206.397280] CPU 1
kernel: [3401206.397280] Modules linked in: ocfs2 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sd_mod crc32c libcrc32c ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi scsi_transport_iscsi scsi_mod evdev ext3 jbd mbcache dm_mirror dm_log dm_snapshot dm_mod thermal_sys
kernel: [3401206.397280] Pid: 26270, comm: dlm_thread Not tainted 2.6.26-2-xen-amd64 #1
kernel: [3401206.397280] RIP: e030:[<ffffffffa0146bd3>] [<ffffffffa0146bd3>] :ocfs2_dlm:dlm_run_purge_list+0x148/0x578
kernel: [3401206.397280] RSP: e02b:ffff88000dfafe00 EFLAGS: 00010246
kernel: [3401206.397280] RAX: ffff880007c18700 RBX: ffff880007c18768 RCX: 0000c6c600005794
kernel: [3401206.397280] RDX: 000000000000eeee RSI: 0000000000000001 RDI: ffffffff8059dab0
kernel: [3401206.397280] RBP: ffff880007c186c0 R08: 0000000000000000 R09: 0000000000000001
kernel: [3401206.397280] R10: 0000000000000023 R11: 0000010000000022 R12: 000000013251a20c
kernel: [3401206.397280] R13: ffff88002492f400 R14: ffff88002492f428 R15: 0000000000000001
kernel: [3401206.397280] FS: 00007ffff7ee4750(0000) GS:ffffffff8052d080(0000) knlGS:0000000000000000
kernel: [3401206.397280] CS: e033 DS: 0000 ES: 0000
kernel: [3401206.397280] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: [3401206.397280] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
kernel: [3401206.397280] Process dlm_thread (pid: 26270, threadinfo ffff88000dfae000, task ffff8800010a3900)
kernel: [3401206.397280] Stack: 000000013251abd0 ffffffff802358d5 ffff8800010a3900 000000002492f400
kernel: [3401206.397280] ffff88002492f46c 00000000000000e8 ffff88002f171080 ffff88002492f400
kernel: [3401206.397280] ffff88002f171128 ffff88000520a338 0000000000000001 ffffffffa014730f
kernel: [3401206.397280] Call Trace:
kernel: [3401206.397280] [<ffffffff802358d5>] ? process_timeout+0x0/0x5
kernel: [3401206.397280] [<ffffffffa014730f>] ? :ocfs2_dlm:dlm_thread+0x95/0xe82
kernel: [3401206.397280] [<ffffffff80224d35>] ? try_to_wake_up+0x118/0x129
kernel: [3401206.397289] [<ffffffff8023f671>] ? autoremove_wake_function+0x0/0x2e
kernel: [3401206.397289] [<ffffffffa014727a>] ? :ocfs2_dlm:dlm_thread+0x0/0xe82
kernel: [3401206.397289] [<ffffffff8023f543>] ? kthread+0x47/0x74
kernel: [3401206.397289] [<ffffffff802283a8>] ? schedule_tail+0x27/0x5c
kernel: [3401206.397289] [<ffffffff8020be28>] ? child_rip+0xa/0x12
kernel: [3401206.397289] [<ffffffff8023f4fc>] ? kthread+0x0/0x74
kernel: [3401206.397289] [<ffffffff8020be1e>] ? child_rip+0x0/0x12
kernel: [3401206.397289]
kernel: [3401206.397289]
kernel: [3401206.397289] Code: d2 89 04 24 31 c0 e8 c3 6a 0e e0 48 89 ef e8 bd ed ff ff fe 03 0f b7 13 38 f2 0f 95 c0 84 c0 74 0a 89 d6 48 89 df e8 2d 30 23 e0 <0f> 0b eb fe 66 8b 95 ca 00 00 00 f6 c2 20 0f 84 b1 00 00 00 48
kernel: [3401206.397289] RIP [<ffffffffa0146bd3>] :ocfs2_dlm:dlm_run_purge_list+0x148/0x578
kernel: [3401206.397289] RSP <ffff88000dfafe00>
kernel: [3401206.397294] ---[ end trace 285cd07f988b3d3e ]---

The code that triggers the crash is in fs/ocfs2/dlm/dlmthread.c:

static int dlm_purge_lockres(struct dlm_ctxt *dlm,
			     struct dlm_lock_resource *res)
{
	int master;
	int ret = 0;

	spin_lock(&res->spinlock);
	if (!__dlm_lockres_unused(res)) {
		mlog(0, "%s:%.*s: tried to purge but not unused\n",
		     dlm->name, res->lockname.len, res->lockname.name);
		__dlm_print_one_lock_resource(res);
		spin_unlock(&res->spinlock);
--->		BUG();
	}

Searching for 'tried to purge but not unused' I found this patch (also attached):

http://www.mail-archive.com/ocfs2-de...@oss.oracle.com/msg06018.html

It removes the BUG() statement and fixes the race that causes this crash. After applying the patch to both systems I could no longer reproduce the problem. The fix is also expected to land in mainline, probably in 2.6.36. Please include it in the Debian 'squeeze' kernel.

Thanks,
Ronald.
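[Editorial note: to make the race concrete, here is a minimal user-space sketch of the check-then-act window the patch closes. It is an illustration only, not the kernel code; the struct, field, and function names are made up. The buggy variant tests "unused" under the lock, drops the lock, and only later sets the dropping flag, so another thread can take a reference in between; the fixed variant performs the check and the flag update under a single lock hold, mirroring how the patch sets DLM_LOCK_RES_DROPPING_REF before releasing the lockres spinlock.]

/* Build with: cc -pthread race_sketch.c */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct resource {
	pthread_mutex_t lock;
	int refcount;		/* active users, like lockres references */
	bool dropping;		/* stand-in for DLM_LOCK_RES_DROPPING_REF */
};

/* Buggy pattern: the unused check and the flag set are not atomic. */
static void purge_buggy(struct resource *res)
{
	pthread_mutex_lock(&res->lock);
	bool unused = (res->refcount == 0);
	pthread_mutex_unlock(&res->lock);	/* race window opens here */

	if (!unused)
		return;

	/* Another thread may have taken a reference by now... */
	pthread_mutex_lock(&res->lock);
	res->dropping = true;			/* ...so this fires too late */
	pthread_mutex_unlock(&res->lock);
}

/* Fixed pattern: decide and mark under one lock hold; no window. */
static void purge_fixed(struct resource *res)
{
	pthread_mutex_lock(&res->lock);
	if (res->refcount == 0)
		res->dropping = true;
	pthread_mutex_unlock(&res->lock);
}

int main(void)
{
	struct resource res = { PTHREAD_MUTEX_INITIALIZER, 0, false };
	purge_buggy(&res);
	res.dropping = false;
	purge_fixed(&res);
	printf("dropping = %d\n", res.dropping);
	return 0;
}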
This patch fixes two problems in dlm_run_purgelist

1. If a lockres is found to be in use, dlm_run_purgelist keeps trying to
purge the same lockres instead of trying the next lockres.

2. When a lockres is found unused, dlm_run_purgelist releases lockres
spinlock before setting DLM_LOCK_RES_DROPPING_REF and calls
dlm_purge_lockres. spinlock is reacquired but in this window lockres
can get reused. This leads to BUG.

This patch modifies dlm_run_purgelist to skip lockres if it's in use and
purge next lockres. It also sets DLM_LOCK_RES_DROPPING_REF before
releasing the lockres spinlock protecting it from getting reused.

Signed-off-by: Srinivas Eeda <srinivas.e...@oracle.com>
Acked-by: Sunil Mushran <sunil.mush...@oracle.com>
---
 fs/ocfs2/dlm/dlmthread.c |   80 +++++++++++++++++++--------------------
 1 files changed, 34 insertions(+), 46 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmthread.c b/fs/ocfs2/dlm/dlmthread.c
index 11a6d1f..960dc8d 100644
--- a/fs/ocfs2/dlm/dlmthread.c
+++ b/fs/ocfs2/dlm/dlmthread.c
@@ -152,45 +152,25 @@ void dlm_lockres_calc_usage(struct dlm_ctxt *dlm,
 	spin_unlock(&dlm->spinlock);
 }
 
-static int dlm_purge_lockres(struct dlm_ctxt *dlm,
+static void dlm_purge_lockres(struct dlm_ctxt *dlm,
 			     struct dlm_lock_resource *res)
 {
 	int master;
 	int ret = 0;
 
-	spin_lock(&res->spinlock);
-	if (!__dlm_lockres_unused(res)) {
-		mlog(0, "%s:%.*s: tried to purge but not unused\n",
-		     dlm->name, res->lockname.len, res->lockname.name);
-		__dlm_print_one_lock_resource(res);
-		spin_unlock(&res->spinlock);
-		BUG();
-	}
-
-	if (res->state & DLM_LOCK_RES_MIGRATING) {
-		mlog(0, "%s:%.*s: Delay dropref as this lockres is "
-		     "being remastered\n", dlm->name, res->lockname.len,
-		     res->lockname.name);
-		/* Re-add the lockres to the end of the purge list */
-		if (!list_empty(&res->purge)) {
-			list_del_init(&res->purge);
-			list_add_tail(&res->purge, &dlm->purge_list);
-		}
-		spin_unlock(&res->spinlock);
-		return 0;
-	}
+	assert_spin_locked(&dlm->spinlock);
+	assert_spin_locked(&res->spinlock);
 
 	master = (res->owner == dlm->node_num);
-	if (!master)
-		res->state |= DLM_LOCK_RES_DROPPING_REF;
-	spin_unlock(&res->spinlock);
 
 	mlog(0, "purging lockres %.*s, master = %d\n", res->lockname.len,
 	     res->lockname.name, master);
 
 	if (!master) {
+		res->state |= DLM_LOCK_RES_DROPPING_REF;
 		/* drop spinlock...  retake below */
+		spin_unlock(&res->spinlock);
 		spin_unlock(&dlm->spinlock);
 
 		spin_lock(&res->spinlock);
@@ -208,31 +188,35 @@ static int dlm_purge_lockres(struct dlm_ctxt *dlm,
 		mlog(0, "%s:%.*s: dlm_deref_lockres returned %d\n",
 		     dlm->name, res->lockname.len, res->lockname.name, ret);
 		spin_lock(&dlm->spinlock);
+		spin_lock(&res->spinlock);
 	}
 
-	spin_lock(&res->spinlock);
 	if (!list_empty(&res->purge)) {
 		mlog(0, "removing lockres %.*s:%p from purgelist, "
 		     "master = %d\n", res->lockname.len, res->lockname.name,
 		     res, master);
 		list_del_init(&res->purge);
-		spin_unlock(&res->spinlock);
 		dlm_lockres_put(res);
 		dlm->purge_count--;
-	} else
-		spin_unlock(&res->spinlock);
+	}
+
+	if (!__dlm_lockres_unused(res)) {
+		mlog(ML_ERROR, "found lockres %s:%.*s: in use after deref\n",
+		     dlm->name, res->lockname.len, res->lockname.name);
+		__dlm_print_one_lock_resource(res);
+		BUG();
+	}
 
 	__dlm_unhash_lockres(res);
 
 	/* lockres is not in the hash now.  drop the flag and wake up
 	 * any processes waiting in dlm_get_lock_resource. */
 	if (!master) {
-		spin_lock(&res->spinlock);
 		res->state &= ~DLM_LOCK_RES_DROPPING_REF;
 		spin_unlock(&res->spinlock);
 		wake_up(&res->wq);
-	}
-	return 0;
+	} else
+		spin_unlock(&res->spinlock);
 }
 
 static void dlm_run_purge_list(struct dlm_ctxt *dlm,
@@ -251,17 +235,7 @@ static void dlm_run_purge_list(struct dlm_ctxt *dlm,
 		lockres = list_entry(dlm->purge_list.next,
 				     struct dlm_lock_resource, purge);
 
-		/* Status of the lockres *might* change so double
-		 * check. If the lockres is unused, holding the dlm
-		 * spinlock will prevent people from getting and more
-		 * refs on it -- there's no need to keep the lockres
-		 * spinlock. */
 		spin_lock(&lockres->spinlock);
-		unused = __dlm_lockres_unused(lockres);
-		spin_unlock(&lockres->spinlock);
-
-		if (!unused)
-			continue;
 
 		purge_jiffies = lockres->last_used +
 			msecs_to_jiffies(DLM_PURGE_INTERVAL_MS);
@@ -273,15 +247,29 @@ static void dlm_run_purge_list(struct dlm_ctxt *dlm,
 			 * in tail order, we can stop at the first
 			 * unpurgable resource -- anyone added after
 			 * him will have a greater last_used value */
+			spin_unlock(&lockres->spinlock);
 			break;
 		}
 
+		/* Status of the lockres *might* change so double
+		 * check. If the lockres is unused, holding the dlm
+		 * spinlock will prevent people from getting and more
+		 * refs on it. */
+		unused = __dlm_lockres_unused(lockres);
+		if (!unused ||
+		    (lockres->state & DLM_LOCK_RES_MIGRATING)) {
+			mlog(0, "lockres %s:%.*s: is in use or "
+			     "being remastered, used %d, state %d\n",
+			     dlm->name, lockres->lockname.len,
+			     lockres->lockname.name, !unused, lockres->state);
+			list_move_tail(&dlm->purge_list, &lockres->purge);
+			spin_unlock(&lockres->spinlock);
+			continue;
+		}
+
 		dlm_lockres_get(lockres);
 
-		/* This may drop and reacquire the dlm spinlock if it
-		 * has to do migration. */
-		if (dlm_purge_lockres(dlm, lockres))
-			BUG();
+		dlm_purge_lockres(dlm, lockres);
 
 		dlm_lockres_put(lockres);
--
1.5.6.5
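[Editorial note: for the first problem the patch describes (retrying the same busy lockres instead of moving on), here is a toy user-space model of the purge-list walk. It is an illustration only; the list type and helpers below are simplified stand-ins, not the kernel's list.h or the real DLM structures. A busy entry is re-queued at the tail and the walk continues with the next entry; the pass is bounded by the list length at its start, so a skipped entry is not retried within the same pass.]

#include <stdbool.h>
#include <stdio.h>

struct res {
	const char *name;
	bool in_use;
	struct res *next;
};

static void run_purge_list(struct res **head)
{
	/* Bound the pass by the initial length so entries moved to
	 * the tail are not revisited in this pass. */
	int remaining = 0;
	for (struct res *r = *head; r; r = r->next)
		remaining++;

	while (remaining-- && *head) {
		struct res *r = *head;
		*head = r->next;
		r->next = NULL;

		if (r->in_use) {
			/* Skip: re-queue at the tail, try the next one. */
			struct res **tail = head;
			while (*tail)
				tail = &(*tail)->next;
			*tail = r;
			printf("skipping busy lockres %s\n", r->name);
			continue;
		}
		printf("purging lockres %s\n", r->name);
	}
}

int main(void)
{
	struct res c = { "C", false, NULL };
	struct res b = { "B", true, &c };
	struct res a = { "A", false, &b };
	struct res *list = &a;

	run_purge_list(&list);	/* A and C are purged; B stays queued */
	return 0;
}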