It seems that commit 22b886dd ("timers: Use proper base migration in
add_timer_on()") [1] has been applied starting with these tags:

$ git tag --contains 22b886dd
Ubuntu-lts-4.4.0-4.19_14.04.1
Ubuntu-lts-4.4.0-4.19_14.04.2

[1]
$ git show 22b886dd
commit 22b886dd1018093920c4250dee2a9a3cb7cff7b8
Author: Tejun Heo <t...@kernel.org>
Date:   Wed Nov 4 12:15:33 2015 -0500

    timers: Use proper base migration in add_timer_on()

    Regardless of the previous CPU a timer was on, add_timer_on()
    currently simply sets timer->flags to the new CPU. As the caller must
    be seeing the timer as idle, this is locally fine, but the timer
    leaving the old base while unlocked can lead to race conditions as
    follows.

    Let's say timer was on cpu 0.

      cpu 0                                 cpu 1
      -----------------------------------------------------------------------------
      del_timer(timer) succeeds
                                            del_timer(timer)
                                              lock_timer_base(timer) locks cpu_0_base
      add_timer_on(timer, 1)
        spin_lock(&cpu_1_base->lock)
        timer->flags set to cpu_1_base
        operates on @timer                    operates on @timer

    This triggered with mod_delayed_work_on() which contains
    "if (del_timer()) add_timer_on()" sequence eventually leading to the
    following oops.

      BUG: unable to handle kernel NULL pointer dereference at (null)
      IP: [<ffffffff810ca6e9>] detach_if_pending+0x69/0x1a0
      ...
      Workqueue: wqthrash wqthrash_workfunc [wqthrash]
      task: ffff8800172ca680 ti: ffff8800172d0000 task.ti: ffff8800172d0000
      RIP: 0010:[<ffffffff810ca6e9>] [<ffffffff810ca6e9>] detach_if_pending+0x69/0x1a0
      ...
      Call Trace:
       [<ffffffff810cb0b4>] del_timer+0x44/0x60
       [<ffffffff8106e836>] try_to_grab_pending+0xb6/0x160
       [<ffffffff8106e913>] mod_delayed_work_on+0x33/0x80
       [<ffffffffa0000081>] wqthrash_workfunc+0x61/0x90 [wqthrash]
       [<ffffffff8106dba8>] process_one_work+0x1e8/0x650
       [<ffffffff8106e05e>] worker_thread+0x4e/0x450
       [<ffffffff810746af>] kthread+0xef/0x110
       [<ffffffff8185980f>] ret_from_fork+0x3f/0x70

    Fix it by updating add_timer_on() to perform proper migration as
    __mod_timer() does.

    Reported-and-tested-by: Jeff Layton <jlay...@poochiereds.net>
    Signed-off-by: Tejun Heo <t...@kernel.org>
    Cc: Chris Worley <chris.wor...@primarydata.com>
    Cc: bfie...@fieldses.org
    Cc: Michael Skralivetsky <michael.skralivet...@primarydata.com>
    Cc: Trond Myklebust <trond.mykleb...@primarydata.com>
    Cc: Shaohua Li <s...@fb.com>
    Cc: Jeff Layton <jlay...@poochiereds.net>
    Cc: kernel-t...@fb.com
    Cc: sta...@vger.kernel.org
    Link: http://lkml.kernel.org/r/20151029103113.2f893...@tlielax.poochiereds.net
    Link: http://lkml.kernel.org/r/20151104171533.gi5...@mtj.duckdns.org
    Signed-off-by: Thomas Gleixner <t...@linutronix.de>

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 74591ba..bbc5d11 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -977,13 +977,29 @@ EXPORT_SYMBOL(add_timer);
  */
 void add_timer_on(struct timer_list *timer, int cpu)
 {
-	struct tvec_base *base = per_cpu_ptr(&tvec_bases, cpu);
+	struct tvec_base *new_base = per_cpu_ptr(&tvec_bases, cpu);
+	struct tvec_base *base;
 	unsigned long flags;
 
 	timer_stats_timer_set_start_info(timer);
 	BUG_ON(timer_pending(timer) || !timer->function);
-	spin_lock_irqsave(&base->lock, flags);
-	timer->flags = (timer->flags & ~TIMER_BASEMASK) | cpu;
+
+	/*
+	 * If @timer was on a different CPU, it should be migrated with the
+	 * old base locked to prevent other operations proceeding with the
+	 * wrong base locked. See lock_timer_base().
+	 */
+	base = lock_timer_base(timer, &flags);
+	if (base != new_base) {
+		timer->flags |= TIMER_MIGRATING;
+
+		spin_unlock(&base->lock);
+		base = new_base;
+		spin_lock(&base->lock);
+		WRITE_ONCE(timer->flags,
+			   (timer->flags & ~TIMER_BASEMASK) | cpu);
+	}
+
 	debug_activate(timer, timer->expires);
 	internal_add_timer(base, timer);
 	spin_unlock_irqrestore(&base->lock, flags);

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1546320

Title:
  crash starting at kernel v3.13.0-72 in timer code

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Register %RAX is LIST_POISON2.

  [239837.578526] general protection fault: 0000 [#1] SMP
  ...
  [239837.664031] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 3.13.0-74-generic #118-Ubuntu
  [239837.672997] Hardware name: XXXXXXXXXXXXXXXXXX
  [239837.685506] task: ffff881028dc6000 ti: ffff881028dce000 task.ti: ffff881028dce000
  [239837.694280] RIP: 0010:[<ffffffff810756a4>] [<ffffffff810756a4>] detach_if_pending+0x34/0xb0
  [239837.704179] RSP: 0018:ffff88103fa03d10 EFLAGS: 00010002
  [239837.710425] RAX: dead000000200200 RBX: ffffffffa01be040 RCX: 000000000000303e
  [239837.718778] RDX: ffff8810288906b8 RSI: ffff881028f60000 RDI: ffffffffa01be040
  [239837.727137] RBP: ffff88103fa03d30 R08: 0000000000000086 R09: ffff881028f88000
  [239837.735505] R10: 0000000000000002 R11: 0000000000000005 R12: ffffffffa01be040
  [239837.760360] R13: ffff881028f60000 R14: 0000000000000001 R15: 0000000000000001
  [239837.785862] FS: 0000000000000000(0000) GS:ffff88103fa00000(0000) knlGS:0000000000000000
  [239837.812540] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [239837.827508] CR2: 00000000033d4048 CR3: 0000000001c0e000 CR4: 00000000001407e0
  [239837.852880] Stack:
  [239837.863639] ffffffffa01be040 0000000000000000 ffff881028f60000 ffff882025639a00
  [239837.889101] ffff88103fa03d60 ffffffff81075766 0000000000000086 ffffffffa01be020
  [239837.914247] ffff88103fa03d98 0000000000000100 ffff88103fa03d88 ffffffff81082369
  [239837.939532] Call Trace:
  [239837.950648] <IRQ>
  [239837.952982]
  [239837.963021] [<ffffffff81075766>] del_timer+0x46/0x70
  [239837.974969] [<ffffffff81082369>] try_to_grab_pending+0xa9/0x160
  [239837.989674] [<ffffffff81082453>] mod_delayed_work_on+0x33/0x70
  [239838.003709] [<ffffffffa01bb3ba>] set_timeout+0x3a/0x40 [ib_addr]
  [239838.018469] [<ffffffffa01bb559>] netevent_callback+0x29/0x30 [ib_addr]
  [239838.033727] [<ffffffff8173125c>] notifier_call_chain+0x4c/0x70
  [239838.047561] [<ffffffff81634a60>] ? neigh_table_clear+0x120/0x120
  [239838.062010] [<ffffffff817312ba>] atomic_notifier_call_chain+0x1a/0x20
  [239838.076485] [<ffffffff8163100b>] call_netevent_notifiers+0x1b/0x20
  [239838.090371] [<ffffffff81634b21>] neigh_timer_handler+0xc1/0x2c0
  [239838.104354] [<ffffffff810745d6>] call_timer_fn+0x36/0x100
  [239838.117021] [<ffffffff81634a60>] ? neigh_table_clear+0x120/0x120
  [239838.131002] [<ffffffff8107556f>] run_timer_softirq+0x1ef/0x2f0
  [239838.143985] [<ffffffff8106cd2c>] __do_softirq+0xec/0x2c0
  [239838.156386] [<ffffffff8106d275>] irq_exit+0x105/0x110
  [239838.168325] [<ffffffff81737b15>] smp_apic_timer_interrupt+0x45/0x60
  [239838.181501] [<ffffffff8173649d>] apic_timer_interrupt+0x6d/0x80
  [239838.193978] <EOI>
  [239838.196317]
  [239838.203483] [<ffffffff815d65b2>] ? cpuidle_enter_state+0x52/0xc0
  [239838.214553] [<ffffffff815d66d9>] cpuidle_idle_call+0xb9/0x1f0
  [239838.226799] [<ffffffff8101d3ee>] arch_cpu_idle+0xe/0x30
  [239838.238745] [<ffffffff810bf475>] cpu_startup_entry+0xc5/0x290
  [239838.250792] [<ffffffff810415ed>] start_secondary+0x21d/0x2d0
  [239838.263165] Code: 89 e5 41 56 41 89 d6 41 55 41 54 49 89 fc 53 48 8b 17 48 85 d2 74 55 49 89 f5 0f 1f 44 00 00 49 8b 44 24 08 45 84 f6 48 89 42 08 <48> 89 10 74 08 49 c7 04 24 00 00 00 00 41 f6 44 24 18 01 48 b8
  [239838.301935] RIP [<ffffffff810756a4>] detach_if_pending+0x34/0xb0
  [239838.314036] RSP <ffff88103fa03d10>

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1546320/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp