Hi Eric,
I'm hitting the following splat (attached in full):
kernel: =========================
kernel: [ BUG: held lock freed! ]
kernel: 4.5.0-rc1-ceph-00026-g5e0a311 #1 Not tainted
kernel: -------------------------
kernel: swapper/5/0 is freeing memory
ffff880035c9d200-ffff880035c9dbff, with a lock still held there!
kernel: (&(&queue->rskq_lock)->rlock){+.-...}, at:
[<ffffffff816f6a88>] inet_csk_reqsk_queue_add+0x28/0xa0
kernel: 4 locks held by swapper/5/0:
kernel: #0: (rcu_read_lock){......}, at: [<ffffffff8169ef6b>]
netif_receive_skb_internal+0x4b/0x1f0
kernel: #1: (rcu_read_lock){......}, at: [<ffffffff816e977f>]
ip_local_deliver_finish+0x3f/0x380
kernel: #2: (slock-AF_INET){+.-...}, at: [<ffffffff81685ffb>]
sk_clone_lock+0x19b/0x440
kernel: #3: (&(&queue->rskq_lock)->rlock){+.-...}, at:
[<ffffffff816f6a88>] inet_csk_reqsk_queue_add+0x28/0xa0
The first couple of occurrences were on 4.4-rc2, a couple of recent ones
on 4.5-rc1. I'm suspecting your ebb516af60e1 ("tcp/dccp: fix race at
listener dismantle phase") and/or a bunch of related commits before it,
authored in the first half of Oct 2015.
The spinlock that lockdep complains about is embedded in
inet_connection_sock, which I think is getting freed in
inet_child_forget(), called from inet_csk_reqsk_queue_add():
inet_csk_complete_hashdance
inet_csk_reqsk_queue_add
spin_lock(&queue->rskq_lock)
inet_child_forget
reqsk_put
reqsk_free
sock_put(req->rsk_listener)
# rsk_listener is sk that queue is embedded in #
Your comment in inet_csk_complete_hashdance()
/* Warning: caller must not call reqsk_put(req);
* child stole last reference on it.
*/
suggests that there is only one reference to req at that point in time
and yet ebb516af60e1 added a reqsk_put() to that path: there was one in
inet_csk_listen_stop() before, but not in inet_csk_reqsk_queue_add().
This is from a quick and dirty git blame analysis, so don't shout if
all of it is wrong...
Thanks,
Ilya
2016-02-05T12:25:33.849429+00:00 smithi013 kernel: =========================
2016-02-05T12:25:33.849462+00:00 smithi013 kernel: [ BUG: held lock freed! ]
2016-02-05T12:25:33.849486+00:00 smithi013 kernel:
4.5.0-rc1-ceph-00026-g5e0a311 #1 Not tainted
2016-02-05T12:25:33.849510+00:00 smithi013 kernel: -------------------------
2016-02-05T12:25:33.849546+00:00 smithi013 kernel: swapper/5/0 is freeing
memory ffff880035c9d200-ffff880035c9dbff, with a lock still held there!
2016-02-05T12:25:33.849570+00:00 smithi013 kernel:
(&(&queue->rskq_lock)->rlock){+.-...}, at: [<ffffffff816f6a88>]
inet_csk_reqsk_queue_add+0x28/0xa0
2016-02-05T12:25:33.849593+00:00 smithi013 kernel: 4 locks held by swapper/5/0:
2016-02-05T12:25:33.849617+00:00 smithi013 kernel: #0:
(rcu_read_lock){......}, at: [<ffffffff8169ef6b>]
netif_receive_skb_internal+0x4b/0x1f0
2016-02-05T12:25:33.849639+00:00 smithi013 kernel: #1:
(rcu_read_lock){......}, at: [<ffffffff816e977f>]
ip_local_deliver_finish+0x3f/0x380
2016-02-05T12:25:33.849662+00:00 smithi013 kernel: #2:
(slock-AF_INET){+.-...}, at: [<ffffffff81685ffb>] sk_clone_lock+0x19b/0x440
2016-02-05T12:25:33.849685+00:00 smithi013 kernel: #3:
(&(&queue->rskq_lock)->rlock){+.-...}, at: [<ffffffff816f6a88>]
inet_csk_reqsk_queue_add+0x28/0xa0
2016-02-05T12:25:33.849709+00:00 smithi013 kernel: #012stack backtrace:
2016-02-05T12:25:33.849733+00:00 smithi013 kernel: CPU: 5 PID: 0 Comm:
swapper/5 Not tainted 4.5.0-rc1-ceph-00026-g5e0a311 #1
2016-02-05T12:25:33.849760+00:00 smithi013 kernel: Hardware name: Supermicro
SYS-5018R-WR/X10SRW-F, BIOS 1.0c 09/07/2015
2016-02-05T12:25:33.849783+00:00 smithi013 kernel: 0000000000000000
954bbb66c6a89089 ffff88087fd438d0 ffffffff813f7b27
2016-02-05T12:25:33.849806+00:00 smithi013 kernel: ffff88085b9aafc8
ffff88087fd43908 ffffffff810fe02c ffff880035c9d200
2016-02-05T12:25:33.849828+00:00 smithi013 kernel: ffff88085b776800
ffffea0000d72600 ffffffff81685c51 0000000000000202
2016-02-05T12:25:33.849851+00:00 smithi013 kernel: Call Trace:
2016-02-05T12:25:33.849873+00:00 smithi013 kernel: <IRQ> [<ffffffff813f7b27>]
dump_stack+0x4b/0x64
2016-02-05T12:25:33.849896+00:00 smithi013 kernel: [<ffffffff810fe02c>]
debug_check_no_locks_freed+0x11c/0x180
2016-02-05T12:25:33.849918+00:00 smithi013 kernel: [<ffffffff81685c51>] ?
sk_destruct+0x141/0x170
2016-02-05T12:25:33.849947+00:00 smithi013 kernel: [<ffffffff81232ed5>]
kmem_cache_free+0x105/0x2d0
2016-02-05T12:25:33.849973+00:00 smithi013 kernel: [<ffffffff81685c51>]
sk_destruct+0x141/0x170
2016-02-05T12:25:33.849995+00:00 smithi013 kernel: [<ffffffff81685cef>]
__sk_free+0x6f/0xb0
2016-02-05T12:25:33.850018+00:00 smithi013 kernel: [<ffffffff81685d48>]
sk_free+0x18/0x20
2016-02-05T12:25:33.850045+00:00 smithi013 kernel: [<ffffffff816f6a4d>]
inet_child_forget+0x13d/0x150
2016-02-05T12:25:33.850067+00:00 smithi013 kernel: [<ffffffff816f6a9e>]
inet_csk_reqsk_queue_add+0x3e/0xa0
2016-02-05T12:25:33.850090+00:00 smithi013 kernel: [<ffffffff816f70ba>]
inet_csk_complete_hashdance+0x4a/0x80
2016-02-05T12:25:33.850113+00:00 smithi013 kernel: [<ffffffff81712074>]
tcp_check_req+0x374/0x450
2016-02-05T12:25:33.850139+00:00 smithi013 kernel: [<ffffffff81710bf4>]
tcp_v4_rcv+0x3f4/0xac0
2016-02-05T12:25:33.850161+00:00 smithi013 kernel: [<ffffffff810fb7ee>] ?
__lock_is_held+0x3e/0x60
2016-02-05T12:25:33.850184+00:00 smithi013 kernel: [<ffffffff816e9909>]
ip_local_deliver_finish+0x1c9/0x380
2016-02-05T12:25:33.850206+00:00 smithi013 kernel: [<ffffffff816e977f>] ?
ip_local_deliver_finish+0x3f/0x380
2016-02-05T12:25:33.850244+00:00 smithi013 kernel: [<ffffffff816e9cdf>]
ip_local_deliver+0x6f/0xe0
2016-02-05T12:25:33.850267+00:00 smithi013 kernel: [<ffffffff810fb702>] ?
match_held_lock+0xf2/0x1a0
2016-02-05T12:25:33.850290+00:00 smithi013 kernel: [<ffffffff816e9740>] ?
ip_rcv_finish+0x4f0/0x4f0
2016-02-05T12:25:33.850313+00:00 smithi013 kernel: [<ffffffff816e9624>]
ip_rcv_finish+0x3d4/0x4f0
2016-02-05T12:25:33.850335+00:00 smithi013 kernel: [<ffffffff816ea05f>]
ip_rcv+0x30f/0x3c0
2016-02-05T12:25:33.850358+00:00 smithi013 kernel: [<ffffffff817bc807>] ?
packet_rcv+0x387/0x3c0
2016-02-05T12:25:33.850385+00:00 smithi013 kernel: [<ffffffff8169df86>]
__netif_receive_skb_core+0x966/0xa30
2016-02-05T12:25:33.850408+00:00 smithi013 kernel: [<ffffffff8169ef6b>] ?
netif_receive_skb_internal+0x4b/0x1f0
2016-02-05T12:25:33.850431+00:00 smithi013 kernel: [<ffffffff8169e09e>]
__netif_receive_skb+0x4e/0x60
2016-02-05T12:25:33.850453+00:00 smithi013 kernel: [<ffffffff8169f04d>]
netif_receive_skb_internal+0x12d/0x1f0
2016-02-05T12:25:33.850479+00:00 smithi013 kernel: [<ffffffff8169ef6b>] ?
netif_receive_skb_internal+0x4b/0x1f0
2016-02-05T12:25:33.850502+00:00 smithi013 kernel: [<ffffffff816a078c>]
napi_gro_receive+0xfc/0x180
2016-02-05T12:25:33.850525+00:00 smithi013 kernel: [<ffffffffa00c2891>]
ixgbe_clean_rx_irq+0x841/0x9b0 [ixgbe]
2016-02-05T12:25:33.850548+00:00 smithi013 kernel: [<ffffffffa00c373b>]
ixgbe_poll+0x50b/0x710 [ixgbe]
2016-02-05T12:25:33.850578+00:00 smithi013 kernel: [<ffffffff814ec7dc>] ?
add_interrupt_randomness+0x19c/0x1c0
2016-02-05T12:25:33.850604+00:00 smithi013 kernel: [<ffffffff8169f9d1>]
net_rx_action+0x141/0x3e0
2016-02-05T12:25:33.850627+00:00 smithi013 kernel: [<ffffffff810abd9a>]
__do_softirq+0x1ea/0x450
2016-02-05T12:25:33.850650+00:00 smithi013 kernel: [<ffffffff810ac2a6>]
irq_exit+0x56/0x100
2016-02-05T12:25:33.850672+00:00 smithi013 kernel: [<ffffffff817dc28f>]
do_IRQ+0xef/0x110
2016-02-05T12:25:33.850695+00:00 smithi013 kernel: [<ffffffff817d9fd6>]
common_interrupt+0x96/0x96
2016-02-05T12:25:33.850717+00:00 smithi013 kernel: <EOI> [<ffffffff811ed3c1>]
? quiet_vmstat+0x31/0x60
2016-02-05T12:25:33.850740+00:00 smithi013 kernel: [<ffffffff810f37f0>]
cpu_startup_entry+0x90/0x400
2016-02-05T12:25:33.850763+00:00 smithi013 kernel: [<ffffffff81055262>]
start_secondary+0x182/0x1a0