Hello,

On Thu, Dec 04, 2014 at 12:42:02PM -0500, Sebastian Parschauer wrote:
> we've been analyzing a kernel bug in blk-mq with the same kernel version
> where we triggered an Oops by hot-unplugging a qcow2 Qemu/KVM virtio-blk
> storage device during active I/O to that device within the virtual machine
> running this kernel.
> So we've installed linux-image-3.16-0.bpo.3-amd64-dbg (version
> 3.16.5-1~bpo70+1), gdb (version 7.4.1+dfsg-0.1) and installed the related
> source code. But when trying to list the functions from the call trace,
> wrong code locations are displayed.
>
> # apt-get update
> # apt-get install gdb ctags vim apt-src linux-image-3.16-0.bpo.3-amd64-dbg
> # cd /usr/src
> # apt-src update
> # apt-src install linux-image-3.16-0.bpo.3-amd64
>
> # dpkg -l | grep linux-image
> ii  linux-image-3.16-0.bpo.3-amd64      3.16.5-1~bpo70+1  amd64  Linux 3.16 for 64-bit PCs
> ii  linux-image-3.16-0.bpo.3-amd64-dbg  3.16.5-1~bpo70+1  amd64  Debugging symbols for Linux 3.16-0.bpo.3-amd64
> ii  linux-image-3.2.0-4-amd64           3.2.63-2+deb7u1   amd64  Linux 3.2 for 64-bit PCs
> ii  linux-image-amd64                   3.2+46            amd64  Linux for 64-bit PCs (meta-package)
>
> # apt-src list linux-image-3.16-0.bpo.3-amd64
> i linux 3.16.5-1~bpo70 /usr/src/linux-3.16.5
>
> Oops call trace:
> [ 81.248004] Call Trace:
> [ 81.248004] [<ffffffff81545f7b>] ? mutex_lock+0x1b/0x2a
> [ 81.248004] [<ffffffff812a75c4>] ? blk_mq_free_queue+0x24/0x150
> [ 81.248004] [<ffffffff8129e7c8>] ? blk_release_queue+0x88/0xd0
> [ 81.248004] [<ffffffff812ca160>] ? kobject_cleanup+0x80/0x1d0
> [ 81.248004] [<ffffffff812abba2>] ? disk_release+0x92/0xd0
> [ 81.248004] [<ffffffff813c4f3b>] ? device_release+0x3b/0xb0
> [ 81.248004] [<ffffffff812ca160>] ? kobject_cleanup+0x80/0x1d0
> [ 81.248004] [<ffffffff811f2095>] ? __blkdev_put+0x115/0x1a0
> [ 81.248004] [<ffffffff811f2285>] ? blkdev_close+0x25/0x30
> [ 81.248004] [<ffffffff811bd323>] ? __fput+0xb3/0x210
> [ 81.257437] [<ffffffff8108c164>] ? task_work_run+0xc4/0xe0
> [ 81.257437] [<ffffffff8106f310>] ? do_exit+0x2c0/0xa80
> [ 81.257437] [<ffffffff8106fb56>] ? do_group_exit+0x46/0xb0
> [ 81.257437] [<ffffffff8106fbd7>] ? SyS_exit_group+0x17/0x20
> [ 81.257437] [<ffffffff8154792d>] ? system_call_fast_compare_end+0x10/0x15
> [ 81.257437] Code: 55 53 48 89 fb 48 83 ec 20 65 48 8b 04 25 48 c8 00 00 48 8b 80 38 c0 ff ff a8 08 75 29 48 8b 57 18 b8 01 00 00 00 48 85 d2 74 03 <8b> 42 28 85 c0 74 14 4c 8d 6b 20 4c 89 ef e8 0e eb b$
> [ 81.258715] RIP [<ffffffff81545ddf>] __mutex_lock_slowpath+0x3f/0x1c0
>
> Let's run gdb:
>
> # gdb /usr/lib/debug/vmlinux-3.16-0.bpo.3-amd64
> (gdb) list *blk_mq_free_queue+0x24
>
> 96      /build/linux-LrLd2z/linux-3.16.5/include/linux/list.h: No such file or directory.
>
> (gdb) quit
> # mkdir -p /build/linux-LrLd2z
> # ln -sT /usr/src/linux-3.16.5/ /build/linux-LrLd2z/linux-3.16.5
> # gdb /usr/lib/debug/vmlinux-3.16-0.bpo.3-amd64
> (gdb) list *blk_mq_free_queue+0x24
>
> 0xffffffff812a75c4 is in blk_mq_free_queue
> (/build/linux-LrLd2z/linux-3.16.5/include/linux/list.h:101).
> 96       * in an undefined state.
> 97       */
> 98      #ifndef CONFIG_DEBUG_LIST
> 99      static inline void __list_del_entry(struct list_head *entry)
> 100     {
> 101             __list_del(entry->prev, entry->next);
> 102     }
> 103
> 104     static inline void list_del(struct list_head *entry)
> 105     {
>
> Can't be possible! There is no mutex_lock() here!

I don't know x86, but on arm the stack trace contains the return
addresses, so they are pointing to the instruction after the branch.
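
To make the return-address point concrete, here is a minimal sketch (not
taken from any kernel tool; the helper names parse_trace_entry and
lookup_offset are made up for illustration). It assumes each trace entry is
a return address, so stepping back one byte gives an address that still
lies inside the call instruction, and a line lookup on that address should
resolve to the call site rather than to whatever follows it:

```python
def parse_trace_entry(entry):
    """Split 'symbol+0xOFF/0xSIZE' into (symbol, offset, size)."""
    symbol, rest = entry.rsplit("+", 1)
    offset, size = rest.split("/")
    return symbol, int(offset, 16), int(size, 16)


def lookup_offset(entry):
    """Return (symbol, offset) to hand to the debugger.

    Assumption: the entry is a return address, i.e. it points just past
    a call instruction. One byte back is still inside that call.
    """
    symbol, offset, _size = parse_trace_entry(entry)
    return symbol, max(offset - 1, 0)


if __name__ == "__main__":
    # Two entries copied from the call trace quoted above.
    for entry in ("blk_mq_free_queue+0x24/0x150",
                  "blk_release_queue+0x88/0xd0"):
        symbol, off = lookup_offset(entry)
        print(f"(gdb) list *{symbol}+{off:#x}")
```

Any offset inside the call instruction works for the lookup; subtracting
one is just the simplest choice that needs no knowledge of the instruction
length.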

Looking at blk_mq_free_queue (in v3.18-rc6 because I have that lying
around here):

	static void blk_mq_del_queue_tag_set(struct request_queue *q)
	{
		struct blk_mq_tag_set *set = q->tag_set;

		mutex_lock(&set->tag_list_lock);
		list_del_init(&q->tag_set_list);
	[...]

	void blk_mq_free_queue(struct request_queue *q)
	{
		struct blk_mq_tag_set *set = q->tag_set;

		blk_mq_del_queue_tag_set(q);

		blk_mq_exit_hw_queues(q, set, set->nr_hw_queues);

So I bet you have to look at blk_mq_free_queue+0x1f.

> * (gdb) list *blk_release_queue+0x88
>
> 0xffffffff8129e7c8 is in blk_release_queue
> (/build/linux-LrLd2z/linux-3.16.5/block/blk-sysfs.c:523).
> 518             __blk_queue_free_tags(q);
> 519
> 520             if (q->mq_ops)
> 521                     blk_mq_free_queue(q);
> 522
> 523             kfree(q->flush_rq);
> 524
> 525             blk_trace_shutdown(q);
> 526
> 527             bdi_destroy(&q->backing_dev_info);
>
> This points to kfree() - also wrong!
>
> Let's check the disassembly!
>
> # objdump -D /usr/lib/debug/vmlinux-3.16-0.bpo.3-amd64 | less
> (less) /<blk_mq_free_queue>:
> ffffffff812a75a0 <blk_mq_free_queue>:
> ffffffff812a75a0: e8 1b 28 2a 00          callq ffffffff81549dc0 <__fentry__>
> ffffffff812a75a5: 41 54                   push %r12
> ffffffff812a75a7: 55                      push %rbp
> ffffffff812a75a8: 53                      push %rbx
> ffffffff812a75a9: 48 8b af a8 07 00 00    mov 0x7a8(%rdi),%rbp
> ffffffff812a75b0: 48 89 fb                mov %rdi,%rbx
> ffffffff812a75b3: e8 78 f1 ff ff          callq ffffffff812a6730 <blk_mq_freeze_queue>
> ffffffff812a75b8: 4c 8d 65 38             lea 0x38(%rbp),%r12
> ffffffff812a75bc: 4c 89 e7                mov %r12,%rdi
> ffffffff812a75bf: e8 9c e9 29 00          callq ffffffff81545f60 <mutex_lock>
> ffffffff812a75c4: 48 8b 8b b0 07 00 00    mov 0x7b0(%rbx),%rcx
>
> 0xa0 + 0x24 = 0xc4
>
> Here is definitely a call to mutex_lock() at blk_mq_free_queue+0x24 !!!

s/at/before/ !

> No call to any list stuff! So just the source code information in vmlinux is
> wrong!
> It's the same with the other code locations.
>
> Please fix that in your package build!

So I don't think there is anything to fix. Do you concur?
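
The +0x1f figure can be cross-checked against the bytes in the quoted
objdump output. The sketch below assumes the usual x86-64 "call rel32"
encoding (opcode e8 followed by a signed 32-bit little-endian displacement
relative to the next instruction); the function name decode_call_rel32 is
made up for illustration, and all addresses and bytes are copied from the
listing above:

```python
import struct

FUNC_BASE = 0xFFFFFFFF812A75A0    # <blk_mq_free_queue> in the objdump output
RETURN_ADDR = 0xFFFFFFFF812A75C4  # blk_mq_free_queue+0x24 from the call trace
CALL_BYTES = bytes.fromhex("e89ce92900")  # bytes listed at ffffffff812a75bf


def decode_call_rel32(insn, insn_addr):
    """Return the target of a 5-byte 'call rel32' located at insn_addr."""
    if len(insn) != 5 or insn[0] != 0xE8:
        raise ValueError("not a call rel32 instruction")
    (disp,) = struct.unpack("<i", insn[1:])   # signed 32-bit displacement
    next_ip = insn_addr + len(insn)           # this is also the return address
    return (next_ip + disp) & 0xFFFFFFFFFFFFFFFF


# The call instruction ends exactly where the return address begins.
call_addr = RETURN_ADDR - len(CALL_BYTES)
target = decode_call_rel32(CALL_BYTES, call_addr)

if __name__ == "__main__":
    print(f"call site:   blk_mq_free_queue+{call_addr - FUNC_BASE:#x}")
    print(f"call target: {target:#x}")
```

The computed target matches the address objdump prints for mutex_lock, so
the address in the trace is the return address of that call, and the line
information gdb reports for +0x24 belongs to the code that runs after
mutex_lock() returns.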

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |