*** This bug is a duplicate of bug 1572291 *** https://bugs.launchpad.net/bugs/1572291
------- Comment From geral...@de.ibm.com 2016-09-02 09:22 EDT-------

From the dmesg it looks like this time ext4 page allocation stumbles upon the doubly freed page first, but this happens immediately after the page got corrupted by the double free (indicated by the WARNING). So it just means that ext4 happened to be the first to get its fingers on the corrupted page during a page alloc. It could hit anyone, and later we also see another occurrence where copy_pte_range() stumbles over another corrupted page (no WARNING before that because it is a WARN_ONCE).

We still need to find the root cause for the double free and the resulting page corruption (count -1), and for that we only have the WARNING trace as a reliable hint for a double free. So my analysis from comment #5 is still valid. Even though this time genwqe itself is not the one that stumbled over the corrupted page, it was still involved in the double free (anyone can see the corrupted page afterwards; genwqe was just a more likely candidate because it was an active consumer at the time).

BTW, instead of a "double free", a call of dma_free() on previously unmapped addresses would of course result in the same issue, but a double free is much more likely, e.g. caused by broken error handling with an "off by one" or other issues.

Speaking of error handling, the "genwqe 0001:00:00.0: [genwqe_map_pages] err: no dma addr daddr=ffffffffffffffff!" messages may be a good starting point to verify the genwqe error handling and the page freeing strategy. Those messages by themselves are no problem and are even expected given the nature of the test (online/offline and failing rpcit), but of course there is some error handling involved which may have issues that could lead to a double free.

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
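To make the suspected failure mode concrete, here is a minimal user-space sketch of how an "off by one" in an error unwind path can leave a free page with count -1, which the next allocator to touch that page then reports. This is a toy model only, not the genwqe or kernel code: toy_page, toy_alloc_page(), toy_free_page(), toy_map_pages() and toy_reset() are all hypothetical names invented for illustration, and the real page accounting is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not kernel or genwqe APIs. */
#define TOY_NPAGES 4

struct toy_page {
    int count;  /* reference count; 0 means the page is free */
};

struct toy_page toy_pages[TOY_NPAGES];

/* Put every page back into the clean "free" state. */
void toy_reset(void)
{
    for (size_t i = 0; i < TOY_NPAGES; i++)
        toy_pages[i].count = 0;
}

/* Drop one reference. Nothing stops a second call on the same page. */
void toy_free_page(struct toy_page *p)
{
    p->count--;
}

/* Hand out a free page; refuse it if its count is not 0 ("bad page state"). */
bool toy_alloc_page(struct toy_page *p)
{
    if (p->count != 0)
        return false;
    p->count = 1;
    return true;
}

/*
 * Take pages 0..n-1, simulating a mapping failure at index fail_at.
 * On failure, release what was taken so far. With buggy_unwind set, the
 * unwind loop runs one index too far and also "frees" the page that was
 * never taken, leaving it with count -1. Returns 0 on success, -1 on failure.
 */
int toy_map_pages(size_t n, size_t fail_at, bool buggy_unwind)
{
    size_t i, end;

    assert(n <= TOY_NPAGES);

    for (i = 0; i < n; i++) {
        if (i == fail_at || !toy_alloc_page(&toy_pages[i]))
            break;  /* page i was NOT taken */
    }
    if (i == n)
        return 0;

    end = buggy_unwind ? i + 1 : i;  /* off by one in the buggy variant */
    for (size_t j = 0; j < end; j++)
        toy_free_page(&toy_pages[j]);
    return -1;
}
```

In this model the buggy unwind itself completes without any visible error. The damage only shows up later, when some unrelated caller of toy_alloc_page() is handed the page with count -1, which mirrors the observation that ext4 or copy_pte_range() can be the one to trip over the page.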
https://bugs.launchpad.net/bugs/1559194

Title:
  Bad page state in process genwqe_gunzip pfn:3c275 in the genwqe device driver

Status in Release Notes for Ubuntu:
  Fix Released
Status in Ubuntu on IBM z Systems:
  Fix Released
Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Released
Status in linux source package in Yakkety:
  Fix Released

Bug description:
  == Comment: #0 - Dmitry Gorbachev <dmitry.gorbac...@ru.ibm.com> - 2016-03-17 08:52:41 ==
  An error occurs when running zEDC compression/decompression and hotplugging PCI devices. There was 1G of memory, 2 pci functions and 50 threads of gunzipping enabled.

  Mar 14 23:59:01 s8330018 kernel: [ 4972.486883] BUG: Bad page state in process genwqe_gunzip pfn:3c275
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486888] page:000003d100f09d40 count:-1 mapcount:0 mapping: (null) index:0x0
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486891] flags: 0x0()
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486895] page dumped because: nonzero _count
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486897] Modules linked in: xt_CHECKSUM(E) iptable_mangle(E) ipt_MASQUERADE(E) nf_nat_masquerade_ipv4(E) iptable_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat_ipv4(E) nf_nat(E) nf_conntrack(E) xt_tcpudp(E) bridge(E) stp(E) llc(E) iptable_filter(E) ip_tables(E) x_tables(E) genwqe_card(E) crc_itu_t(E) qeth_l2(E) qeth(E) vmur(E) ccwgroup(E) dm_multipath(E) ib_iser(E) rdma_cm(E) iw_cm(E) ib_cm(E) ib_sa(E) ib_mad(E) ib_core(E) ib_addr(E) iscsi_tcp(E) libiscsi_tcp(E) libiscsi(E) scsi_transport_iscsi(E) btrfs(E) zlib_deflate(E) raid10(E) raid456(E) async_memcpy(E) async_raid6_recov(E) async_pq(E) async_xor(E) async_tx(E) xor(E) raid6_pq(E) libcrc32c(E) raid1(E) raid0(E) linear(E) ghash_s390(E) prng(E) aes_s390(E) des_s390(E) des_generic(E) sha512_s390(E) sha256_s390(E) sha1_s390(E) sha_common(E) zfcp(E) qdio(E) scsi_transport_fc(E) dasd_eckd_mod(E) dasd_mod(E)
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486916] CPU: 0 PID: 37867 Comm: genwqe_gunzip Tainted: G W E 4.4.0-8-generic #23-Ubuntu
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486916] 00000000209176f8 0000000020917788 0000000000000002 0000000000000000
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486916] 0000000020917828 00000000209177a0 00000000209177a0 0000000000114182
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486916] 0000000000000011 000000000092345a 000003d10000000a 000000000000000a
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486916] 00000000209177e8 0000000020917788 0000000000000000 0000000020914000
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486916] 0000000000000000 0000000000114182 0000000020917788 00000000209177e8
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486922] Call Trace:
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486927] ([<000000000011406e>] show_trace+0xf6/0x148)
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486929] [<0000000000114136>] show_stack+0x76/0xe8
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486934] [<0000000000518c26>] dump_stack+0x6e/0x90
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486937] [<000000000027c376>] bad_page+0xe6/0x148
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486938] [<0000000000280516>] get_page_from_freelist+0x49e/0xba8
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486940] [<0000000000280ede>] __alloc_pages_nodemask+0x166/0xb00
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486941] [<000000000015635a>] s390_dma_alloc+0x82/0x1a0
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486944] [<000003ff805ea142>] __genwqe_alloc_consistent+0x7a/0x90 [genwqe_card]
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486947] [<000003ff805ea344>] genwqe_alloc_sync_sgl+0x17c/0x2e0 [genwqe_card]
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486950] [<000003ff805e52da>] do_execute_ddcb+0x1da/0x348 [genwqe_card]
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486952] [<000003ff805e5964>] genwqe_ioctl+0x51c/0xc20 [genwqe_card]
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486953] [<00000000003145ee>] do_vfs_ioctl+0x3b6/0x518
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486955] [<00000000003147f4>] SyS_ioctl+0xa4/0xb8
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486956] [<00000000007ad1be>] system_call+0xd6/0x264
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486957] [<000003ffa9df2492>] 0x3ffa9df2492

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-release-notes/+bug/1559194/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp