*** This bug is a duplicate of bug 1572291 ***
    https://bugs.launchpad.net/bugs/1572291

------- Comment From geral...@de.ibm.com 2016-09-02 09:22 EDT-------
From the dmesg it looks like this time the ext4 page allocation stumbled
upon the doubly freed page first. That happened immediately after the
page got corrupted by the double free (indicated by the WARNING), so it
just means that ext4 happened to be the first to get its fingers on the
corrupted page during a page alloc. It could hit anyone, and later we
also see another occurrence where copy_pte_range() stumbles over another
corrupted page (no WARNING before that because it is a WARN_ONCE).

We still need to find the root cause of the double free and the
resulting page corruption (count -1), and for that the WARNING trace is
the only reliable hint we have. So my analysis from comment #5 is still
valid. Even though genwqe was not the one that stumbled over the
corrupted page this time, it was still involved in the double free.
Anyone can see the corrupted page afterwards; genwqe was just a more
likely candidate because it was an active consumer at the time.

BTW, calling dma_free() on previously unmapped addresses would of course
result in the same issue as a double free, but a double free is much
more likely, e.g. caused by broken error handling with an "off by one"
or similar issue. Speaking of error handling, the
"genwqe 0001:00:00.0: [genwqe_map_pages] err: no dma addr
daddr=ffffffffffffffff!" messages may be a good starting point for
verifying the genwqe error handling and the page freeing strategy. Those
messages by themselves are not a problem and are even expected given the
nature of the test (online/offline and failing rpcit), but they do
involve error handling, which may have issues that could lead to a
double free.
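
To make the "off by one" scenario concrete, here is a minimal user-space
sketch. It is not genwqe code: mock_pool, mock_map(), mock_unmap() and
the map_all_*() functions are hypothetical names invented for this
illustration, and the refcount array only loosely models the kernel's
page count. It shows how an error path that unwinds one entry too many
releases a page that was never mapped, which drives its count to -1.

```c
#include <assert.h>
#include <stddef.h>

#define NPAGES 8

/* Hypothetical stand-in for page accounting: 1 = mapped, 0 = free,
 * negative = released more often than it was acquired. */
struct mock_pool {
    int refcount[NPAGES];
    int fail_at;            /* index at which mock_map() fails, -1 = never */
};

static int mock_map(struct mock_pool *p, size_t i)
{
    assert(i < NPAGES);
    if ((int)i == p->fail_at)
        return -1;          /* simulated mapping failure, nothing acquired */
    p->refcount[i]++;
    return 0;
}

static void mock_unmap(struct mock_pool *p, size_t i)
{
    assert(i < NPAGES);
    p->refcount[i]--;
}

/* Buggy unwind: entry i failed and was never mapped, but "<=" releases it. */
static int map_all_buggy(struct mock_pool *p, size_t n)
{
    size_t i, j;

    for (i = 0; i < n; i++) {
        if (mock_map(p, i) != 0) {
            for (j = 0; j <= i; j++)
                mock_unmap(p, j);
            return -1;
        }
    }
    return 0;
}

/* Fixed unwind: release only entries 0..i-1, which were really mapped. */
static int map_all_fixed(struct mock_pool *p, size_t n)
{
    size_t i, j;

    for (i = 0; i < n; i++) {
        if (mock_map(p, i) != 0) {
            for (j = 0; j < i; j++)
                mock_unmap(p, j);
            return -1;
        }
    }
    return 0;
}

/* Smallest refcount in the pool; a negative value means an extra release. */
static int min_refcount(const struct mock_pool *p)
{
    int min = p->refcount[0];
    size_t i;

    for (i = 1; i < NPAGES; i++)
        if (p->refcount[i] < min)
            min = p->refcount[i];
    return min;
}
```

With fail_at set to 3, map_all_buggy() leaves refcount[3] at -1, while
map_all_fixed() returns every entry to 0. Whether the genwqe error paths
contain a pattern like this is exactly what would need to be verified.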

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1559194

Title:
  Bad page state in process genwqe_gunzip pfn:3c275 in the genwqe device
  driver

Status in Release Notes for Ubuntu:
  Fix Released
Status in Ubuntu on IBM z Systems:
  Fix Released
Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Released
Status in linux source package in Yakkety:
  Fix Released

Bug description:
  == Comment: #0 - Dmitry Gorbachev <dmitry.gorbac...@ru.ibm.com> - 2016-03-17 08:52:41 ==
  An error occurs when running zEDC compression/decompression while hotplugging PCI devices.
  The system had 1G of memory and 2 PCI functions, with 50 gunzip threads running.

  Mar 14 23:59:01 s8330018 kernel: [ 4972.486883] BUG: Bad page state in process genwqe_gunzip  pfn:3c275
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486888] page:000003d100f09d40 count:-1 mapcount:0 mapping:          (null) index:0x0
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486891] flags: 0x0()
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486895] page dumped because: nonzero _count
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486897] Modules linked in: xt_CHECKSUM(E) iptable_mangle(E) ipt_MASQUERADE(E) nf_nat_masquerade_ipv4(E) iptable_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat_ipv4(E) nf_nat(E) nf_conntrack(E) xt_tcpudp(E) bridge(E) stp(E) llc(E) iptable_filter(E) ip_tables(E) x_tables(E) genwqe_card(E) crc_itu_t(E) qeth_l2(E) qeth(E) vmur(E) ccwgroup(E) dm_multipath(E) ib_iser(E) rdma_cm(E) iw_cm(E) ib_cm(E) ib_sa(E) ib_mad(E) ib_core(E) ib_addr(E) iscsi_tcp(E) libiscsi_tcp(E) libiscsi(E) scsi_transport_iscsi(E) btrfs(E) zlib_deflate(E) raid10(E) raid456(E) async_memcpy(E) async_raid6_recov(E) async_pq(E) async_xor(E) async_tx(E) xor(E) raid6_pq(E) libcrc32c(E) raid1(E) raid0(E) linear(E) ghash_s390(E) prng(E) aes_s390(E) des_s390(E) des_generic(E) sha512_s390(E) sha256_s390(E) sha1_s390(E) sha_common(E) zfcp(E) qdio(E) scsi_transport_fc(E) dasd_eckd_mod(E) dasd_mod(E)
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486916] CPU: 0 PID: 37867 Comm: genwqe_gunzip Tainted: G        W   E   4.4.0-8-generic #23-Ubuntu
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486916]        00000000209176f8 0000000020917788 0000000000000002 0000000000000000
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486916]        0000000020917828 00000000209177a0 00000000209177a0 0000000000114182
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486916]        0000000000000011 000000000092345a 000003d10000000a 000000000000000a
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486916]        00000000209177e8 0000000020917788 0000000000000000 0000000020914000
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486916]        0000000000000000 0000000000114182 0000000020917788 00000000209177e8
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486922] Call Trace:
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486927] ([<000000000011406e>] show_trace+0xf6/0x148)
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486929]  [<0000000000114136>] show_stack+0x76/0xe8
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486934]  [<0000000000518c26>] dump_stack+0x6e/0x90
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486937]  [<000000000027c376>] bad_page+0xe6/0x148
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486938]  [<0000000000280516>] get_page_from_freelist+0x49e/0xba8
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486940]  [<0000000000280ede>] __alloc_pages_nodemask+0x166/0xb00
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486941]  [<000000000015635a>] s390_dma_alloc+0x82/0x1a0
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486944]  [<000003ff805ea142>] __genwqe_alloc_consistent+0x7a/0x90 [genwqe_card]
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486947]  [<000003ff805ea344>] genwqe_alloc_sync_sgl+0x17c/0x2e0 [genwqe_card]
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486950]  [<000003ff805e52da>] do_execute_ddcb+0x1da/0x348 [genwqe_card]
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486952]  [<000003ff805e5964>] genwqe_ioctl+0x51c/0xc20 [genwqe_card]
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486953]  [<00000000003145ee>] do_vfs_ioctl+0x3b6/0x518
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486955]  [<00000000003147f4>] SyS_ioctl+0xa4/0xb8
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486956]  [<00000000007ad1be>] system_call+0xd6/0x264
  Mar 14 23:59:01 s8330018 kernel: [ 4972.486957]  [<000003ffa9df2492>] 0x3ffa9df2492

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-release-notes/+bug/1559194/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
