On Friday, October 09, 2015 at 06:59, Oskar Liljeblad wrote:
> > > To see if it is the cause of this issue, I built a test kernel with a
> > > revert of commit 97b2591. The test kernel can be downloaded from:
> > >
> > > http://kernel.ubuntu.com/~jsalisbury/lp1499203/
[..]
> The 3.13.0-66.107~lp1445195Commit97b2591Reverted kernel seem to work just
> fine. No memory leaks as far as I can see.

By the way, I had to downgrade the kernel above to 3.13.0-65.106 on one server because of some strange IO lockup issues. I'm afraid this won't be of much help, but I'm writing it anyway. It started 1 minute after boot with the new kernel:

Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.106544] BUG: unable to handle kernel NULL pointer dereference at (null)
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.106592] IP: [<ffffffff81206c5b>] eventpoll_release_file+0x2b/0xa0
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.106624] PGD 1f72db067 PUD 1fa753067 PMD 0
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.106659] Oops: 0000 [#1] SMP
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.106684] Modules linked in: joydev hid_generic mac_hid serio_raw crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd nls_iso8859_1 hid_hyperv hyperv_fb hid hyperv_keyboard lp parport hv_netvsc hv_utils hv_storvsc hv_vmbus
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.106848] CPU: 1 PID: 1286 Comm: mongod Not tainted 3.13.0-66-generic #107~lp1445195Commit97b2591Reverted
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.106884] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v1.0 11/26/2012
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.106923] task: ffff8801f722c800 ti: ffff8801f72ce000 task.ti: ffff8801f72ce000
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.106950] RIP: 0010:[<ffffffff81206c5b>] [<ffffffff81206c5b>] eventpoll_release_file+0x2b/0xa0
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.106986] RSP: 0018:ffff8801f72cfe78 EFLAGS: 00010246
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107006] RAX: 0000000000000000 RBX: ffff8801f775e300 RCX: 0000000040000010
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107032] RDX: 0000000001000000 RSI: 0000000000000000 RDI: ffffffff81c72e80
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107058] RBP: ffff8801f72cfea0 R08: 0000000000000000 R09: 0000000000000001
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107084] R10: ffff8801f775ece1 R11: 0000000000000293 R12: 0000000000000010
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107110] R13: ffff8801f775ece1 R14: ffff8801f775ee40 R15: ffff8801f775e3b0
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107137] FS: 00007f23b299f700(0000) GS:ffff8801fee20000(0000) knlGS:0000000000000000
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107166] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107190] CR2: 0000000000000000 CR3: 00000001f7a94000 CR4: 00000000001406e0
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107224] Stack:
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107235] ffff8801f775e300 0000000000000010 ffff8801f775ece1 ffff8801f775ee40
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107270] ffff880036927a40 ffff8801f72cfee8 ffffffff811bfb7a ffffffff8133ed81
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107302] ffff8801fa8bbe30 0000000000000000 ffffffff81ebb680 ffff8801f722ce20
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107336] Call Trace:
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107353] [<ffffffff811bfb7a>] __fput+0x24a/0x260
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107375] [<ffffffff8133ed81>] ? blkdev_issue_flush+0x71/0x90
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107400] [<ffffffff811bfbde>] ____fput+0xe/0x10
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107421] [<ffffffff81088377>] task_work_run+0xa7/0xe0
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107444] [<ffffffff81013e57>] do_notify_resume+0x97/0xb0
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107468] [<ffffffff8173431a>] int_signal+0x12/0x17
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107491] Code: 0f 1f 44 00 00 55 48 89 e5 41 57 49 89 ff 48 c7 c7 80 2e c7 81 49 81 c7 b0 00 00 00 41 56 41 55 41 54 53 e8 b8 30 52 00 49 8b 07 <48> 8b 08 49 39 c7 4c 8d 60 a8 48 8d 59 a8 75 0b eb 3e 0f 1f 00
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107648] RIP [<ffffffff81206c5b>] eventpoll_release_file+0x2b/0xa0
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107675] RSP <ffff8801f72cfe78>
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107689] CR2: 0000000000000000
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107717] ---[ end trace 87deccc21e1958fa ]---
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.210565] ------------[ cut here ]------------
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.210612] kernel BUG at /home/jsalisbury/bugs/lp1499203/ubuntu-trusty/mm/rmap.c:1035!
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.210642] invalid opcode: 0000 [#2] SMP
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.210663] Modules linked in: joydev hid_generic mac_hid serio_raw crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd nls_iso8859_1 hid_hyperv hyperv_fb hid hyperv_keyboard lp parport hv_netvsc hv_utils hv_storvsc hv_vmbus
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.210796] CPU: 1 PID: 1771 Comm: mongod Tainted: G D 3.13.0-66-generic #107~lp1445195Commit97b2591Reverted
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.210834] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v1.0 11/26/2012
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.210873] task: ffff8801f7713000 ti: ffff8801fafa4000 task.ti: ffff8801fafa4000
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.210900] RIP: 0010:[<ffffffff8171ee8a>] [<ffffffff8171ee8a>] __page_set_anon_rmap.part.22+0x9/0xb
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.210939] RSP: 0018:ffff8801fafa59e8 EFLAGS: 00010246
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.210960] RAX: 0000000000000000 RBX: ffffea00079a2340 RCX: ffffffffffffffe8
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.210986] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff880207ff4f00
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.211021] RBP: ffff8801fafa59e8 R08: 00000000fffffff9 R09: 0000000000000000
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.212294] R10: 000000000000000c R11: 00000000003e9480 R12: 00007f084a5619e0
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214126] R13: 0000000000000000 R14: ffff8801f775e300 R15: 0000000000000000
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] FS: 00007f084a561700(0000) GS:ffff8801fee20000(0000) knlGS:0000000000000000
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] CR2: 00007f084a5619e0 CR3: 00000001f7a94000 CR4: 00000000001406e0
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] Stack:
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] ffff8801fafa5a18 ffffffff8118464a 00007f084a5619e0 ffff8800f78ea290
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] ffff8801f775e300 ffff8801fa652300 ffff8801fafa5ab0 ffffffff8117a708
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] ffff880035aab300 0000000035aab300 0000000000000000 0000000000001f4a
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] Call Trace:
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [<ffffffff8118464a>] do_page_add_anon_rmap+0x10a/0x120
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [<ffffffff8117a708>] handle_mm_fault+0xcf8/0xf00
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [<ffffffff8172f624>] __do_page_fault+0x184/0x560
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [<ffffffff810a3281>] ? update_cfs_shares+0xb1/0x100
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [<ffffffff8109ee48>] ? __enqueue_entity+0x78/0x80
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [<ffffffff810a51dd>] ? enqueue_entity+0x2ad/0xbb0
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [<ffffffff8101bb33>] ? native_sched_clock+0x13/0x80
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [<ffffffff810a5f02>] ? enqueue_task_fair+0x422/0x6d0
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [<ffffffff8172fa1a>] do_page_fault+0x1a/0x70
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [<ffffffff8172bd68>] page_fault+0x28/0x30
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [<ffffffff8137184f>] ? __get_user_8+0x1f/0x29
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [<ffffffff810db202>] ? exit_robust_list+0x32/0x130
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [<ffffffff81064a53>] mm_release+0x123/0x140
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [<ffffffff81069b43>] do_exit+0x153/0xa40
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [<ffffffff8106a4af>] do_group_exit+0x3f/0xa0
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [<ffffffff8107a190>] get_signal_to_deliver+0x1d0/0x6d0
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [<ffffffff810133f8>] do_signal+0x48/0xa10
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [<ffffffff81179e92>] ? handle_mm_fault+0x482/0xf00
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [<ffffffff81013e29>] do_notify_resume+0x69/0xb0
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] [<ffffffff8172bb62>] retint_signal+0x48/0x86
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] Code: c4 40 74 03 8b 4f 68 bf 00 10 00 00 48 d3 e7 e8 2d 58 a7 ff 5d c3 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 0f 1f 44 00 00 55 48 89 e5 <0f> 0b 0f 1f 44 00 00 55 48 89 e5 0f 0b 55 89 f2 be 00 80 00 00
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] RIP [<ffffffff8171ee8a>] __page_set_anon_rmap.part.22+0x9/0xb
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.214539] RSP <ffff8801fafa59e8>
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.249727] ---[ end trace 87deccc21e1958fb ]---
Oct 13 00:08:51 af-mdbdrs2 kernel: [ 221.251013] Fixing recursive fault but reboot is needed!

After that, all IO on that device was stuck. I rebooted the server and the issue occurred again, with basically the same messages logged.

Regards,

Oskar Liljeblad

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1499203

Title:
  memory leak in hv_storvsc (3.13.0-63-generic)

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Trusty:
  Confirmed

Bug description:
  Slab and SUnreclaim values in /proc/meminfo keep increasing.
  On one server it reached 85% of physical memory after 14 days, but on most other servers it increases more slowly.

  I checked /proc/slabinfo and almost all allocations were in kmalloc-512. So I enabled "slub_debug=U,kmalloc-512" on one server, and after only 24h of uptime 11% of the memory was used by kmalloc-512 and unreclaimable.

  With debugging enabled I could see the following in /sys/kernel/slab/kmalloc-512/alloc_calls:

   521294 storvsc_queuecommand+0x359/0x790 [hv_storvsc] age=161922/955116/20882927 pid=1-41545

  All other counters were below 2000. In /sys/kernel/slab/kmalloc-512/free_calls I see the following:

   516823 <not-available> age=4315783846 pid=0

  The hv_storvsc module is for Hyper-V. We are (unfortunately) running Hyper-V 6.3.9600.16384 with Microsoft System Center 2012 R2 Update rollup 3 for all the servers with this issue.

  Kernels are stock linux-image-3.13.0-63-generic, 3.13.0-63.103, x86_64, from Ubuntu 14.04 LTS. /proc/version_signature contains:

  Ubuntu 3.13.0-63.103-generic 3.13.11-ckt25

  No output from lspci -vnvn.

  The problem described above happens on both single and multicore virtual machines. The CPUs in the hypervisors are E5-2630 v2 @ 2.60GHz.

  Let me know if you need more info or if I can do more debugging.
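
For anyone who wants to repeat the alloc_calls check on another affected machine, here is a minimal sketch in Python. It is an illustration rather than the method used for the numbers above: the line layout ("count, call site, age=...") is assumed from the single sample line in this report, the helper names parse_alloc_calls and top_share are made up for this example, and the file normally needs root and slub_debug=U for the cache to be readable and populated.

```python
import re
from pathlib import Path

# Path taken from the report; only populated when booted with
# slub_debug=U,kmalloc-512 (and usually readable by root only).
ALLOC_CALLS = Path("/sys/kernel/slab/kmalloc-512/alloc_calls")

# Assumed layout, based on the sample line in the report:
#   <count> <call site> age=... pid=...
LINE_RE = re.compile(r"^\s*(\d+)\s+(.+?)\s+age=")


def parse_alloc_calls(text):
    """Return (count, call_site) pairs sorted by count, largest first."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            entries.append((int(match.group(1)), match.group(2)))
    return sorted(entries, reverse=True)


def top_share(entries):
    """Fraction of all tracked allocations owned by the top call site."""
    total = sum(count for count, _ in entries)
    return entries[0][0] / total if total else 0.0


if __name__ == "__main__":
    try:
        text = ALLOC_CALLS.read_text()
    except OSError as exc:
        print(f"cannot read {ALLOC_CALLS}: {exc}")
    else:
        entries = parse_alloc_calls(text)
        for count, site in entries[:5]:
            print(f"{count:10d}  {site}")
        print(f"top call site share: {top_share(entries):.1%}")
```

A single call site holding nearly all live objects in the cache, as storvsc_queuecommand does in the output quoted above, is the pattern to look for.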

  Regards,

  Oskar Liljeblad
  ---
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116, 1 Sep 24 00:31 seq
   crw-rw---- 1 root audio 116, 33 Sep 24 00:31 timer
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.14.1-0ubuntu3.13
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CRDA: Error: [Errno 2] No such file or directory
  CurrentDmesg:
   [59081.977909] systemd-udevd[26480]: starting version 204
   [59124.051974] init: systemd-logind main process (756) killed by TERM signal
  DistroRelease: Ubuntu 14.04
  InstallationDate: Installed on 2014-09-09 (380 days ago)
  InstallationMedia: Ubuntu-Server 14.04.1 LTS "Trusty Tahr" - Release amd64 (20140722.3)
  IwConfig:
   eth0 no wireless extensions.
   eth1 no wireless extensions.
   lo no wireless extensions.
  Lspci:
  Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99
  MachineType: Microsoft Corporation Virtual Machine
  Package: linux (not installed)
  PciMultimedia:
  ProcFB: 0 hyperv_fb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.13.0-63-generic.efi.signed root=UUID=f4d228d6-2eee-40fc-bf3f-633e46fa8301 ro slub_debug=U,kmalloc-512
  ProcVersionSignature: Ubuntu 3.13.0-63.103-generic 3.13.11-ckt25
  RelatedPackageVersions:
   linux-restricted-modules-3.13.0-63-generic N/A
   linux-backports-modules-3.13.0-63-generic N/A
   linux-firmware 1.127.15
  RfKill: Error: [Errno 2] No such file or directory
  Tags: trusty
  Uname: Linux 3.13.0-63-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:
  WifiSyslog:
   Sep 24 02:06:19 adm-backup1 dhclient: message repeated 1447 times: [ DHCPREQUEST of 10.40.128.9 on eth0 to 192.0.2.253 port 67 (xid=0x429dad4)]
   Sep 24 02:06:37 adm-backup1 dhclient: DHCPREQUEST of 10.40.128.9 on eth0 to 255.255.255.255 port 67 (xid=0x429dad4)
   Sep 24 02:06:37 adm-backup1 dhclient: DHCPACK of 10.40.128.9 from 192.0.2.253
   Sep 24 02:06:37 adm-backup1 dhclient: bound to 10.40.128.9 -- renewal in 44877 seconds.
  _MarkForUpload: True
  dmi.bios.date: 11/26/2012
  dmi.bios.vendor: Microsoft Corporation
  dmi.bios.version: Hyper-V UEFI Release v1.0
  dmi.board.asset.tag: None
  dmi.board.name: Virtual Machine
  dmi.board.vendor: Microsoft Corporation
  dmi.board.version: Hyper-V UEFI Release v1.0
  dmi.chassis.asset.tag: 6126-4244-1659-0314-3158-3955-44
  dmi.chassis.type: 3
  dmi.chassis.vendor: Microsoft Corporation
  dmi.chassis.version: Hyper-V UEFI Release v1.0
  dmi.modalias: dmi:bvnMicrosoftCorporation:bvrHyper-VUEFIReleasev1.0:bd11/26/2012:svnMicrosoftCorporation:pnVirtualMachine:pvrHyper-VUEFIReleasev1.0:rvnMicrosoftCorporation:rnVirtualMachine:rvrHyper-VUEFIReleasev1.0:cvnMicrosoftCorporation:ct3:cvrHyper-VUEFIReleasev1.0:
  dmi.product.name: Virtual Machine
  dmi.product.version: Hyper-V UEFI Release v1.0
  dmi.sys.vendor: Microsoft Corporation

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1499203/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp