I'm happy to create a new bug report for this, but before I do, I wanted to follow up here first. I've been working on a bionic VM template this week and the issue has resurfaced. The client (18.04) reboots daily at 3:00 a.m., and somewhere between 30 minutes and 2 hours later the CIFS mount point stops responding. Meanwhile, other clients (16.04 and Windows) continue chugging along merrily. A reboot sometimes fixes the problem, and sometimes the problem has fixed itself by 8 a.m. when I arrive. Here's some syslog debug output from after the machine finishes booting.
Yesterday it cleared up on its own. Today the server is still down 10 hours later. I would blame Java, except that the whole mount point becomes non-responsive when this happens, not just for that one process.

Jun 6 03:00:37 localhost systemd[1]: Reached target Multi-User System.
Jun 6 03:00:37 localhost systemd[1]: Starting Execute cloud user/final scripts...
Jun 6 03:00:37 localhost systemd[1]: Reached target Graphical Interface.
Jun 6 03:00:37 localhost systemd[1]: Starting Update UTMP about System Runlevel Changes...
Jun 6 03:00:38 localhost systemd[1]: Started Update UTMP about System Runlevel Changes.
Jun 6 03:00:38 localhost cloud-init[1531]: Cloud-init v. 18.2 running 'modules:final' at Wed, 06 Jun 2018 03:00:38 +0000. Up 23.72 seconds.
Jun 6 03:00:38 localhost cloud-init[1531]: Cloud-init v. 18.2 finished at Wed, 06 Jun 2018 03:00:38 +0000. Datasource DataSourceNoCloud [seed=/var/lib/cloud/seed/nocloud-net][dsmode=net]. Up 23.84 seconds
Jun 6 03:00:38 localhost systemd[1]: Started Execute cloud user/final scripts.
Jun 6 03:00:38 localhost systemd[1]: Reached target Cloud-init target.
Jun 6 03:00:38 localhost systemd[1]: Startup finished in 2.806s (kernel) + 21.078s (userspace) = 23.885s.
Jun 6 03:00:39 localhost kernel: [ 24.927412] TCP: ens160: Driver has suspect GRO implementation, TCP performance may be compromised.
Jun 6 03:00:51 localhost systemd-timesyncd[574]: Synchronized to time server 91.189.91.157:123 (ntp.ubuntu.com).
Jun 6 03:14:28 localhost systemd[1]: Starting Message of the Day...
Jun 6 03:14:30 localhost 50-motd-news[1699]: * Meltdown, Spectre and Ubuntu: What are the attack vectors,
Jun 6 03:14:30 localhost 50-motd-news[1699]: how the fixes work, and everything else you need to know
Jun 6 03:14:30 localhost 50-motd-news[1699]: - https://ubu.one/u2Know
Jun 6 03:14:30 localhost systemd[1]: Started Message of the Day.
Jun 6 03:15:48 localhost systemd[1]: Starting Cleanup of Temporary Directories...
Jun 6 03:15:48 localhost systemd[1]: Started Cleanup of Temporary Directories.
Jun 6 03:17:01 localhost CRON[1770]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jun 6 04:00:01 localhost CRON[1878]: (root) CMD (/mnt/www/config/backup_config.sh)
Jun 6 04:17:01 localhost CRON[1916]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jun 6 04:17:33 localhost nslcd[1438]: [b141f2] <group/member="root"> failed to bind to LDAP server ldap://dc01.example.com: Can't contact LDAP server
Jun 6 04:17:33 localhost nslcd[1438]: [b141f2] <group/member="root"> connected to LDAP server ldap://dc02.example.com
Jun 6 04:30:01 localhost CRON[1938]: (root) CMD (/usr/sbin/ntpdate time-a.nist.gov time-b.nist.gov 0.pool.ntp.org 1.pool.ntp.org)
Jun 6 04:30:01 localhost CRON[1937]: (CRON) info (No MTA installed, discarding output)
Jun 6 05:12:02 localhost kernel: [ 7906.798052] CIFS VFS: Server cifshost.example.com has not responded in 120 seconds. Reconnecting...
Jun 6 05:12:02 localhost kernel: [ 7906.800726] CIFS VFS: Free previous auth_key.response = 000000003e802799
Jun 6 05:13:10 localhost kernel: [ 7975.657719] INFO: task java:1672 blocked for more than 120 seconds.
Jun 6 05:13:10 localhost kernel: [ 7975.657741] Not tainted 4.15.0-22-generic #24-Ubuntu
Jun 6 05:13:10 localhost kernel: [ 7975.657757] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 6 05:13:10 localhost kernel: [ 7975.657779] java D 0 1672 1 0x80000000
Jun 6 05:13:10 localhost kernel: [ 7975.657781] Call Trace:
Jun 6 05:13:10 localhost kernel: [ 7975.657788] __schedule+0x297/0x8b0
Jun 6 05:13:10 localhost kernel: [ 7975.657791] ? __wake_up+0x13/0x20
Jun 6 05:13:10 localhost kernel: [ 7975.657792] schedule+0x2c/0x80
Jun 6 05:13:10 localhost kernel: [ 7975.657795] io_schedule+0x16/0x40
Jun 6 05:13:10 localhost kernel: [ 7975.657798] wait_on_page_bit_common+0xd8/0x160
Jun 6 05:13:10 localhost kernel: [ 7975.657800] ? page_cache_tree_insert+0xe0/0xe0
Jun 6 05:13:10 localhost kernel: [ 7975.657801] __filemap_fdatawait_range+0xfa/0x160
Jun 6 05:13:10 localhost kernel: [ 7975.657803] filemap_write_and_wait+0x4d/0x90
Jun 6 05:13:10 localhost kernel: [ 7975.657826] cifs_flush+0x43/0x90 [cifs]
Jun 6 05:13:10 localhost kernel: [ 7975.657830] filp_close+0x2f/0x80
Jun 6 05:13:10 localhost kernel: [ 7975.657832] __close_fd+0x85/0xa0
Jun 6 05:13:10 localhost kernel: [ 7975.657834] SyS_close+0x23/0x50
Jun 6 05:13:10 localhost kernel: [ 7975.657836] do_syscall_64+0x73/0x130
Jun 6 05:13:10 localhost kernel: [ 7975.657838] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Jun 6 05:13:10 localhost kernel: [ 7975.657839] RIP: 0033:0x7fd159dbd447
Jun 6 05:13:10 localhost kernel: [ 7975.657840] RSP: 002b:00007fd12880c440 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
Jun 6 05:13:10 localhost kernel: [ 7975.657841] RAX: ffffffffffffffda RBX: 0000000000000175 RCX: 00007fd159dbd447
Jun 6 05:13:10 localhost kernel: [ 7975.657842] RDX: 0000000000000000 RSI: 00000007c0023318 RDI: 0000000000000175
Jun 6 05:13:10 localhost kernel: [ 7975.657842] RBP: 00007fd12880c490 R08: 0000000000000000 R09: 0000000781601e70
Jun 6 05:13:10 localhost kernel: [ 7975.657843] R10: 0000000000002288 R11: 0000000000000293 R12: 00007fd158ccab40
Jun 6 05:13:10 localhost kernel: [ 7975.657844] R13: 00007fd0f000f1e8 R14: 0000000000000042 R15: 00007fd12880c4a0
Jun 6 05:14:03 localhost kernel: [ 8028.649934] CIFS VFS: Server cifshost.example.com has not responded in 120 seconds. Reconnecting...
Jun 6 05:14:03 localhost kernel: [ 8028.652603] CIFS VFS: Free previous auth_key.response = 000000003e802799
Jun 6 05:15:11 localhost kernel: [ 8096.485698] INFO: task java:1672 blocked for more than 120 seconds.
Jun 6 05:15:11 localhost kernel: [ 8096.485721] Not tainted 4.15.0-22-generic #24-Ubuntu
Jun 6 05:15:11 localhost kernel: [ 8096.485737] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 6 05:15:11 localhost kernel: [ 8096.485758] java D 0 1672 1 0x80000000
Jun 6 05:15:11 localhost kernel: [ 8096.485760] Call Trace:
Jun 6 05:15:11 localhost kernel: [ 8096.485769] __schedule+0x297/0x8b0
Jun 6 05:15:11 localhost kernel: [ 8096.485772] ? __wake_up+0x13/0x20
Jun 6 05:15:11 localhost kernel: [ 8096.485774] schedule+0x2c/0x80
Jun 6 05:15:11 localhost kernel: [ 8096.485776] io_schedule+0x16/0x40
Jun 6 05:15:11 localhost kernel: [ 8096.485779] wait_on_page_bit_common+0xd8/0x160
Jun 6 05:15:11 localhost kernel: [ 8096.485781] ? page_cache_tree_insert+0xe0/0xe0
Jun 6 05:15:11 localhost kernel: [ 8096.485782] __filemap_fdatawait_range+0xfa/0x160
Jun 6 05:15:11 localhost kernel: [ 8096.485785] filemap_write_and_wait+0x4d/0x90
Jun 6 05:15:11 localhost kernel: [ 8096.485818] cifs_flush+0x43/0x90 [cifs]
Jun 6 05:15:11 localhost kernel: [ 8096.485821] filp_close+0x2f/0x80
Jun 6 05:15:11 localhost kernel: [ 8096.485823] __close_fd+0x85/0xa0
Jun 6 05:15:11 localhost kernel: [ 8096.485824] SyS_close+0x23/0x50
Jun 6 05:15:11 localhost kernel: [ 8096.485827] do_syscall_64+0x73/0x130
Jun 6 05:15:11 localhost kernel: [ 8096.485829] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Jun 6 05:15:11 localhost kernel: [ 8096.485830] RIP: 0033:0x7fd159dbd447
Jun 6 05:15:11 localhost kernel: [ 8096.485831] RSP: 002b:00007fd12880c440 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
Jun 6 05:15:11 localhost kernel: [ 8096.485832] RAX: ffffffffffffffda RBX: 0000000000000175 RCX: 00007fd159dbd447
Jun 6 05:15:11 localhost kernel: [ 8096.485833] RDX: 0000000000000000 RSI: 00000007c0023318 RDI: 0000000000000175
Jun 6 05:15:11 localhost kernel: [ 8096.485834] RBP: 00007fd12880c490 R08: 0000000000000000 R09: 0000000781601e70
Jun 6 05:15:11 localhost kernel: [ 8096.485834] R10: 0000000000002288 R11: 0000000000000293 R12: 00007fd158ccab40
Jun 6 05:15:11 localhost kernel: [ 8096.485835] R13: 00007fd0f000f1e8 R14: 0000000000000042 R15: 00007fd12880c4a0
Jun 6 05:16:05 localhost kernel: [ 8150.501876] CIFS VFS: Server cifshost.example.com has not responded in 120 seconds. Reconnecting...
Jun 6 05:16:05 localhost kernel: [ 8150.504330] CIFS VFS: Free previous auth_key.response = 000000000551ece7
Jun 6 05:17:01 localhost CRON[2077]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jun 6 05:17:12 localhost kernel: [ 8217.313824] INFO: task java:1672 blocked for more than 120 seconds.
Jun 6 05:17:12 localhost kernel: [ 8217.313849] Not tainted 4.15.0-22-generic #24-Ubuntu
Jun 6 05:17:12 localhost kernel: [ 8217.313864] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 6 05:17:12 localhost kernel: [ 8217.313884] java D 0 1672 1 0x80000000
Jun 6 05:17:12 localhost kernel: [ 8217.313887] Call Trace:
Jun 6 05:17:12 localhost kernel: [ 8217.313894] __schedule+0x297/0x8b0
Jun 6 05:17:12 localhost kernel: [ 8217.313897] ? __wake_up+0x13/0x20
Jun 6 05:17:12 localhost kernel: [ 8217.313899] schedule+0x2c/0x80
Jun 6 05:17:12 localhost kernel: [ 8217.313901] io_schedule+0x16/0x40
Jun 6 05:17:12 localhost kernel: [ 8217.313904] wait_on_page_bit_common+0xd8/0x160
Jun 6 05:17:12 localhost kernel: [ 8217.313906] ? page_cache_tree_insert+0xe0/0xe0
Jun 6 05:17:12 localhost kernel: [ 8217.313908] __filemap_fdatawait_range+0xfa/0x160
Jun 6 05:17:12 localhost kernel: [ 8217.313910] filemap_write_and_wait+0x4d/0x90
Jun 6 05:17:12 localhost kernel: [ 8217.313931] cifs_flush+0x43/0x90 [cifs]
Jun 6 05:17:12 localhost kernel: [ 8217.313934] filp_close+0x2f/0x80
Jun 6 05:17:12 localhost kernel: [ 8217.313936] __close_fd+0x85/0xa0
Jun 6 05:17:12 localhost kernel: [ 8217.313938] SyS_close+0x23/0x50
Jun 6 05:17:12 localhost kernel: [ 8217.313940] do_syscall_64+0x73/0x130
Jun 6 05:17:12 localhost kernel: [ 8217.313941] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Jun 6 05:17:12 localhost kernel: [ 8217.313943] RIP: 0033:0x7fd159dbd447
Jun 6 05:17:12 localhost kernel: [ 8217.313944] RSP: 002b:00007fd12880c440 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
Jun 6 05:17:12 localhost kernel: [ 8217.313945] RAX: ffffffffffffffda RBX: 0000000000000175 RCX: 00007fd159dbd447
Jun 6 05:17:12 localhost kernel: [ 8217.313946] RDX: 0000000000000000 RSI: 00000007c0023318 RDI: 0000000000000175
Jun 6 05:17:12 localhost kernel: [ 8217.313946] RBP: 00007fd12880c490 R08: 0000000000000000 R09: 0000000781601e70
Jun 6 05:17:12 localhost kernel: [ 8217.313947] R10: 0000000000002288 R11: 0000000000000293 R12: 00007fd158ccab40
Jun 6 05:17:12 localhost kernel: [ 8217.313948] R13: 00007fd0f000f1e8 R14: 0000000000000042 R15: 00007fd12880c4a0

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1729337

Title: CIFS errors on 4.4.0-98, but not on 4.4.0-97 with same config

Status in linux package in Ubuntu: In Progress
Status in linux source package in Trusty: In Progress
Status in linux source package in Xenial: Fix Released
Status in linux source package in Zesty: In Progress
Status in linux source package in Artful: Fix Committed

Bug description:

== SRU Justification ==
The bug reporter stated they have a cluster of servers that applied Xenial updates and then were unable to mount CIFS shares after upgrading to 4.4.0-98. The same machines on 4.4.0-97 do not hit the regression.

It was found that the regression is fixed by mainline commit 4587eee04e2a ("SMB3: Validate negotiate request must always be signed"). This fix is required in all Ubuntu supported releases.

Commit 4587eee04e2a landed in mainline as of 4.14-rc7. It was also cc'd to upstream stable, but it has not landed in any stable releases yet, which is the reason for this SRU.

== Fix ==
commit 4587eee04e2ac7ac3ac9fa2bc164fb6e548f99cd
Author: Steve French <smfre...@gmail.com>
Date: Wed Oct 25 15:58:31 2017 -0500

    SMB3: Validate negotiate request must always be signed

== Regression Potential ==
This patch fixes a regression. It was also cc'd to upstream stable, so it received additional review upstream.

We have a cluster of servers that applied a security update overnight and were unable to mount CIFS shares after upgrading to 4.4.0-98. The same machines on 4.4.0-97 were fine the night before, and are fine after downgrading. The only error message CIFS would report, even with verbose logging, was:

[ 257.089876] CIFS VFS: validate protocol negotiate failed: -11
[ 257.089964] CIFS VFS: cifs_mount failed w/return code = -5

Rebooting did not help. Nor did attempting to mount the share manually using mount -t cifs.
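For anyone decoding the two kernel messages above: the CIFS client reports failures as negative Linux errno values, so -11 corresponds to EAGAIN ("Resource temporarily unavailable") and -5 to EIO ("Input/output error"). The sketch below is a hypothetical helper (`decode_kernel_rc` is not part of cifs-utils or any other CIFS tooling) that pulls the trailing negative code off a log line and looks it up with Python's standard `errno.errorcode` and `os.strerror`. It assumes it runs on Linux, where the user-space errno numbering matches the kernel's.

```python
import errno
import os
import re

# Hypothetical helper: matches a trailing negative integer, e.g. the "-11" in
# "...negotiate failed: -11" or the "-5" in "...return code = -5".
RC_PATTERN = re.compile(r"(-\d+)\s*$")


def decode_kernel_rc(line: str) -> str:
    """Return 'NAME (description)' for the negative errno at the end of a log line."""
    match = RC_PATTERN.search(line)
    if match is None:
        return "no return code found"
    code = -int(match.group(1))  # the kernel reports -errno, so flip the sign
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({os.strerror(code)})"


if __name__ == "__main__":
    for msg in (
        "CIFS VFS: validate protocol negotiate failed: -11",
        "CIFS VFS: cifs_mount failed w/return code = -5",
    ):
        print(f"{msg} -> {decode_kernel_rc(msg)}")
```

Decoding the codes only names the failure class; it does not by itself say why the negotiate validation was rejected on 4.4.0-98.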
Here's the offending line from our /etc/fstab (with hostnames sanitized):

//server/share /mnt/share cifs rw,user,credentials=/etc/samba/credentials.share,uid=33,gid=33,file_mode=0770,dir_mode=0770,exec,soft,noserverino,vers=3.0 0 0

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.4.0-98-generic 4.4.0-98.121
ProcVersionSignature: Ubuntu 4.4.0-98.121-generic 4.4.90
Uname: Linux 4.4.0-98-generic x86_64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Nov 1 07:56 seq
 crw-rw---- 1 root audio 116, 33 Nov 1 07:56 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.1-0ubuntu2.10
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Date: Wed Nov 1 08:49:47 2017
HibernationDevice: RESUME=/dev/mapper/ubuntu--template--vg-swap_1
InstallationDate: Installed on 2016-12-16 (319 days ago)
InstallationMedia: Ubuntu-Server 16.04.1 LTS "Xenial Xerus" - Release amd64 (20160719)
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lsusb: Error: command ['lsusb'] failed with exit code 1:
MachineType: VMware, Inc. VMware Virtual Platform
PciMultimedia:
ProcFB: 0 svgadrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.4.0-98-generic root=/dev/mapper/ubuntu--template--vg-root ro
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-98-generic N/A
 linux-backports-modules-4.4.0-98-generic N/A
 linux-firmware 1.157.13
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 09/21/2015
dmi.bios.vendor: Phoenix Technologies LTD
dmi.bios.version: 6.00
dmi.board.name: 440BX Desktop Reference Platform
dmi.board.vendor: Intel Corporation
dmi.board.version: None
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 1
dmi.chassis.vendor: No Enclosure
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnPhoenixTechnologiesLTD:bvr6.00:bd09/21/2015:svnVMware,Inc.:pnVMwareVirtualPlatform:pvrNone:rvnIntelCorporation:rn440BXDesktopReferencePlatform:rvrNone:cvnNoEnclosure:ct1:cvrN/A:
dmi.product.name: VMware Virtual Platform
dmi.product.version: None
dmi.sys.vendor: VMware, Inc.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1729337/+subscriptions