I'm happy to file a new bug report for this, but I wanted to follow up
here first.  I've been working on a bionic VM template this week and the
issue has resurfaced.  The client (18.04) reboots daily at 3:00 a.m., and
somewhere between 30 minutes and 2 hours later the CIFS mount point stops
responding.  Meanwhile, other clients (16.04 and Windows) continue
chugging along merrily.  A reboot sometimes fixes the problem, and
sometimes the problem has fixed itself by 8 a.m. when I arrive.  Below is
some syslog debug output from after the machine finished booting.

Yesterday it cleared up on its own.  Today the server is still down 10
hours later.

I would blame Java, except that the whole mount point becomes
unresponsive when this happens, not just for that one process.

Jun  6 03:00:37 localhost systemd[1]: Reached target Multi-User System.
Jun  6 03:00:37 localhost systemd[1]: Starting Execute cloud user/final 
scripts...
Jun  6 03:00:37 localhost systemd[1]: Reached target Graphical Interface.
Jun  6 03:00:37 localhost systemd[1]: Starting Update UTMP about System 
Runlevel Changes...
Jun  6 03:00:38 localhost systemd[1]: Started Update UTMP about System Runlevel 
Changes.
Jun  6 03:00:38 localhost cloud-init[1531]: Cloud-init v. 18.2 running 
'modules:final' at Wed, 06 Jun 2018 03:00:38 +0000. Up 23.72 seconds.
Jun  6 03:00:38 localhost cloud-init[1531]: Cloud-init v. 18.2 finished at Wed, 
06 Jun 2018 03:00:38 +0000. Datasource DataSourceNoCloud 
[seed=/var/lib/cloud/seed/nocloud-net][dsmode=net].  Up 23.84 seconds
Jun  6 03:00:38 localhost systemd[1]: Started Execute cloud user/final scripts.
Jun  6 03:00:38 localhost systemd[1]: Reached target Cloud-init target.
Jun  6 03:00:38 localhost systemd[1]: Startup finished in 2.806s (kernel) + 
21.078s (userspace) = 23.885s.
Jun  6 03:00:39 localhost kernel: [   24.927412] TCP: ens160: Driver has 
suspect GRO implementation, TCP performance may be compromised.
Jun  6 03:00:51 localhost systemd-timesyncd[574]: Synchronized to time server 
91.189.91.157:123 (ntp.ubuntu.com).
Jun  6 03:14:28 localhost systemd[1]: Starting Message of the Day...
Jun  6 03:14:30 localhost 50-motd-news[1699]:  * Meltdown, Spectre and Ubuntu: 
What are the attack vectors,
Jun  6 03:14:30 localhost 50-motd-news[1699]:    how the fixes work, and 
everything else you need to know
Jun  6 03:14:30 localhost 50-motd-news[1699]:    - https://ubu.one/u2Know
Jun  6 03:14:30 localhost systemd[1]: Started Message of the Day.
Jun  6 03:15:48 localhost systemd[1]: Starting Cleanup of Temporary 
Directories...
Jun  6 03:15:48 localhost systemd[1]: Started Cleanup of Temporary Directories.
Jun  6 03:17:01 localhost CRON[1770]: (root) CMD (   cd / && run-parts --report 
/etc/cron.hourly)
Jun  6 04:00:01 localhost CRON[1878]: (root) CMD 
(/mnt/www/config/backup_config.sh)
Jun  6 04:17:01 localhost CRON[1916]: (root) CMD (   cd / && run-parts --report 
/etc/cron.hourly)
Jun  6 04:17:33 localhost nslcd[1438]: [b141f2] <group/member="root"> failed to 
bind to LDAP server ldap://dc01.example.com: Can't contact LDAP server
Jun  6 04:17:33 localhost nslcd[1438]: [b141f2] <group/member="root"> connected 
to LDAP server ldap://dc02.example.com
Jun  6 04:30:01 localhost CRON[1938]: (root) CMD (/usr/sbin/ntpdate 
time-a.nist.gov time-b.nist.gov 0.pool.ntp.org 1.pool.ntp.org)
Jun  6 04:30:01 localhost CRON[1937]: (CRON) info (No MTA installed, discarding 
output)
Jun  6 05:12:02 localhost kernel: [ 7906.798052] CIFS VFS: Server 
cifshost.example.com has not responded in 120 seconds. Reconnecting...
Jun  6 05:12:02 localhost kernel: [ 7906.800726] CIFS VFS: Free previous 
auth_key.response = 000000003e802799
Jun  6 05:13:10 localhost kernel: [ 7975.657719] INFO: task java:1672 blocked 
for more than 120 seconds.
Jun  6 05:13:10 localhost kernel: [ 7975.657741]       Not tainted 
4.15.0-22-generic #24-Ubuntu
Jun  6 05:13:10 localhost kernel: [ 7975.657757] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun  6 05:13:10 localhost kernel: [ 7975.657779] java            D    0  1672   
   1 0x80000000
Jun  6 05:13:10 localhost kernel: [ 7975.657781] Call Trace:
Jun  6 05:13:10 localhost kernel: [ 7975.657788]  __schedule+0x297/0x8b0
Jun  6 05:13:10 localhost kernel: [ 7975.657791]  ? __wake_up+0x13/0x20
Jun  6 05:13:10 localhost kernel: [ 7975.657792]  schedule+0x2c/0x80
Jun  6 05:13:10 localhost kernel: [ 7975.657795]  io_schedule+0x16/0x40
Jun  6 05:13:10 localhost kernel: [ 7975.657798]  
wait_on_page_bit_common+0xd8/0x160
Jun  6 05:13:10 localhost kernel: [ 7975.657800]  ? 
page_cache_tree_insert+0xe0/0xe0
Jun  6 05:13:10 localhost kernel: [ 7975.657801]  
__filemap_fdatawait_range+0xfa/0x160
Jun  6 05:13:10 localhost kernel: [ 7975.657803]  
filemap_write_and_wait+0x4d/0x90
Jun  6 05:13:10 localhost kernel: [ 7975.657826]  cifs_flush+0x43/0x90 [cifs]
Jun  6 05:13:10 localhost kernel: [ 7975.657830]  filp_close+0x2f/0x80
Jun  6 05:13:10 localhost kernel: [ 7975.657832]  __close_fd+0x85/0xa0
Jun  6 05:13:10 localhost kernel: [ 7975.657834]  SyS_close+0x23/0x50
Jun  6 05:13:10 localhost kernel: [ 7975.657836]  do_syscall_64+0x73/0x130
Jun  6 05:13:10 localhost kernel: [ 7975.657838]  
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Jun  6 05:13:10 localhost kernel: [ 7975.657839] RIP: 0033:0x7fd159dbd447
Jun  6 05:13:10 localhost kernel: [ 7975.657840] RSP: 002b:00007fd12880c440 
EFLAGS: 00000293 ORIG_RAX: 0000000000000003
Jun  6 05:13:10 localhost kernel: [ 7975.657841] RAX: ffffffffffffffda RBX: 
0000000000000175 RCX: 00007fd159dbd447
Jun  6 05:13:10 localhost kernel: [ 7975.657842] RDX: 0000000000000000 RSI: 
00000007c0023318 RDI: 0000000000000175
Jun  6 05:13:10 localhost kernel: [ 7975.657842] RBP: 00007fd12880c490 R08: 
0000000000000000 R09: 0000000781601e70
Jun  6 05:13:10 localhost kernel: [ 7975.657843] R10: 0000000000002288 R11: 
0000000000000293 R12: 00007fd158ccab40
Jun  6 05:13:10 localhost kernel: [ 7975.657844] R13: 00007fd0f000f1e8 R14: 
0000000000000042 R15: 00007fd12880c4a0
Jun  6 05:14:03 localhost kernel: [ 8028.649934] CIFS VFS: Server 
cifshost.example.com has not responded in 120 seconds. Reconnecting...
Jun  6 05:14:03 localhost kernel: [ 8028.652603] CIFS VFS: Free previous 
auth_key.response = 000000003e802799
Jun  6 05:15:11 localhost kernel: [ 8096.485698] INFO: task java:1672 blocked 
for more than 120 seconds.
Jun  6 05:15:11 localhost kernel: [ 8096.485721]       Not tainted 
4.15.0-22-generic #24-Ubuntu
Jun  6 05:15:11 localhost kernel: [ 8096.485737] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun  6 05:15:11 localhost kernel: [ 8096.485758] java            D    0  1672   
   1 0x80000000
Jun  6 05:15:11 localhost kernel: [ 8096.485760] Call Trace:
Jun  6 05:15:11 localhost kernel: [ 8096.485769]  __schedule+0x297/0x8b0
Jun  6 05:15:11 localhost kernel: [ 8096.485772]  ? __wake_up+0x13/0x20
Jun  6 05:15:11 localhost kernel: [ 8096.485774]  schedule+0x2c/0x80
Jun  6 05:15:11 localhost kernel: [ 8096.485776]  io_schedule+0x16/0x40
Jun  6 05:15:11 localhost kernel: [ 8096.485779]  
wait_on_page_bit_common+0xd8/0x160
Jun  6 05:15:11 localhost kernel: [ 8096.485781]  ? 
page_cache_tree_insert+0xe0/0xe0
Jun  6 05:15:11 localhost kernel: [ 8096.485782]  
__filemap_fdatawait_range+0xfa/0x160
Jun  6 05:15:11 localhost kernel: [ 8096.485785]  
filemap_write_and_wait+0x4d/0x90
Jun  6 05:15:11 localhost kernel: [ 8096.485818]  cifs_flush+0x43/0x90 [cifs]
Jun  6 05:15:11 localhost kernel: [ 8096.485821]  filp_close+0x2f/0x80
Jun  6 05:15:11 localhost kernel: [ 8096.485823]  __close_fd+0x85/0xa0
Jun  6 05:15:11 localhost kernel: [ 8096.485824]  SyS_close+0x23/0x50
Jun  6 05:15:11 localhost kernel: [ 8096.485827]  do_syscall_64+0x73/0x130
Jun  6 05:15:11 localhost kernel: [ 8096.485829]  
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Jun  6 05:15:11 localhost kernel: [ 8096.485830] RIP: 0033:0x7fd159dbd447
Jun  6 05:15:11 localhost kernel: [ 8096.485831] RSP: 002b:00007fd12880c440 
EFLAGS: 00000293 ORIG_RAX: 0000000000000003
Jun  6 05:15:11 localhost kernel: [ 8096.485832] RAX: ffffffffffffffda RBX: 
0000000000000175 RCX: 00007fd159dbd447
Jun  6 05:15:11 localhost kernel: [ 8096.485833] RDX: 0000000000000000 RSI: 
00000007c0023318 RDI: 0000000000000175
Jun  6 05:15:11 localhost kernel: [ 8096.485834] RBP: 00007fd12880c490 R08: 
0000000000000000 R09: 0000000781601e70
Jun  6 05:15:11 localhost kernel: [ 8096.485834] R10: 0000000000002288 R11: 
0000000000000293 R12: 00007fd158ccab40
Jun  6 05:15:11 localhost kernel: [ 8096.485835] R13: 00007fd0f000f1e8 R14: 
0000000000000042 R15: 00007fd12880c4a0
Jun  6 05:16:05 localhost kernel: [ 8150.501876] CIFS VFS: Server 
cifshost.example.com has not responded in 120 seconds. Reconnecting...
Jun  6 05:16:05 localhost kernel: [ 8150.504330] CIFS VFS: Free previous 
auth_key.response = 000000000551ece7
Jun  6 05:17:01 localhost CRON[2077]: (root) CMD (   cd / && run-parts --report 
/etc/cron.hourly)
Jun  6 05:17:12 localhost kernel: [ 8217.313824] INFO: task java:1672 blocked 
for more than 120 seconds.
Jun  6 05:17:12 localhost kernel: [ 8217.313849]       Not tainted 
4.15.0-22-generic #24-Ubuntu
Jun  6 05:17:12 localhost kernel: [ 8217.313864] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun  6 05:17:12 localhost kernel: [ 8217.313884] java            D    0  1672   
   1 0x80000000
Jun  6 05:17:12 localhost kernel: [ 8217.313887] Call Trace:
Jun  6 05:17:12 localhost kernel: [ 8217.313894]  __schedule+0x297/0x8b0
Jun  6 05:17:12 localhost kernel: [ 8217.313897]  ? __wake_up+0x13/0x20
Jun  6 05:17:12 localhost kernel: [ 8217.313899]  schedule+0x2c/0x80
Jun  6 05:17:12 localhost kernel: [ 8217.313901]  io_schedule+0x16/0x40
Jun  6 05:17:12 localhost kernel: [ 8217.313904]  
wait_on_page_bit_common+0xd8/0x160
Jun  6 05:17:12 localhost kernel: [ 8217.313906]  ? 
page_cache_tree_insert+0xe0/0xe0
Jun  6 05:17:12 localhost kernel: [ 8217.313908]  
__filemap_fdatawait_range+0xfa/0x160
Jun  6 05:17:12 localhost kernel: [ 8217.313910]  
filemap_write_and_wait+0x4d/0x90
Jun  6 05:17:12 localhost kernel: [ 8217.313931]  cifs_flush+0x43/0x90 [cifs]
Jun  6 05:17:12 localhost kernel: [ 8217.313934]  filp_close+0x2f/0x80
Jun  6 05:17:12 localhost kernel: [ 8217.313936]  __close_fd+0x85/0xa0
Jun  6 05:17:12 localhost kernel: [ 8217.313938]  SyS_close+0x23/0x50
Jun  6 05:17:12 localhost kernel: [ 8217.313940]  do_syscall_64+0x73/0x130
Jun  6 05:17:12 localhost kernel: [ 8217.313941]  
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Jun  6 05:17:12 localhost kernel: [ 8217.313943] RIP: 0033:0x7fd159dbd447
Jun  6 05:17:12 localhost kernel: [ 8217.313944] RSP: 002b:00007fd12880c440 
EFLAGS: 00000293 ORIG_RAX: 0000000000000003
Jun  6 05:17:12 localhost kernel: [ 8217.313945] RAX: ffffffffffffffda RBX: 
0000000000000175 RCX: 00007fd159dbd447
Jun  6 05:17:12 localhost kernel: [ 8217.313946] RDX: 0000000000000000 RSI: 
00000007c0023318 RDI: 0000000000000175
Jun  6 05:17:12 localhost kernel: [ 8217.313946] RBP: 00007fd12880c490 R08: 
0000000000000000 R09: 0000000781601e70
Jun  6 05:17:12 localhost kernel: [ 8217.313947] R10: 0000000000002288 R11: 
0000000000000293 R12: 00007fd158ccab40
Jun  6 05:17:12 localhost kernel: [ 8217.313948] R13: 00007fd0f000f1e8 R14: 
0000000000000042 R15: 00007fd12880c4a0
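
To see how often the client loops through this reconnect/hung-task cycle, the kernel messages above can be tallied with a short script. This is only a rough sketch: the function name `summarize` is made up for illustration, and it assumes each syslog entry sits on one line (as in /var/log/syslog), not wrapped as in this email.

```python
import re
from collections import Counter

# Patterns copied from the kernel messages quoted in this report.
RECONNECT = re.compile(r"CIFS VFS: Server (\S+) has not responded in (\d+) seconds")
HUNG_TASK = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")

def summarize(lines):
    """Count CIFS reconnect attempts per server and hung-task reports per (name, pid)."""
    reconnects = Counter()
    hung_tasks = Counter()
    for line in lines:
        match = RECONNECT.search(line)
        if match:
            reconnects[match.group(1)] += 1
            continue
        match = HUNG_TASK.search(line)
        if match:
            hung_tasks[(match.group(1), int(match.group(2)))] += 1
    return reconnects, hung_tasks

if __name__ == "__main__":
    # Hypothetical usage: read the local syslog and print both tallies.
    with open("/var/log/syslog", errors="replace") as handle:
        reconnects, hung_tasks = summarize(handle)
    print(dict(reconnects))
    print(dict(hung_tasks))
```

A steady stream of reconnect attempts with the same task stuck each time would match what the log above shows.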

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1729337

Title:
  CIFS errors on 4.4.0-98, but not on 4.4.0-97 with same config

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Trusty:
  In Progress
Status in linux source package in Xenial:
  Fix Released
Status in linux source package in Zesty:
  In Progress
Status in linux source package in Artful:
  Fix Committed

Bug description:
  == SRU Justification ==
  The bug reporter stated they have a cluster of servers that applied Xenial
  updates and then were unable to mount CIFS shares after upgrading to
  4.4.0-98. The same machines on 4.4.0-97 do not hit the regression.  It was
  found that the regression is fixed by mainline commit:
  4587eee04e2a ("SMB3: Validate negotiate request must always be signed").

  This fix is required in all Ubuntu supported releases.  Commit 4587eee04e2a
  landed in mainline as of 4.14-rc7.  It was also cc'd to upstream stable,
  but it has not landed in any stable releases yet, which is the reason for
  this SRU.


  == Fix ==
  commit 4587eee04e2ac7ac3ac9fa2bc164fb6e548f99cd
  Author: Steve French <smfre...@gmail.com>
  Date:   Wed Oct 25 15:58:31 2017 -0500
      SMB3: Validate negotiate request must always be signed

  == Regression Potential ==
  This patch fixes a regression.  It was also cc'd to upstream stable, so
  it received additional review upstream.

  We have a cluster of servers that applied a security update overnight
  and were unable to mount CIFS shares after upgrading to 4.4.0-98.  The
  same machines on 4.4.0-97 were fine the night before, and are fine
  after downgrading.  The only error message CIFS would report, even on
  verbose, was:

  [  257.089876] CIFS VFS: validate protocol negotiate failed: -11
  [  257.089964] CIFS VFS: cifs_mount failed w/return code = -5
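
The negative numbers in these two messages follow the kernel convention of returning negated errno values. As a small sketch for decoding them (the helper name `decode_kernel_rc` is made up for illustration, and the mapping assumes it runs on a Linux host, where 11 is EAGAIN and 5 is EIO):

```python
import errno
import os

def decode_kernel_rc(return_code):
    """Map a negative kernel return code to its errno name and description."""
    code = abs(return_code)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)

if __name__ == "__main__":
    # The two codes reported by CIFS in the messages above.
    for rc in (-11, -5):
        name, text = decode_kernel_rc(rc)
        print(f"{rc}: {name} ({text})")
```

So the negotiate validation is failing with a "try again" style error, which cifs_mount then reports as a generic I/O error.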

  Rebooting did not help.  Nor did attempting to mount the share
  manually using mount -t cifs.

  Here's the offending line from our /etc/fstab (with hostnames
  sanitized):

  //server/share /mnt/share cifs rw,user,credentials=/etc/samba/credentials.share,uid=33,gid=33,file_mode=0770,dir_mode=0770,exec,soft,noserverino,vers=3.0 0 0
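
When comparing this entry across machines, it can help to split the option string into individual settings (for example, to confirm which hosts pin vers=3.0). A minimal sketch, assuming the entry is on a single line with the usual six whitespace-separated fstab fields; `parse_fstab_line` is a name made up for illustration:

```python
def parse_fstab_line(line):
    """Split one fstab entry into its six fields plus a dict of mount options."""
    device, mountpoint, fstype, options, dump, passno = line.split()
    parsed_options = {}
    for item in options.split(","):
        key, separator, value = item.partition("=")
        # Flags such as "soft" carry no value; record them as True.
        parsed_options[key] = value if separator else True
    return {
        "device": device,
        "mountpoint": mountpoint,
        "fstype": fstype,
        "options": parsed_options,
        "dump": int(dump),
        "passno": int(passno),
    }

if __name__ == "__main__":
    entry = parse_fstab_line(
        "//server/share /mnt/share cifs "
        "rw,user,credentials=/etc/samba/credentials.share,uid=33,gid=33,"
        "file_mode=0770,dir_mode=0770,exec,soft,noserverino,vers=3.0 0 0"
    )
    print(entry["options"]["vers"], entry["options"]["soft"])
```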

  ProblemType: Bug
  DistroRelease: Ubuntu 16.04
  Package: linux-image-4.4.0-98-generic 4.4.0-98.121
  ProcVersionSignature: Ubuntu 4.4.0-98.121-generic 4.4.90
  Uname: Linux 4.4.0-98-generic x86_64
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 Nov  1 07:56 seq
   crw-rw---- 1 root audio 116, 33 Nov  1 07:56 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.20.1-0ubuntu2.10
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', 
'/dev/snd/timer'] failed with exit code 1:
  Date: Wed Nov  1 08:49:47 2017
  HibernationDevice: RESUME=/dev/mapper/ubuntu--template--vg-swap_1
  InstallationDate: Installed on 2016-12-16 (319 days ago)
  InstallationMedia: Ubuntu-Server 16.04.1 LTS "Xenial Xerus" - Release amd64 
(20160719)
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
  Lsusb: Error: command ['lsusb'] failed with exit code 1:
  MachineType: VMware, Inc. VMware Virtual Platform
  PciMultimedia:

  ProcFB: 0 svgadrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.4.0-98-generic 
root=/dev/mapper/ubuntu--template--vg-root ro
  RelatedPackageVersions:
   linux-restricted-modules-4.4.0-98-generic N/A
   linux-backports-modules-4.4.0-98-generic  N/A
   linux-firmware                            1.157.13
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  SourcePackage: linux
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 09/21/2015
  dmi.bios.vendor: Phoenix Technologies LTD
  dmi.bios.version: 6.00
  dmi.board.name: 440BX Desktop Reference Platform
  dmi.board.vendor: Intel Corporation
  dmi.board.version: None
  dmi.chassis.asset.tag: No Asset Tag
  dmi.chassis.type: 1
  dmi.chassis.vendor: No Enclosure
  dmi.chassis.version: N/A
  dmi.modalias: 
dmi:bvnPhoenixTechnologiesLTD:bvr6.00:bd09/21/2015:svnVMware,Inc.:pnVMwareVirtualPlatform:pvrNone:rvnIntelCorporation:rn440BXDesktopReferencePlatform:rvrNone:cvnNoEnclosure:ct1:cvrN/A:
  dmi.product.name: VMware Virtual Platform
  dmi.product.version: None
  dmi.sys.vendor: VMware, Inc.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1729337/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
