Public bug reported:

We are observing a massive uptick in metadata requests to an NFSv4.1
(AWS EFS) network drive that is linked to a one or more web servers.
This started happening about a week ago across a number of stacks on
AWS. This issue ended up exhausting all the burst credits we had accrued
and caused a production outage.

Diagnostics...

`nfsiostat` shows 50-600 ops/s on the servers where it is happening.

`nfsstat -m` shows these mount flags: `Flags:
rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,noresvport,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=x.x.x.x,fsc,local_lock=none,addr=y.y.y.y`

`nfstrace --mode=live --verbose=2` shows that the same operations are
happening repeatedly on 2-5 filehandles;

```
   CALL  [ operations: 4 tag: void minor version: 1
       [ SEQUENCE(53) [ sessionid: 0xeab1201a93b669600000000000000001 
sequenceid: 0xfc12c slotid: 2 cache this: 0 ]
       [ PUTFH(22) [ object: 
839c9918169aed49bd6f96adab5438bc47b1460162fe4a739cd8c98405868b2b5de303ed1f4ecfa82c26f5ca1db4cce8e63aad38fe72a397bdae51d8a7e93eb118ad8c736b30323165273b9c805db85936cf3be626b6ba4165ecf9755a54fdd174a535e217ffee4fa0166feea6e86becf9fa16280fd877b3545e07fe03aede08
 ]
       [ ACCESS(3) [ READ LOOKUP MODIFY EXTEND DELETE  ]
       [ GETATTR(9) [ mask: 0x18300000 CHANGE SIZE TIME_METADATA TIME_MODIFY ]  
]
   REPLY [  operations: 4 status: OK tag: void
       [ SEQUENCE(53) [ status: OK session: 0xeab1201a93b669600000000000000001 
sequenceid: 0xfc12c slotid: 2 highest slotid: 63 target highest slotid: 63 
status flags: 0 ]
       [ PUTFH(22) [ status: OK ]
       [ ACCESS(3) [ status: OK supported: READ LOOKUP MODIFY EXTEND DELETE  
access: READ LOOKUP MODIFY EXTEND DELETE  ]
       [ GETATTR(9) [ status: OK mask: 0x18300000 CHANGE SIZE TIME_METADATA 
TIME_MODIFY ]  ]
```

`lsof -N` shows nothing using files from the NFS drive.

I tried changing some of the mount options to reduce the amount of
metadata requests going through to the NFS server and found that
`lazytime` did help - observations are as below;

    Server 1:
        checked mount options - using relatime
        checked nfs stats - 460.982 ops/s
        updated mount options
        unmounted and mounted the NFS drive
        checked nfs stats - 1.400 ops/s
        rebooted
        checked nfs stats - 1.316 ops/s
    Server 2:
        checked mount options - using relatime
        checked nfs stats - 390.998 ops/s
        did not updated mount options
        unmounted and mounted the NFS drive
        checked nfs stats - 1.750 ops/s
        rebooted
        checked nfs stats - 531.932 ops/s

Conclusion from this is that;

- Unmounting and remounting the disk stops the issue temporarily
- `lazytime` option seems to prevent the issue from happening
- Issue may be boot related

I also enabled rpcdebugging via `rpcdebug -m nfs -s all` and a snippet
of the output is below;

```
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213426] NFS: 
nfs_update_inode(0:55/1440391441734181492 fh_crc=0xc29dd48a ct=2 info=0x26040)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213428] NFS: 
permission(0:55/1440391441734181492), mask=0x1, res=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213430] NFS: 
nfs_lookup_revalidate_done(/Components) is valid
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213432] --> nfs41_call_sync_prepare 
data->seq_server 000000008a198bb4
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213433] --> nfs4_alloc_slot 
used_slots=0000 highest_used=4294967295 max_slots=64
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213433] <-- nfs4_alloc_slot 
used_slots=0001 highest_used=0 slotid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213436] encode_sequence: 
sessionid=1354075206:678036945:0:16777216 seqid=238661 slotid=0 max_slotid=0 
cache_this=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213842] decode_attr_type: type=00
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213843] decode_attr_change: change 
attribute=2153
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213844] decode_attr_size: file 
size=71680
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213845] decode_attr_fsid: 
fsid=(0x0/0x0)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213845] decode_attr_fileid: fileid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213846] decode_attr_fs_locations: 
fs_locations done, error = 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213846] decode_attr_mode: file 
mode=00
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213847] decode_attr_nlink: nlink=1
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213848] decode_attr_rdev: 
rdev=(0x0:0x0)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213848] decode_attr_space_used: 
space used=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213849] decode_attr_time_access: 
atime=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213849] decode_attr_time_metadata: 
ctime=1677729393
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213850] decode_attr_time_modify: 
mtime=1677729393
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213851] 
decode_attr_mounted_on_fileid: fileid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213851] decode_getfattr_attrs: xdr 
returned 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213852] decode_getfattr_generic: 
xdr returned 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213853] --> nfs4_alloc_slot 
used_slots=0001 highest_used=0 max_slots=64
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213854] <-- nfs4_alloc_slot 
used_slots=0003 highest_used=1 slotid=1
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213855] nfs4_free_slot: slotid 1 
highest_used_slotid 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213855] nfs41_sequence_process: 
Error 0 free the slot
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213856] nfs4_free_slot: slotid 0 
highest_used_slotid 4294967295
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213860] NFS: 
nfs_update_inode(0:55/10743289300410127521 fh_crc=0xb5c7d895 ct=1 info=0x26040)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213862] NFS: 
permission(0:55/10743289300410127521), mask=0x1, res=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213864] NFS: 
nfs_lookup_revalidate_done(Components/Image.png) is valid
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213867] NFS: 
dentry_delete(Components/Image.png, 48084c)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214000] NFS: 
permission(0:55/1440391441734181492), mask=0x81, res=-10
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214003] --> nfs41_call_sync_prepare 
data->seq_server 000000008a198bb4
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214004] --> nfs4_alloc_slot 
used_slots=0000 highest_used=4294967295 max_slots=64
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214005] <-- nfs4_alloc_slot 
used_slots=0001 highest_used=0 slotid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214009] encode_sequence: 
sessionid=1354075206:678036945:0:16777216 seqid=238662 slotid=0 max_slotid=0 
cache_this=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214472] decode_attr_type: type=00
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214473] decode_attr_change: change 
attribute=1626
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214473] decode_attr_size: file 
size=38912
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214474] decode_attr_fsid: 
fsid=(0x0/0x0)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214475] decode_attr_fileid: fileid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214475] decode_attr_fs_locations: 
fs_locations done, error = 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214476] decode_attr_mode: file 
mode=00
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214476] decode_attr_nlink: nlink=1
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214477] decode_attr_rdev: 
rdev=(0x0:0x0)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214477] decode_attr_space_used: 
space used=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214478] decode_attr_time_access: 
atime=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214479] decode_attr_time_metadata: 
ctime=1678928064
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214479] decode_attr_time_modify: 
mtime=1678928064
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214480] 
decode_attr_mounted_on_fileid: fileid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214480] decode_getfattr_attrs: xdr 
returned 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214481] decode_getfattr_generic: 
xdr returned 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214483] --> nfs4_alloc_slot 
used_slots=0001 highest_used=0 max_slots=64
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214483] <-- nfs4_alloc_slot 
used_slots=0003 highest_used=1 slotid=1
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214484] nfs4_free_slot: slotid 1 
highest_used_slotid 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214485] nfs41_sequence_process: 
Error 0 free the slot
```

This loops repeatedly at a rate roughly consistent with ops/s output
from `nfsiostat`.

I tried rolling back to `5.4.0-1096-aws`, updating grub and rebooting
and the issue no longer occurs. I see the issue again once rolling
forward to `5.4.0-1097-aws`, updating grub and rebooting as soon as the
web server loads anything from the NFS mount.

Output from `ubuntu-bug linux`

```
ProblemType: Bug
ApportVersion: 2.20.9-0ubuntu7.28
Architecture: amd64
Date: Fri Mar 17 16:29:14 2023
Dependencies:
 adduser 3.116ubuntu1
 apt 1.6.14
 apt-utils 1.6.14
 busybox-initramfs 1:1.27.2-2ubuntu3.4
 ca-certificates 20211016ubuntu0.18.04.1
 coreutils 8.28-1ubuntu1
 cpio 2.12+dfsg-6ubuntu0.18.04.4
 debconf 1.5.66ubuntu1
 debconf-i18n 1.5.66ubuntu1
 dmsetup 2:1.02.145-4.1ubuntu3.18.04.3
 dpkg 1.19.0.5ubuntu2.4
 fdisk 2.31.1-0.4ubuntu3.7
 gcc-8-base 8.4.0-1ubuntu1~18.04
 gettext-base 0.19.8.1-6ubuntu0.3
 gpgv 2.2.4-1ubuntu1.6
 grub-common 2.02-2ubuntu8.25
 grub-gfxpayload-lists 0.7
 grub-pc 2.02-2ubuntu8.25
 grub-pc-bin 2.02-2ubuntu8.25
 grub2-common 2.02-2ubuntu8.25
 init-system-helpers 1.51
 initramfs-tools 0.130ubuntu3.13
 initramfs-tools-bin 0.130ubuntu3.13
 initramfs-tools-core 0.130ubuntu3.13
 klibc-utils 2.0.4-9ubuntu2.2
 kmod 24-1ubuntu3.5
 libacl1 2.2.52-3build1
 libapt-inst2.0 1.6.14
 libapt-pkg5.0 1.6.14
 libattr1 1:2.4.47-2build1
 libaudit-common 1:2.8.2-1ubuntu1.1
 libaudit1 1:2.8.2-1ubuntu1.1
 libblkid1 2.31.1-0.4ubuntu3.7
 libbz2-1.0 1.0.6-8.1ubuntu0.2
 libc6 2.27-3ubuntu1.6
 libcap-ng0 0.7.7-3.1
 libdb5.3 5.3.28-13.1ubuntu1.1
 libdevmapper1.02.1 2:1.02.145-4.1ubuntu3.18.04.3
 libfdisk1 2.31.1-0.4ubuntu3.7
 libffi6 3.2.1-8
 libfreetype6 2.8.1-2ubuntu2.2
 libfuse2 2.9.7-1ubuntu1
 libgcc1 1:8.4.0-1ubuntu1~18.04
 libgcrypt20 1.8.1-4ubuntu1.3
 libgmp10 2:6.1.2+dfsg-2ubuntu0.1
 libgnutls30 3.5.18-1ubuntu1.6
 libgpg-error0 1.27-6
 libgpm2 1.20.7-5
 libhogweed4 3.4.1-0ubuntu0.18.04.1
 libidn2-0 2.3.0-1+ubuntu18.04.1+deb.sury.org+2 [origin: LP-PPA-ondrej-php]
 libklibc 2.0.4-9ubuntu2.2
 libkmod2 24-1ubuntu3.5
 liblocale-gettext-perl 1.07-3build2
 liblz4-1 0.0~r131-2ubuntu3.1
 liblzma5 5.2.2-1.3ubuntu0.1
 libmount1 2.31.1-0.4ubuntu3.7
 libncurses5 6.1-1ubuntu1.18.04
 libncursesw5 6.1-1ubuntu1.18.04
 libnettle6 3.4.1-0ubuntu0.18.04.1
 libp11-kit0 0.23.9-2ubuntu0.1
 libpam-modules 1.1.8-3.6ubuntu2.18.04.6
 libpam-modules-bin 1.1.8-3.6ubuntu2.18.04.6
 libpam0g 1.1.8-3.6ubuntu2.18.04.6
 libpcre3 2:8.44-2+ubuntu18.04.1+deb.sury.org+1 [origin: LP-PPA-ondrej-php]
 libpng16-16 1.6.34-1ubuntu0.18.04.2
 libprocps6 2:3.3.12-3ubuntu1.2
 libseccomp2 2.5.1-1ubuntu1~18.04.2
 libselinux1 2.7-2build2
 libsemanage-common 2.7-2build2
 libsemanage1 2.7-2build2
 libsepol1 2.7-1ubuntu0.1
 libsmartcols1 2.31.1-0.4ubuntu3.7
 libssl1.1 1.1.1-1ubuntu2.1~18.04.21
 libstdc++6 8.4.0-1ubuntu1~18.04
 libsystemd0 237-3ubuntu10.57
 libtasn1-6 4.13-2
 libtext-charwidth-perl 0.04-7.1
 libtext-iconv-perl 1.7-5build6
 libtext-wrapi18n-perl 0.06-7.1
 libtinfo5 6.1-1ubuntu1.18.04
 libudev1 237-3ubuntu10.57
 libunistring2 0.9.9-0ubuntu2
 libuuid1 2.31.1-0.4ubuntu3.7
 libzstd1 1.4.8+dfsg-2+ubuntu18.04.1+deb.sury.org+4 [origin: LP-PPA-ondrej-php]
 linux-base 4.5ubuntu1.7
 linux-modules-5.4.0-1097-aws 5.4.0-1097.105~18.04.1
 lsb-base 9.20170808ubuntu1
 openssl 1.1.1-1ubuntu2.1~18.04.21
 os-prober 1.74ubuntu1
 passwd 1:4.5-1ubuntu2.5
 perl-base 5.26.1-6ubuntu0.6
 procps 2:3.3.12-3ubuntu1.2
 psmisc 23.1-1ubuntu0.1
 sensible-utils 0.0.12
 tar 1.29b-2ubuntu0.4
 ubuntu-keyring 2018.09.18.1~18.04.2
 ucf 3.0038
 udev 237-3ubuntu10.57
 util-linux 2.31.1-0.4ubuntu3.7
 uuid-runtime 2.31.1-0.4ubuntu3.7
 zlib1g 1:1.2.11.dfsg-0ubuntu2.2
DistroRelease: Ubuntu 18.04
Ec2AMI: ami-0479a9a73c32d2448
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: ap-southeast-2c
Ec2InstanceType: t2.medium
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
Package: linux-image-5.4.0-1097-aws 5.4.0-1097.105~18.04.1
PackageArchitecture: amd64
ProcCpuinfoMinimal:
 processor      : 1
 vendor_id      : GenuineIntel
 cpu family     : 6
 model          : 79
 model name     : Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
 stepping       : 1
 microcode      : 0xb000040
 cpu MHz                : 2299.922
 cache size     : 46080 KB
 physical id    : 0
 siblings       : 2
 core id                : 1
 cpu cores      : 2
 apicid         : 2
 initial apicid : 2
 fpu            : yes
 fpu_exception  : yes
 cpuid level    : 13
 wp             : yes
 flags          : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc 
rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 
x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor 
lahf_lm abm cpuid_fault invpcid_single pti fsgsbase bmi1 avx2 smep bmi2 erms 
invpcid xsaveopt
 bugs           : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds 
swapgs itlb_multihit mmio_stale_data
 bogomips       : 4600.03
 clflush size   : 64
 cache_alignment        : 64
 address sizes  : 46 bits physical, 48 bits virtual
 power management:
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=C.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 5.4.0-1097.105~18.04.1-aws 5.4.229
SourcePackage: linux-signed-aws-5.4
Tags: third-party-packages bionic ec2-images
Uname: Linux 5.4.0-1097-aws x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
_MarkForUpload: True
```

We also need:

1) The release of Ubuntu you are using, via 'lsb_release -rd' or System
-> About Ubuntu

Description:    Ubuntu 18.04.6 LTS
Release:        18.04

2) The version of the package you are using, via 'apt-cache policy
pkgname' or by checking in Software Center

linux-aws-5.4-tools-5.4.0-1097:
  Installed: (none)
  Candidate: 5.4.0-1097.105~18.04.1
  Version table:
     5.4.0-1097.105~18.04.1 500
        500 http://ap-southeast-2.ec2.archive.ubuntu.com/ubuntu 
bionic-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu bionic-security/main amd64 
Packages

3) What you expected to happen

The OS to not hose the NFS server with metadata requests until something
breaks

4) What happened instead

The OS hosed the NFS server with metadata requests until it caused a
production outage

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2012020

Title:
  Release 5.4.0-1097 Floods NFS server with 50-600 metadata requests per
  second per server

Status in linux package in Ubuntu:
  New

Bug description:
  We are observing a massive uptick in metadata requests to an NFSv4.1
  (AWS EFS) network drive that is linked to a one or more web servers.
  This started happening about a week ago across a number of stacks on
  AWS. This issue ended up exhausting all the burst credits we had
  accrued and caused a production outage.

  Diagnostics...

  `nfsiostat` shows 50-600 ops/s on the servers where it is happening.

  `nfsstat -m` shows these mount flags: `Flags:
  
rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,noresvport,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=x.x.x.x,fsc,local_lock=none,addr=y.y.y.y`

  `nfstrace --mode=live --verbose=2` shows that the same operations are
  happening repeatedly on 2-5 filehandles;

  ```
     CALL  [ operations: 4 tag: void minor version: 1
         [ SEQUENCE(53) [ sessionid: 0xeab1201a93b669600000000000000001 
sequenceid: 0xfc12c slotid: 2 cache this: 0 ]
         [ PUTFH(22) [ object: 
839c9918169aed49bd6f96adab5438bc47b1460162fe4a739cd8c98405868b2b5de303ed1f4ecfa82c26f5ca1db4cce8e63aad38fe72a397bdae51d8a7e93eb118ad8c736b30323165273b9c805db85936cf3be626b6ba4165ecf9755a54fdd174a535e217ffee4fa0166feea6e86becf9fa16280fd877b3545e07fe03aede08
 ]
         [ ACCESS(3) [ READ LOOKUP MODIFY EXTEND DELETE  ]
         [ GETATTR(9) [ mask: 0x18300000 CHANGE SIZE TIME_METADATA TIME_MODIFY 
]  ]
     REPLY [  operations: 4 status: OK tag: void
         [ SEQUENCE(53) [ status: OK session: 
0xeab1201a93b669600000000000000001 sequenceid: 0xfc12c slotid: 2 highest 
slotid: 63 target highest slotid: 63 status flags: 0 ]
         [ PUTFH(22) [ status: OK ]
         [ ACCESS(3) [ status: OK supported: READ LOOKUP MODIFY EXTEND DELETE  
access: READ LOOKUP MODIFY EXTEND DELETE  ]
         [ GETATTR(9) [ status: OK mask: 0x18300000 CHANGE SIZE TIME_METADATA 
TIME_MODIFY ]  ]
  ```

  `lsof -N` shows nothing using files from the NFS drive.

  I tried changing some of the mount options to reduce the amount of
  metadata requests going through to the NFS server and found that
  `lazytime` did help - observations are as below;

      Server 1:
          checked mount options - using relatime
          checked nfs stats - 460.982 ops/s
          updated mount options
          unmounted and mounted the NFS drive
          checked nfs stats - 1.400 ops/s
          rebooted
          checked nfs stats - 1.316 ops/s
      Server 2:
          checked mount options - using relatime
          checked nfs stats - 390.998 ops/s
          did not updated mount options
          unmounted and mounted the NFS drive
          checked nfs stats - 1.750 ops/s
          rebooted
          checked nfs stats - 531.932 ops/s

  Conclusion from this is that;

  - Unmounting and remounting the disk stops the issue temporarily
  - `lazytime` option seems to prevent the issue from happening
  - Issue may be boot related

  I also enabled rpcdebugging via `rpcdebug -m nfs -s all` and a snippet
  of the output is below;

  ```
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213426] NFS: 
nfs_update_inode(0:55/1440391441734181492 fh_crc=0xc29dd48a ct=2 info=0x26040)
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213428] NFS: 
permission(0:55/1440391441734181492), mask=0x1, res=0
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213430] NFS: 
nfs_lookup_revalidate_done(/Components) is valid
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213432] --> 
nfs41_call_sync_prepare data->seq_server 000000008a198bb4
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213433] --> nfs4_alloc_slot 
used_slots=0000 highest_used=4294967295 max_slots=64
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213433] <-- nfs4_alloc_slot 
used_slots=0001 highest_used=0 slotid=0
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213436] encode_sequence: 
sessionid=1354075206:678036945:0:16777216 seqid=238661 slotid=0 max_slotid=0 
cache_this=0
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213842] decode_attr_type: type=00
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213843] decode_attr_change: 
change attribute=2153
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213844] decode_attr_size: file 
size=71680
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213845] decode_attr_fsid: 
fsid=(0x0/0x0)
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213845] decode_attr_fileid: 
fileid=0
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213846] decode_attr_fs_locations: 
fs_locations done, error = 0
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213846] decode_attr_mode: file 
mode=00
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213847] decode_attr_nlink: nlink=1
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213848] decode_attr_rdev: 
rdev=(0x0:0x0)
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213848] decode_attr_space_used: 
space used=0
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213849] decode_attr_time_access: 
atime=0
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213849] 
decode_attr_time_metadata: ctime=1677729393
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213850] decode_attr_time_modify: 
mtime=1677729393
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213851] 
decode_attr_mounted_on_fileid: fileid=0
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213851] decode_getfattr_attrs: 
xdr returned 0
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213852] decode_getfattr_generic: 
xdr returned 0
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213853] --> nfs4_alloc_slot 
used_slots=0001 highest_used=0 max_slots=64
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213854] <-- nfs4_alloc_slot 
used_slots=0003 highest_used=1 slotid=1
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213855] nfs4_free_slot: slotid 1 
highest_used_slotid 0
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213855] nfs41_sequence_process: 
Error 0 free the slot
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213856] nfs4_free_slot: slotid 0 
highest_used_slotid 4294967295
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213860] NFS: 
nfs_update_inode(0:55/10743289300410127521 fh_crc=0xb5c7d895 ct=1 info=0x26040)
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213862] NFS: 
permission(0:55/10743289300410127521), mask=0x1, res=0
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213864] NFS: 
nfs_lookup_revalidate_done(Components/Image.png) is valid
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213867] NFS: 
dentry_delete(Components/Image.png, 48084c)
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214000] NFS: 
permission(0:55/1440391441734181492), mask=0x81, res=-10
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214003] --> 
nfs41_call_sync_prepare data->seq_server 000000008a198bb4
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214004] --> nfs4_alloc_slot 
used_slots=0000 highest_used=4294967295 max_slots=64
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214005] <-- nfs4_alloc_slot 
used_slots=0001 highest_used=0 slotid=0
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214009] encode_sequence: 
sessionid=1354075206:678036945:0:16777216 seqid=238662 slotid=0 max_slotid=0 
cache_this=0
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214472] decode_attr_type: type=00
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214473] decode_attr_change: 
change attribute=1626
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214473] decode_attr_size: file 
size=38912
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214474] decode_attr_fsid: 
fsid=(0x0/0x0)
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214475] decode_attr_fileid: 
fileid=0
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214475] decode_attr_fs_locations: 
fs_locations done, error = 0
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214476] decode_attr_mode: file 
mode=00
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214476] decode_attr_nlink: nlink=1
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214477] decode_attr_rdev: 
rdev=(0x0:0x0)
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214477] decode_attr_space_used: 
space used=0
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214478] decode_attr_time_access: 
atime=0
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214479] 
decode_attr_time_metadata: ctime=1678928064
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214479] decode_attr_time_modify: 
mtime=1678928064
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214480] 
decode_attr_mounted_on_fileid: fileid=0
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214480] decode_getfattr_attrs: 
xdr returned 0
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214481] decode_getfattr_generic: 
xdr returned 0
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214483] --> nfs4_alloc_slot 
used_slots=0001 highest_used=0 max_slots=64
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214483] <-- nfs4_alloc_slot 
used_slots=0003 highest_used=1 slotid=1
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214484] nfs4_free_slot: slotid 1 
highest_used_slotid 0
  Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214485] nfs41_sequence_process: 
Error 0 free the slot
  ```

  This loops repeatedly at a rate roughly consistent with ops/s output
  from `nfsiostat`.

  I tried rolling back to `5.4.0-1096-aws`, updating grub and rebooting
  and the issue no longer occurs. I see the issue again once rolling
  forward to `5.4.0-1097-aws`, updating grub and rebooting as soon as
  the web server loads anything from the NFS mount.

  Output from `ubuntu-bug linux`

  ```
  ProblemType: Bug
  ApportVersion: 2.20.9-0ubuntu7.28
  Architecture: amd64
  Date: Fri Mar 17 16:29:14 2023
  Dependencies:
   adduser 3.116ubuntu1
   apt 1.6.14
   apt-utils 1.6.14
   busybox-initramfs 1:1.27.2-2ubuntu3.4
   ca-certificates 20211016ubuntu0.18.04.1
   coreutils 8.28-1ubuntu1
   cpio 2.12+dfsg-6ubuntu0.18.04.4
   debconf 1.5.66ubuntu1
   debconf-i18n 1.5.66ubuntu1
   dmsetup 2:1.02.145-4.1ubuntu3.18.04.3
   dpkg 1.19.0.5ubuntu2.4
   fdisk 2.31.1-0.4ubuntu3.7
   gcc-8-base 8.4.0-1ubuntu1~18.04
   gettext-base 0.19.8.1-6ubuntu0.3
   gpgv 2.2.4-1ubuntu1.6
   grub-common 2.02-2ubuntu8.25
   grub-gfxpayload-lists 0.7
   grub-pc 2.02-2ubuntu8.25
   grub-pc-bin 2.02-2ubuntu8.25
   grub2-common 2.02-2ubuntu8.25
   init-system-helpers 1.51
   initramfs-tools 0.130ubuntu3.13
   initramfs-tools-bin 0.130ubuntu3.13
   initramfs-tools-core 0.130ubuntu3.13
   klibc-utils 2.0.4-9ubuntu2.2
   kmod 24-1ubuntu3.5
   libacl1 2.2.52-3build1
   libapt-inst2.0 1.6.14
   libapt-pkg5.0 1.6.14
   libattr1 1:2.4.47-2build1
   libaudit-common 1:2.8.2-1ubuntu1.1
   libaudit1 1:2.8.2-1ubuntu1.1
   libblkid1 2.31.1-0.4ubuntu3.7
   libbz2-1.0 1.0.6-8.1ubuntu0.2
   libc6 2.27-3ubuntu1.6
   libcap-ng0 0.7.7-3.1
   libdb5.3 5.3.28-13.1ubuntu1.1
   libdevmapper1.02.1 2:1.02.145-4.1ubuntu3.18.04.3
   libfdisk1 2.31.1-0.4ubuntu3.7
   libffi6 3.2.1-8
   libfreetype6 2.8.1-2ubuntu2.2
   libfuse2 2.9.7-1ubuntu1
   libgcc1 1:8.4.0-1ubuntu1~18.04
   libgcrypt20 1.8.1-4ubuntu1.3
   libgmp10 2:6.1.2+dfsg-2ubuntu0.1
   libgnutls30 3.5.18-1ubuntu1.6
   libgpg-error0 1.27-6
   libgpm2 1.20.7-5
   libhogweed4 3.4.1-0ubuntu0.18.04.1
   libidn2-0 2.3.0-1+ubuntu18.04.1+deb.sury.org+2 [origin: LP-PPA-ondrej-php]
   libklibc 2.0.4-9ubuntu2.2
   libkmod2 24-1ubuntu3.5
   liblocale-gettext-perl 1.07-3build2
   liblz4-1 0.0~r131-2ubuntu3.1
   liblzma5 5.2.2-1.3ubuntu0.1
   libmount1 2.31.1-0.4ubuntu3.7
   libncurses5 6.1-1ubuntu1.18.04
   libncursesw5 6.1-1ubuntu1.18.04
   libnettle6 3.4.1-0ubuntu0.18.04.1
   libp11-kit0 0.23.9-2ubuntu0.1
   libpam-modules 1.1.8-3.6ubuntu2.18.04.6
   libpam-modules-bin 1.1.8-3.6ubuntu2.18.04.6
   libpam0g 1.1.8-3.6ubuntu2.18.04.6
   libpcre3 2:8.44-2+ubuntu18.04.1+deb.sury.org+1 [origin: LP-PPA-ondrej-php]
   libpng16-16 1.6.34-1ubuntu0.18.04.2
   libprocps6 2:3.3.12-3ubuntu1.2
   libseccomp2 2.5.1-1ubuntu1~18.04.2
   libselinux1 2.7-2build2
   libsemanage-common 2.7-2build2
   libsemanage1 2.7-2build2
   libsepol1 2.7-1ubuntu0.1
   libsmartcols1 2.31.1-0.4ubuntu3.7
   libssl1.1 1.1.1-1ubuntu2.1~18.04.21
   libstdc++6 8.4.0-1ubuntu1~18.04
   libsystemd0 237-3ubuntu10.57
   libtasn1-6 4.13-2
   libtext-charwidth-perl 0.04-7.1
   libtext-iconv-perl 1.7-5build6
   libtext-wrapi18n-perl 0.06-7.1
   libtinfo5 6.1-1ubuntu1.18.04
   libudev1 237-3ubuntu10.57
   libunistring2 0.9.9-0ubuntu2
   libuuid1 2.31.1-0.4ubuntu3.7
   libzstd1 1.4.8+dfsg-2+ubuntu18.04.1+deb.sury.org+4 [origin: 
LP-PPA-ondrej-php]
   linux-base 4.5ubuntu1.7
   linux-modules-5.4.0-1097-aws 5.4.0-1097.105~18.04.1
   lsb-base 9.20170808ubuntu1
   openssl 1.1.1-1ubuntu2.1~18.04.21
   os-prober 1.74ubuntu1
   passwd 1:4.5-1ubuntu2.5
   perl-base 5.26.1-6ubuntu0.6
   procps 2:3.3.12-3ubuntu1.2
   psmisc 23.1-1ubuntu0.1
   sensible-utils 0.0.12
   tar 1.29b-2ubuntu0.4
   ubuntu-keyring 2018.09.18.1~18.04.2
   ucf 3.0038
   udev 237-3ubuntu10.57
   util-linux 2.31.1-0.4ubuntu3.7
   uuid-runtime 2.31.1-0.4ubuntu3.7
   zlib1g 1:1.2.11.dfsg-0ubuntu2.2
  DistroRelease: Ubuntu 18.04
  Ec2AMI: ami-0479a9a73c32d2448
  Ec2AMIManifest: (unknown)
  Ec2AvailabilityZone: ap-southeast-2c
  Ec2InstanceType: t2.medium
  Ec2Kernel: unavailable
  Ec2Ramdisk: unavailable
  Package: linux-image-5.4.0-1097-aws 5.4.0-1097.105~18.04.1
  PackageArchitecture: amd64
  ProcCpuinfoMinimal:
   processor    : 1
   vendor_id    : GenuineIntel
   cpu family   : 6
   model                : 79
   model name   : Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
   stepping     : 1
   microcode    : 0xb000040
   cpu MHz              : 2299.922
   cache size   : 46080 KB
   physical id  : 0
   siblings     : 2
   core id              : 1
   cpu cores    : 2
   apicid               : 2
   initial apicid       : 2
   fpu          : yes
   fpu_exception        : yes
   cpuid level  : 13
   wp           : yes
   flags                : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm 
constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid 
sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand 
hypervisor lahf_lm abm cpuid_fault invpcid_single pti fsgsbase bmi1 avx2 smep 
bmi2 erms invpcid xsaveopt
   bugs         : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds 
swapgs itlb_multihit mmio_stale_data
   bogomips     : 4600.03
   clflush size : 64
   cache_alignment      : 64
   address sizes        : 46 bits physical, 48 bits virtual
   power management:
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=<set>
   LANG=C.UTF-8
   SHELL=/bin/bash
  ProcVersionSignature: Ubuntu 5.4.0-1097.105~18.04.1-aws 5.4.229
  SourcePackage: linux-signed-aws-5.4
  Tags: third-party-packages bionic ec2-images
  Uname: Linux 5.4.0-1097-aws x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  _MarkForUpload: True
  ```

  We also need:

  1) The release of Ubuntu you are using, via 'lsb_release -rd' or
  System -> About Ubuntu

  Description:  Ubuntu 18.04.6 LTS
  Release:      18.04

  2) The version of the package you are using, via 'apt-cache policy
  pkgname' or by checking in Software Center

  linux-aws-5.4-tools-5.4.0-1097:
    Installed: (none)
    Candidate: 5.4.0-1097.105~18.04.1
    Version table:
       5.4.0-1097.105~18.04.1 500
          500 http://ap-southeast-2.ec2.archive.ubuntu.com/ubuntu 
bionic-updates/main amd64 Packages
          500 http://security.ubuntu.com/ubuntu bionic-security/main amd64 
Packages

  3) What you expected to happen

  The OS to not hose the NFS server with metadata requests until
  something breaks

  4) What happened instead

  The OS hosed the NFS server with metadata requests until it caused a
  production outage

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2012020/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

Reply via email to