We have been hitting this bug quite often while running Tomcat 8.5 on
Amazon Linux 2 with kernel 4.14.268-205.500.amzn2.x86_64.

I wanted to see whether the bug could be reproduced on an updated
kernel, so I tried to repro it using the server code and methodology
provided by Mark Thomas. On Ubuntu Server 21.10 (kernel
5.13.0-1008-raspi, running on a Raspberry Pi 4 with 4GB RAM) I was NOT
able to repro the bug. I then installed Ubuntu Server 20.04 LTS on the
same machine (kernel 5.4.0-1052-raspi) and WAS able to repro it. The bug
was fairly easy to repro there and did not take many attempts.

Since then I have been able to repro the bug using the server code on
AWS Linux 2 with the 4.14.268-205.500.amzn2.x86_64 kernel, but not on
AWS Linux 2 with a 5.10.109-104.500.amzn2.x86_64 kernel.

I think there is a slight problem with the server code used in the
repro: it calls `pthread_create` with no thread attributes, which
creates joinable threads rather than detached ones. The documentation
for `pthread_create` says that "Only when a terminated joinable thread
has been joined are the last of its resources released back to the
system." Because the server code never joins its threads, I think the OS
is never able to release those thread resources. The server eventually
runs out of memory and fails with "pthread_create: Cannot allocate
memory", as mentioned by Brooke Hedrick in their comment.

I was also not able to repro the bug on WSL (kernel
4.4.0-19041-Microsoft), but perhaps its underlying network drivers are
different?

I ran into this memory issue myself when running the server code, so I
made a slight modification: the pthread attribute is now set so that new
threads are created in a detached state. This seemed to solve the memory
issue, and I was able to repro the bug with the modified server. I've
attached the code.
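
For readers without the attachment, here is a minimal sketch of what
creating a thread in a detached state can look like. It is not taken
from the attached server.c; the names `spawn_detached` and
`noop_worker` are made up for illustration, and the worker body is only
a placeholder for the per-connection work.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Placeholder worker (hypothetical): a real server would service the
 * client connection here. */
void *noop_worker(void *arg)
{
    (void)arg;
    return NULL;
}

/* Hypothetical helper: start fn(arg) on a thread that is detached from
 * the outset, so its resources are released when it terminates without
 * anyone calling pthread_join(). Returns 0 or a pthread error number. */
int spawn_detached(void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc = pthread_attr_init(&attr);

    if (rc != 0)
        return rc;

    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, fn, arg);

    /* The attribute object is not needed once pthread_create returns. */
    pthread_attr_destroy(&attr);

    if (rc != 0)
        fprintf(stderr, "pthread_create: %s\n", strerror(rc));
    return rc;
}
```

An accept loop would call something like `spawn_detached(noop_worker,
NULL)` once per connection. Calling `pthread_detach` on the new thread
right after a plain `pthread_create` should have the same effect.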

Additionally, I found it useful to use `prlimit` to raise the maximum
number of open files for the server process once it was running. This
made the server less likely to hit an EMFILE error when calling
`accept`.
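
If adjusting the limit from outside the process is inconvenient, a
similar effect can probably be had from inside the server at startup.
The sketch below is an assumption on my part, not what I actually ran:
it uses `getrlimit`/`setrlimit` rather than `prlimit`, and the helper
name `raise_nofile_limit` is made up. It can only raise the soft limit
as far as the existing hard limit.

```c
#include <sys/resource.h>

/* Hypothetical helper: raise this process's soft limit on open file
 * descriptors (RLIMIT_NOFILE) to its current hard limit.
 * Returns 0 on success, -1 on failure with errno set. */
int raise_nofile_limit(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_NOFILE, &lim) != 0)
        return -1;

    lim.rlim_cur = lim.rlim_max;   /* soft limit := hard limit */
    return setrlimit(RLIMIT_NOFILE, &lim);
}
```

Calling this once before entering the accept loop should be enough;
raising the hard limit itself still needs privileges or an external
tool.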


** Attachment added: "Updated server to demonstrate bug"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1924298/+attachment/5582247/+files/server.c

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1924298

Title:
  accept returns duplicate endpoints under load

Status in Linux:
  New
Status in linux package in Ubuntu:
  Confirmed

Bug description:
  When accepting client connections under load, duplicate endpoints may
  be returned. These endpoints will have different (usually sequential)
  file descriptors but will refer to the same connection (same server
  IP, same server port, same client IP, same client port). Both copies
  of the endpoint appear to be functional.

  Reproduction requires:
  - compilation of the attached server.c program
  - wrk (https://github.com/wg/wrk) to generate load

  The steps to reproduce are:
  - run 'server' application in one console window
  - run 'for i in {1..50}; do /opt/wrk/wrk -t 2 -c 1000 -d 5s --latency --timeout 1s http://localhost:5555/post; done' in a second console window
  - run the same command in a third window to generate concurrent load

  You may need to run additional instances of the wrk command in
  multiple windows to trigger the issue.

  When the problem occurs the server executable will exit and print some
  debugging info, e.g.:
  accerror = 1950892, counter = 10683, port = 59892, clientfd = 233, lastClient = 232

  This indicates that the sockets with file descriptors 233 and 232 are
  duplicates.
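
One way to confirm that two descriptors refer to the same connection is
to compare their local and remote addresses. The sketch below is only an
illustration of that comparison and is not necessarily how the attached
server.c detects duplicates; the function names are made up, and it
assumes IPv4 TCP sockets.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Compare two IPv4 socket addresses by family, address, and port. */
static int same_addr(const struct sockaddr_in *x, const struct sockaddr_in *y)
{
    return x->sin_family == y->sin_family &&
           x->sin_addr.s_addr == y->sin_addr.s_addr &&
           x->sin_port == y->sin_port;
}

/* Fetch the local and peer addresses of an IPv4 socket; 0 on success. */
static int endpoints(int fd, struct sockaddr_in *local, struct sockaddr_in *peer)
{
    socklen_t len = sizeof *local;

    if (getsockname(fd, (struct sockaddr *)local, &len) != 0)
        return -1;
    len = sizeof *peer;
    return getpeername(fd, (struct sockaddr *)peer, &len);
}

/* Hypothetical helper: returns 1 if both descriptors report the same
 * server IP/port and client IP/port, 0 if they differ, -1 on error. */
int same_connection(int fd_a, int fd_b)
{
    struct sockaddr_in local_a, peer_a, local_b, peer_b;

    if (endpoints(fd_a, &local_a, &peer_a) != 0 ||
        endpoints(fd_b, &local_b, &peer_b) != 0)
        return -1;
    return same_addr(&local_a, &local_b) && same_addr(&peer_a, &peer_b);
}
```

In an accept loop, the newly accepted descriptor could be compared
against the previous one (233 and 232 in the output above); a result of
1 for two distinct descriptors would indicate the duplicate.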

  The issue has been reproduced on fully patched versions of Ubuntu
  20.04 and 18.04. Other versions have not been tested.

  This issue was originally observed in Java and was reported against
  the Spring Framework:
  https://github.com/spring-projects/spring-framework/issues/26434

  Investigation from the Spring team and the Apache Tomcat team
  identified that it appeared to be a JDK issue:
  https://bugs.openjdk.java.net/browse/JDK-8263243

  Further research from the JDK team determined that the issue was at
  the OS level. Hence this report.

  ProblemType: Bug
  DistroRelease: Ubuntu 20.04
  Package: linux-image-5.4.0-71-generic 5.4.0-71.79
  ProcVersionSignature: Ubuntu 5.4.0-71.79-generic 5.4.101
  Uname: Linux 5.4.0-71-generic x86_64
  NonfreeKernelModules: nvidia_modeset nvidia
  ApportVersion: 2.20.11-0ubuntu27.16
  Architecture: amd64
  CasperMD5CheckResult: skip
  CurrentDesktop: ubuntu:GNOME
  Date: Thu Apr 15 12:52:53 2021
  HibernationDevice: RESUME=UUID=f5a46e09-d99b-4475-8ab6-2cd70da8418d
  InstallationDate: Installed on 2017-02-02 (1532 days ago)
  InstallationMedia: Ubuntu 16.04.1 LTS "Xenial Xerus" - Release amd64 (20160719)
  IwConfig:
   lo        no wireless extensions.
   
   docker0   no wireless extensions.
   
   eno1      no wireless extensions.
  MachineType: Gigabyte Technology Co., Ltd. Default string
  ProcFB: 0 VESA VGA
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.4.0-71-generic root=/dev/mapper/ubuntu--vg-root ro text
  RelatedPackageVersions:
   linux-restricted-modules-5.4.0-71-generic N/A
   linux-backports-modules-5.4.0-71-generic  N/A
   linux-firmware                            1.187.10
  RfKill:
   
  SourcePackage: linux
  UpgradeStatus: Upgraded to focal on 2020-09-07 (219 days ago)
  dmi.bios.date: 06/13/2016
  dmi.bios.vendor: American Megatrends Inc.
  dmi.bios.version: F22
  dmi.board.asset.tag: Default string
  dmi.board.name: X99-SLI-CF
  dmi.board.vendor: Gigabyte Technology Co., Ltd.
  dmi.board.version: x.x
  dmi.chassis.asset.tag: Default string
  dmi.chassis.type: 3
  dmi.chassis.vendor: Default string
  dmi.chassis.version: Default string
  dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrF22:bd06/13/2016:svnGigabyteTechnologyCo.,Ltd.:pnDefaultstring:pvrDefaultstring:rvnGigabyteTechnologyCo.,Ltd.:rnX99-SLI-CF:rvrx.x:cvnDefaultstring:ct3:cvrDefaultstring:
  dmi.product.family: Default string
  dmi.product.name: Default string
  dmi.product.sku: Default string
  dmi.product.version: Default string
  dmi.sys.vendor: Gigabyte Technology Co., Ltd.

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/1924298/+subscriptions

