[ 
https://issues.apache.org/jira/browse/KAFKA-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16357483#comment-16357483
 ] 

Yu Yang commented on KAFKA-6544:
--------------------------------

[~cmccabe]  The Kafka process is in the `<defunct>` state, and `sudo ls -l 
/proc/$kafka_pid/fd` lists no open file descriptors (total 0). I am also 
including the "netstat -pnt" output here. Nearly all connections are in either 
the ESTABLISHED or the CLOSE_WAIT state. 

{code}
proc/30413/fd]# sudo ls -l /proc/30413/fd
total 0
{code} 
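
For repeated checks, the `/proc/<pid>/fd` listing above can be scripted. The following is a minimal sketch, assuming a Linux host with a readable `/proc` and enough privileges to inspect the target process; the helper names `count_open_fds` and `soft_nofile_limit` are hypothetical and not part of Kafka or any tool mentioned here.

```python
import os
from typing import Optional


def count_open_fds(pid: int) -> int:
    """Count the entries under /proc/<pid>/fd (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def soft_nofile_limit(pid: int) -> Optional[int]:
    """Return the soft 'Max open files' limit from /proc/<pid>/limits.

    Returns None if the limit is 'unlimited' or the row is missing.
    """
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max open files"):
                # Row layout: "Max open files  <soft>  <hard>  files"
                soft = line.split()[3]
                return None if soft == "unlimited" else int(soft)
    return None


if __name__ == "__main__":
    # Inspect this Python process as a stand-in for the broker PID.
    pid = os.getpid()
    print(pid, count_open_fds(pid), soft_nofile_limit(pid))
```

Comparing the two numbers on a healthy broker shows how close it is running to the limit before "Too many open files" starts to appear.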

{code}
netstat -pnt | grep "10.1.160.124:9092" | wc
    116     812   11252
{code} 


{code}
netstat -pnt | grep "10.1.160.124:9092"
tcp       29      0 10.1.160.124:9092       10.1.25.241:55616       ESTABLISHED -
tcp       29      0 10.1.160.124:9092       10.1.25.241:58624       ESTABLISHED -
tcp       65      0 10.1.160.124:9092       10.1.9.121:33894        CLOSE_WAIT  -
tcp       29      0 10.1.160.124:9092       10.1.25.241:53886       ESTABLISHED -
tcp       29      0 10.1.160.124:9092       10.1.25.241:43122       ESTABLISHED -
tcp       29      0 10.1.160.124:9092       10.1.25.241:50766       ESTABLISHED -
tcp       65      0 10.1.160.124:9092       10.1.26.165:34282       CLOSE_WAIT  -
tcp       65      0 10.1.160.124:9092       10.1.79.149:47682       CLOSE_WAIT  -
tcp       65      0 10.1.160.124:9092       10.1.163.135:44008      CLOSE_WAIT  -
tcp       65      0 10.1.160.124:9092       10.1.66.116:52398       CLOSE_WAIT  -
tcp       65      0 10.1.160.124:9092       10.1.64.116:36656       CLOSE_WAIT  -
tcp       65      0 10.1.160.124:9092       10.1.207.247:51904      CLOSE_WAIT  -
tcp       65      0 10.1.160.124:9092       10.1.9.16:45942         CLOSE_WAIT  -
tcp       65      0 10.1.160.124:9092       10.1.131.15:57118       CLOSE_WAIT  -
tcp       29      0 10.1.160.124:9092       10.1.25.241:55974       ESTABLISHED -
tcp       65      0 10.1.160.124:9092       10.1.214.5:33040        CLOSE_WAIT  -
tcp       29      0 10.1.160.124:9092       10.1.25.241:33494       ESTABLISHED -
tcp       65      0 10.1.160.124:9092       10.1.201.139:60230      CLOSE_WAIT  -
tcp       65      0 10.1.160.124:9092       10.1.207.247:51792      CLOSE_WAIT  -
tcp       29      0 10.1.160.124:9092       10.1.25.241:42858       ESTABLISHED -
tcp       29      0 10.1.160.124:9092       10.1.25.241:44246       ESTABLISHED -
tcp       65      0 10.1.160.124:9092       10.1.194.26:42406       CLOSE_WAIT  -
tcp       29      0 10.1.160.124:9092       10.1.25.241:32902       ESTABLISHED -
tcp       65      0 10.1.160.124:9092       10.1.169.94:35532       CLOSE_WAIT  -
tcp       65      0 10.1.160.124:9092       10.1.193.101:48832      CLOSE_WAIT  -
tcp       65      0 10.1.160.124:9092       10.1.204.225:60946      CLOSE_WAIT  -
tcp       29      0 10.1.160.124:9092       10.1.25.241:35772       ESTABLISHED -
tcp       29      0 10.1.160.124:9092       10.1.25.241:46972       ESTABLISHED -
tcp       29      0 10.1.160.124:9092       10.1.25.241:56226       ESTABLISHED -
tcp       29      0 10.1.160.124:9092       10.1.25.241:46432       ESTABLISHED -
tcp       29      0 10.1.160.124:9092       10.1.25.241:44436       ESTABLISHED -
tcp       29      0 10.1.160.124:9092       10.1.25.241:48888       ESTABLISHED -
tcp       29      0 10.1.160.124:9092       10.1.25.241:47364       ESTABLISHED -
tcp       29      0 10.1.160.124:9092       10.1.25.241:44908       ESTABLISHED -
tcp       29      0 10.1.160.124:9092       10.1.25.241:43060       ESTABLISHED -
tcp       65      0 10.1.160.124:9092       10.1.10.15:39282        CLOSE_WAIT  -
tcp       65      0 10.1.160.124:9092       10.1.181.86:55500       CLOSE_WAIT  -
tcp       65      0 10.1.160.124:9092       10.1.17.191:32812       CLOSE_WAIT  -
tcp       65      0 10.1.160.124:9092       10.1.141.30:52024       CLOSE_WAIT  -
tcp       65      0 10.1.160.124:9092       10.1.76.141:51366       CLOSE_WAIT  -
tcp       29      0 10.1.160.124:9092       10.1.25.241:50940       ESTABLISHED -
tcp       65      0 10.1.160.124:9092       10.1.11.196:44064       CLOSE_WAIT  -
tcp       65      0 10.1.160.124:9092       10.1.143.107:37116      CLOSE_WAIT  -
tcp       29      0 10.1.160.124:9092       10.1.25.241:37416       ESTABLISHED -
tcp       65      0 10.1.160.124:9092       10.1.71.116:35110       CLOSE_WAIT  -
tcp       29      0 10.1.160.124:9092       10.1.25.241:60884       ESTABLISHED -
tcp       65      0 10.1.160.124:9092       10.1.14.163:51768       CLOSE_WAIT  -
tcp       65      0 10.1.160.124:9092       10.1.15.51:49542        CLOSE_WAIT  -
tcp       65      0 10.1.160.124:9092       10.1.6.217:46520        CLOSE_WAIT  -
tcp       29      0 10.1.160.124:9092       10.1.25.241:60314       ESTABLISHED -
tcp       29      0 10.1.160.124:9092       10.1.25.241:56516       ESTABLISHED -
tcp        0      0 10.1.160.124:9092       10.1.232.16:60754       SYN_RECV    -
tcp       29      0 10.1.160.124:9092       10.1.25.241:57568       ESTABLISHED -
tcp       65      0 10.1.160.124:9092       10.1.198.209:38446      CLOSE_WAIT  -
tcp       29      0 10.1.160.124:9092       10.1.25.241:38278       ESTABLISHED -
tcp       65      0 10.1.160.124:9092       10.1.201.206:46686      CLOSE_WAIT  -
tcp       29      0 10.1.160.124:9092       10.1.25.241:48798       ESTABLISHED -
tcp       29      0 10.1.160.124:9092       10.1.25.241:51958       ESTABLISHED -
tcp       29      0 10.1.160.124:9092       10.1.25.241:40716       ESTABLISHED -
tcp       65      0 10.1.160.124:9092       10.1.0.41:47810         CLOSE_WAIT  -
tcp       65      0 10.1.160.124:9092       10.1.215.172:34926      CLOSE_WAIT  -
tcp       29      0 10.1.160.124:9092       10.1.25.241:36104       ESTABLISHED -
tcp       65      0 10.1.160.124:9092       10.1.193.30:49338       CLOSE_WAIT  -
tcp       29      0 10.1.160.124:9092       10.1.25.241:41596       ESTABLISHED -
tcp       65      0 10.1.160.124:9092       10.1.201.28:57122       CLOSE_WAIT  -
tcp        0     12 10.1.160.124:9092       10.1.150.72:36506       ESTABLISHED -
tcp       65      0 10.1.160.124:9092       10.1.165.50:43042       CLOSE_WAIT  -
tcp       29      0 10.1.160.124:9092       10.1.25.241:50396       ESTABLISHED -
tcp       65      0 10.1.160.124:9092       10.1.4.9:44952          CLOSE_WAIT  -
tcp       65      0 10.1.160.124:9092       10.1.98.254:36852       CLOSE_WAIT  -
tcp       65      0 10.1.160.124:9092       10.1.247.162:38234      CLOSE_WAIT  -
tcp       29      0 10.1.160.124:9092       10.1.25.241:38694       ESTABLISHED -
tcp       29      0 10.1.160.124:9092       10.1.25.241:55794       ESTABLISHED -
tcp       65      0 10.1.160.124:9092       10.1.138.76:56542       CLOSE_WAIT  -
tcp       29      0 10.1.160.124:9092       10.1.25.241:40790       ESTABLISHED -
tcp       29      0 10.1.160.124:9092       10.1.25.241:32858       ESTABLISHED -
tcp       65      0 10.1.160.124:9092       10.1.77.228:34292       CLOSE_WAIT  -
tcp       65      0 10.1.160.124:9092       10.1.203.191:55610      CLOSE_WAIT  -
tcp       29      0 10.1.160.124:9092       10.1.25.241:45182       ESTABLISHED -
tcp       65      0 10.1.160.124:9092       10.1.3.215:58404        CLOSE_WAIT  -
tcp       29      0 10.1.160.124:9092       10.1.25.241:42014       ESTABLISHED -
tcp       29      0 10.1.160.124:9092       10.1.25.241:46172       ESTABLISHED -
tcp       29      0 10.1.160.124:9092       10.1.25.241:39050       ESTABLISHED -
tcp       29      0 10.1.160.124:9092       10.1.25.241:36000       ESTABLISHED -
tcp       29      0 10.1.160.124:9092       10.1.25.241:51330       ESTABLISHED -
tcp       29      0 10.1.160.124:9092       10.1.25.241:44994       ESTABLISHED -
tcp        0      8 10.1.160.124:9092       10.1.150.72:46158       ESTABLISHED -
tcp       29      0 10.1.160.124:9092       10.1.25.241:59280       ESTABLISHED -
tcp       29      0 10.1.160.124:9092       10.1.25.241:46678       ESTABLISHED -
tcp       29      0 10.1.160.124:9092       10.1.25.241:54272       ESTABLISHED -
tcp        0     16 10.1.160.124:9092       10.1.63.47:56546        ESTABLISHED -
tcp       65      0 10.1.160.124:9092       10.1.79.66:34010        CLOSE_WAIT  -
tcp       29      0 10.1.160.124:9092       10.1.25.241:56790       ESTABLISHED -
tcp       29      0 10.1.160.124:9092       10.1.25.241:47846       ESTABLISHED -
tcp       65      0 10.1.160.124:9092       10.1.229.18:34272       CLOSE_WAIT  -
tcp       65      0 10.1.160.124:9092       10.1.2.141:44584        CLOSE_WAIT  -
tcp       29      0 10.1.160.124:9092       10.1.25.241:53156       ESTABLISHED -
tcp       29      0 10.1.160.124:9092       10.1.25.241:52610       ESTABLISHED -
tcp       29      0 10.1.160.124:9092       10.1.25.241:37628       ESTABLISHED -
tcp       65      0 10.1.160.124:9092       10.1.203.117:41170      CLOSE_WAIT  -
tcp       29      0 10.1.160.124:9092       10.1.25.241:42540       ESTABLISHED -
tcp       29      0 10.1.160.124:9092       10.1.25.241:41244       ESTABLISHED -
tcp       29      0 10.1.160.124:9092       10.1.25.241:56308       ESTABLISHED -
tcp       29      0 10.1.160.124:9092       10.1.25.241:51810       ESTABLISHED -
tcp       29      0 10.1.160.124:9092       10.1.25.241:38634       ESTABLISHED -
tcp       65      0 10.1.160.124:9092       10.1.5.51:60498         CLOSE_WAIT  -
tcp       65      0 10.1.160.124:9092       10.1.78.153:53942       CLOSE_WAIT  -
tcp       29      0 10.1.160.124:9092       10.1.25.241:33506       ESTABLISHED -
tcp       29      0 10.1.160.124:9092       10.1.25.241:43768       ESTABLISHED -
tcp       29      0 10.1.160.124:9092       10.1.25.241:37134       ESTABLISHED -
tcp       65      0 10.1.160.124:9092       10.1.69.17:45370        CLOSE_WAIT  -
tcp        1      0 10.1.160.124:9092       10.1.83.103:57640       CLOSE_WAIT  -
tcp       65      0 10.1.160.124:9092       10.1.205.66:40266       CLOSE_WAIT  -
tcp       29      0 10.1.160.124:9092       10.1.25.241:60058       ESTABLISHED -
tcp       65      0 10.1.160.124:9092       10.1.245.20:54896       CLOSE_WAIT  -
tcp       65      0 10.1.160.124:9092       10.1.76.47:53444        CLOSE_WAIT  -
{code}
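
A listing this long is easier to read as per-state counts. The following is a minimal sketch, assuming the usual `netstat -pnt` column order (protocol, Recv-Q, Send-Q, local address, foreign address, state); the function name `tally_states` is hypothetical, and the three sample rows are copied from the output above.

```python
from collections import Counter


def tally_states(netstat_text: str) -> Counter:
    """Count TCP connections per state in `netstat -pnt` output."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers and wrapped continuation lines such as a lone "-".
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            counts[fields[5]] += 1
    return counts


sample = """\
tcp       29      0 10.1.160.124:9092       10.1.25.241:55616       ESTABLISHED -
tcp       65      0 10.1.160.124:9092       10.1.9.121:33894        CLOSE_WAIT  -
tcp        0      0 10.1.160.124:9092       10.1.232.16:60754       SYN_RECV    -
"""

for state, count in sorted(tally_states(sample).items()):
    print(state, count)
```

The same function can be fed the full output, for example by piping `netstat -pnt | grep "10.1.160.124:9092"` into a script that reads standard input.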

> kafka process should exit when it encounters "java.io.IOException: Too many 
> open files"  
> -----------------------------------------------------------------------------------------
>
>                 Key: KAFKA-6544
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6544
>             Project: Kafka
>          Issue Type: Bug
>          Components: admin, network
>    Affects Versions: 0.10.2.1
>            Reporter: Yu Yang
>            Priority: Major
>
> Our Kafka cluster encountered a few disk/XFS failures on cloud VM 
> instances. When a disk/XFS failure happens, the Kafka process does not exit 
> gracefully. Instead, it goes into the "<defunct>" state, with port 9092 still 
> reachable. When failures like this happen, Kafka should shut down all 
> threads and exit. The following are the Kafka logs from when the failure happened:
> {code:java}
> [2018-02-08 12:52:31,764] ERROR Error while accepting connection (kafka.network.Acceptor)
> java.io.IOException: Too many open files
>         at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>         at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
>         at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
>         at kafka.network.Acceptor.accept(SocketServer.scala:340)
>         at kafka.network.Acceptor.run(SocketServer.scala:283)
>         at java.lang.Thread.run(Thread.java:748)
> [2018-02-08 12:52:31,772] ERROR Error while accepting connection (kafka.network.Acceptor)
> java.io.IOException: Too many open files
>         at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>         at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
>         at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
>         at kafka.network.Acceptor.accept(SocketServer.scala:340)
>         at kafka.network.Acceptor.run(SocketServer.scala:283)
>         at java.lang.Thread.run(Thread.java:748)
> [2018-02-08 12:52:31,772] ERROR Error while accepting connection (kafka.network.Acceptor)
> java.io.IOException: Too many open files
>         at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>         at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
>         at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
>         at kafka.network.Acceptor.accept(SocketServer.scala:340)
>         at kafka.network.Acceptor.run(SocketServer.scala:283)
>         at java.lang.Thread.run(Thread.java:748)
>  {code}
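
The behavior requested in the issue (stop serving and exit when the acceptor runs out of file descriptors, rather than logging the error and looping) can be illustrated with a small accept loop. This is only a sketch of the idea in Python, not Kafka's `SocketServer` code. The names `FATAL_ERRNOS`, `is_fatal_accept_error`, and `accept_loop` are hypothetical, and treating `EMFILE`/`ENFILE` as fatal is the assumption being illustrated.

```python
import errno
import socket
import sys

# Assumption: descriptor exhaustion (per-process or system-wide) is
# treated as unrecoverable for the accepting thread.
FATAL_ERRNOS = {errno.EMFILE, errno.ENFILE}


def is_fatal_accept_error(exc: OSError) -> bool:
    """Return True if an accept() failure should bring the process down."""
    return exc.errno in FATAL_ERRNOS


def accept_loop(server: socket.socket, handle) -> None:
    """Accept connections until a fatal error occurs, then exit non-zero."""
    while True:
        try:
            conn, addr = server.accept()
        except OSError as exc:
            if is_fatal_accept_error(exc):
                print(f"fatal accept error, exiting: {exc}", file=sys.stderr)
                server.close()  # stop listening so the port is no longer reachable
                sys.exit(1)
            # Other accept errors are logged and the loop keeps going.
            print(f"accept error, continuing: {exc}", file=sys.stderr)
            continue
        handle(conn, addr)
```

A caller would pass a listening socket and a per-connection callback to `accept_loop`. A real broker would also need to stop its other threads and release its remaining resources before exiting.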


