https://issues.apache.org/bugzilla/show_bug.cgi?id=56518
Bug ID: 56518
Summary: NIO async servlet limit latch leak
Product: Tomcat 7
Version: 7.0.53
Hardware: PC
OS: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Connectors
Assignee: dev@tomcat.apache.org
Reporter: observer.h...@alibaba-inc.com

Created attachment 31613
--> https://issues.apache.org/bugzilla/attachment.cgi?id=31613&action=edit
A sample webapp to reproduce the bug.

We have encountered this bug in a real production webapp. I have reproduced it on Linux x86 with Oracle JDK 1.7.0_55, on both Tomcat 7.0.53 and Tomcat 8.0.5.

CONFIG:
We switch the HTTP Connector to NIO in "server.xml", e.g. protocol="org.apache.coyote.http11.Http11NioProtocol".

WEBAPP LOGIC:
The simplified situation:
1. Call "req.startAsync()" to start the async servlet, then execute the async logic in our own user thread.
2. Sometimes the user thread gets interrupted (by some timeout logic in our code).
3. Some user code calls "resp.flushBuffer()" to send the response to the client.

PROBLEM:
In the situation described above, "LimitLatch.countDown()" is not called. When the connection limit latch counts up to the maximum (default "10000"), Tomcat does NOT accept any more connections, and all incoming clients hang.

REPRODUCER:
In a clean tomcat-7.0.53 installation:
1. Change the default Connector config in "server.xml":
   (1) Set protocol="org.apache.coyote.http11.Http11NioProtocol".
   (2) Optionally, add maxConnections="100" to reproduce the bug faster.
2. Copy the sample webapp from the attachment to "webapps/ROOT.war".
3. Start Tomcat.
4. Make plenty of requests to "/async.html":

   for (( i = 0; i < 15000; ++i )) ; do echo $i; curl localhost:8080/async.html; done

   Each request is likely to cause a limit latch leak. When the leak count reaches maxConnections (100 as set above), or a few more, the client (curl) hangs.

TECHNICAL DETAILS:
After some debugging, we found the following:
1. When the user thread has been interrupted and the user code then calls "resp.flushBuffer()", the NioChannel is closed by the JDK NIO code and a ClosedByInterruptException is thrown.
2.
When the channel is closed, the SelectionKey is removed by the Poller thread. Stack trace:

   Daemon Thread [http-nio-8080-ClientPoller-0] (Suspended)
      owns: Object (id=3346)
      owns: HashSet<E> (id=3354)
      owns: EPollSelectorImpl (id=82)
      owns: Collections$UnmodifiableSet<E> (id=3355)
      owns: Util$2 (id=3356)
      SocketChannelImpl(AbstractSelectableChannel).removeKey(SelectionKey) line: 114
      EPollSelectorImpl(AbstractSelector).deregister(AbstractSelectionKey) line: 168
      EPollSelectorImpl.implDereg(SelectionKeyImpl) line: 162
      EPollSelectorImpl(SelectorImpl).processDeregisterQueue() line: 131
      EPollSelectorImpl.doSelect(long) line: 69
      EPollSelectorImpl(SelectorImpl).lockAndDoSelect(long) line: 69
      EPollSelectorImpl(SelectorImpl).select(long) line: 80
      NioEndpoint$Poller.run() line: 1163
      Thread.run() line: 662

3. When we call "ctx.complete()", execution reaches "org.apache.tomcat.util.net.NioEndpoint.processSocket(NioChannel, SocketStatus, boolean)", which begins:

   public boolean processSocket(NioChannel socket, SocketStatus status, boolean dispatch) {
       try {
           KeyAttachment attachment = (KeyAttachment)socket.getAttachment(false);
           if (attachment == null) {
               return false;
           }

Since the SelectionKey has already been removed, "getAttachment" returns null and the method returns early, so "AbstractEndpoint.countDownConnection()" is never called and a limit latch leak occurs.

WORK-AROUND:
Some work-arounds:
1. Switch to the stable BIO connector.
2. Avoid calling "resp.flushBuffer()" from the user thread.
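The root cause in point 1 of the technical details can be demonstrated with plain JDK NIO, independent of Tomcat: interrupting a thread that is blocked in an I/O operation on an interruptible channel closes the channel before the thread ever sees the ClosedByInterruptException. A minimal, self-contained sketch (using a java.nio.channels.Pipe as a stand-in for Tomcat's socket channel; all names here are illustrative, not Tomcat code):

```java
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.Pipe;

public class InterruptCloseDemo {
    public static void main(String[] args) throws Exception {
        Pipe pipe = Pipe.open();
        final Pipe.SourceChannel source = pipe.source(); // blocking mode by default

        Thread reader = new Thread(new Runnable() {
            public void run() {
                try {
                    // Blocks: nothing has been written to the pipe yet.
                    source.read(ByteBuffer.allocate(16));
                } catch (ClosedByInterruptException e) {
                    // The JDK closes the channel BEFORE delivering this
                    // exception; this mirrors what happens to Tomcat's
                    // NioChannel when "resp.flushBuffer()" runs on an
                    // interrupted user thread.
                    System.out.println("caught ClosedByInterruptException");
                } catch (Exception e) {
                    System.out.println("unexpected: " + e);
                }
            }
        });
        reader.start();
        Thread.sleep(200);  // let the reader block inside read()
        reader.interrupt(); // the JDK closes the channel as a side effect
        reader.join();

        System.out.println("channel open after interrupt? " + source.isOpen()); // false
    }
}
```

Once the channel is closed this way, the Poller deregisters the SelectionKey, which is what later makes "getAttachment" return null.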
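The leak itself is ordinary counting-resource accounting: every accepted connection counts the latch up, and the count-down lives on a code path that the early "return false" skips. A hedged sketch of that accounting, using java.util.concurrent.Semaphore as a stand-in for Tomcat's LimitLatch (this is an analogy, not Tomcat's actual implementation):

```java
import java.util.concurrent.Semaphore;

public class LatchLeakSketch {
    public static void main(String[] args) {
        // Stand-in for the connection limit latch: maxConnections permits.
        Semaphore connectionLimit = new Semaphore(3);

        for (int i = 0; i < 3; i++) {
            // Acceptor thread: counts up for each accepted connection.
            connectionLimit.acquireUninterruptibly();
            // ...processSocket() bails out early because the attachment is
            // null, so the matching release() (countDownConnection()) on the
            // normal completion path is never reached...
        }

        // Every permit has leaked; the next accept would block forever,
        // which is exactly why curl hangs once the leaks reach maxConnections.
        System.out.println("permits left: " + connectionLimit.availablePermits()); // 0
        System.out.println("next accept would block: " + !connectionLimit.tryAcquire()); // true
    }
}
```

The permits are never recovered because, unlike memory, the latch has no garbage collector: only an explicit count-down on every termination path (including the error path) keeps the count balanced.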