On 04/04/2017 09:02 AM, Salih Sen wrote:
Hi,

One of the replicas went down again today, which somehow disabled all updates to the cluster for half an hour with the error message "Cannot talk to ZooKeeper - Updates are disabled." The ZK leader was on the same server as the Solr instance, so I doubt it has anything to do with the network (at least between Solr and the ZK leader node). Restarting the ZK leader seems to resolve the issue, and the cluster is accepting updates again.

== Solr Node
WARN - 2017-04-04 11:49:14.414; [ ] org.apache.solr.common.cloud.ConnectionManager; Watcher org.apache.solr.common.cloud.ConnectionManager@44ca0f2f name: ZooKeeperConnection Watcher:192.168.30.32:2181,192.168.30.33:2181,192.168.30.24:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None
WARN - 2017-04-04 11:49:15.723; [ ] org.apache.solr.common.cloud.ConnectionManager; zkClient has disconnected
WARN - 2017-04-04 11:49:15.727; [ ] org.apache.solr.common.cloud.ConnectionManager; Watcher org.apache.solr.common.cloud.ConnectionManager@44ca0f2f name: ZooKeeperConnection Watcher:192.168.30.32:2181,192.168.30.33:2181,192.168.30.24:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None
WARN - 2017-04-04 11:49:15.727; [ ] org.apache.solr.common.cloud.ConnectionManager; Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...
WARN - 2017-04-04 11:49:15.728; [ ] org.apache.solr.common.cloud.DefaultConnectionStrategy; Connection expired - starting a new one...
ERROR - 2017-04-04 11:49:22.040; [c:doc s:shard6 r:core_node27 x:doc_shard6_replica1] org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
    at org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:1739)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:703)
    at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:97)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:179)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:135)
    at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:306)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:251)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:121)
    at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:271)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:251)
    at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:173)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:186)
    at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:107)
    at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:54)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2440)
    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:347)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:298)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
    at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
    at org.eclipse.jetty.server.Server.handle(Server.java:534)
ERROR - 2017-04-04 11:50:13.798; [ ] org.apache.solr.common.SolrException; null:org.apache.solr.common.SolrException: Error trying to proxy request for url: http://192.168.30.24:9141/solr/doc/select
    at org.apache.solr.servlet.HttpSolrCall.remoteQuery(HttpSolrCall.java:659)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:513)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:347)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:298)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
    at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
    at org.eclipse.jetty.server.Server.handle(Server.java:534)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
    at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.eclipse.jetty.io.EofException
    at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:199)
    at org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:420)
    at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:313)
    at org.eclipse.jetty.io.AbstractEndPoint.write(AbstractEndPoint.java:140)
    at org.eclipse.jetty.server.HttpConnection$SendCallback.process(HttpConnection.java:744)
    at org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:241)
    at org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:224)
    at org.eclipse.jetty.server.HttpConnection.send(HttpConnection.java:518)
    at org.eclipse.jetty.server.HttpChannel.sendResponse(HttpChannel.java:724)
    at org.eclipse.jetty.server.HttpChannel.write(HttpChannel.java:775)
    at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:235)
    at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:219)
    at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:496)
    at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2147)
    at org.apache.commons.io.IOUtils.copy(IOUtils.java:2102)
    at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2123)
    at org.apache.solr.servlet.HttpSolrCall.remoteQuery(HttpSolrCall.java:655)

== ZK Leader
2017-04-04 14:48:46,327 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.30.24:57990
2017-04-04 14:48:46,499 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.30.24:57994
2017-04-04 14:48:50,005 [myid:3] - INFO [SessionTracker:ZooKeeperServer@347] - Expiring session 0x15b14ba8a8e0054, timeout of 40000ms exceeded
2017-04-04 14:48:50,005 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x15b14ba8a8e0054
2017-04-04 14:48:59,821 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@861] - Client attempting to renew session 0x15b14ba8a8e004b at /192.168.30.24:57990
2017-04-04 14:48:59,822 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@617] - Established session 0x15b14ba8a8e004b with negotiated timeout 40000 for client /192.168.30.24:57990
2017-04-04 14:48:59,822 [myid:3] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x15b14ba8a8e004b, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:745)
2017-04-04 14:48:59,827 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /192.168.30.24:57990 which had sessionid 0x15b14ba8a8e004b
2017-04-04 14:48:59,827 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@861] - Client attempting to renew session 0x15b14ba8a8e0050 at /192.168.30.24:57994
2017-04-04 14:48:59,827 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@617] - Established session 0x15b14ba8a8e0050 with negotiated timeout 40000 for client /192.168.30.24:57994
2017-04-04 14:49:17,455 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.30.24:58082
2017-04-04 14:49:17,667 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.30.32:56600
2017-04-04 14:49:17,667 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@861] - Client attempting to renew session 0x15b14ba8a8e0043 at /192.168.30.32:56600
2017-04-04 14:49:17,681 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@617] - Established session 0x15b14ba8a8e0043 with negotiated timeout 40000 for client /192.168.30.32:56600
2017-04-04 14:49:22,040 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /192.168.30.24:58082
2017-04-04 14:49:22,051 [myid:3] - INFO [CommitProcessor:3:ZooKeeperServer@617] - Established session 0x35ad61c452c00d3 with negotiated timeout 40000 for client /192.168.30.24:58082
2017-04-04 14:49:28,659 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x35ad61c452c00d3 type:delete cxid:0xe zxid:0x700004c25 txntype:-1 reqpath:n/a Error Path:/overseer_elect/election/97694608339632212-192.168.30.24:9133_solr-n_0000000380 Error:KeeperErrorCode = NoNode for /overseer_elect/election/97694608339632212-192.168.30.24:9133_solr-n_0000000380
2017-04-04 14:49:28,675 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x35ad61c452c00d3 type:create cxid:0x13 zxid:0x700004c27 txntype:-1 reqpath:n/a Error Path:/overseer Error:KeeperErrorCode = NodeExists for /overseer

== ZK Follower 1
2017-04-04 14:48:45,570 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x15b14ba8a8e004b, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:745)
2017-04-04 14:48:45,587 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /192.168.30.24:39820 which had sessionid 0x15b14ba8a8e004b
2017-04-04 14:48:45,587 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x15b14ba8a8e0050, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:745)
2017-04-04 14:48:45,589 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /192.168.30.24:40132 which had sessionid 0x15b14ba8a8e0050
2017-04-04 14:48:48,351 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x15b14ba8a8e0054, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:745)
2017-04-04 14:48:48,352 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /192.168.30.24:40212 which had sessionid 0x15b14ba8a8e0054
2017-04-04 15:24:03,034 [myid:1] - WARN [RecvWorker:3:QuorumCnxManager$RecvWorker@780] - Connection broken for id 3, my id = 1, error =
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)
2017-04-04 15:24:03,053 [myid:1] - WARN [RecvWorker:3:QuorumCnxManager$RecvWorker@783] - Interrupting SendWorker
2017-04-04 15:24:03,093 [myid:1] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when following the leader
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
    at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
    at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
    at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
    at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)

*Salih Şen*
M: +90 533 131 17 07
E: sa...@dilisim.com
W: www.dilisim.com
Skype: slhsen

On 4 April 2017 at 10:36:14, Salih Sen (sa...@dilisim.com) wrote:

Hi,

Sorry for the hurried initial mail; here are some corrections and further explanation.

The problem I described previously was happening before we set the zkClientTimeout value, so it was 30000 when it happened.

autoCommit maxTime is 15000 and autoSoftCommit maxTime is 60000. We recently removed the maxDocs values from the autoCommit settings; it seems more stable so far and has better response times.

I can't seem to find these values in the Solr logs, probably because the logging level is currently WARN, but we left them at the defaults, so I think they are set to the values in solr.xml:

<int name="distribUpdateSoTimeout">${distribUpdateSoTimeout:600000}</int>
<int name="distribUpdateConnTimeout">${distribUpdateConnTimeout:60000}</int>

We have 12 replicas using default routing. All commits and queries are going to a single node because of the dummy client we use. Documents are sent in JSON format.
I don't have exact knowledge of the document size; they are mostly news-article sized, though with lots of dynamic fields.

Sematext SPM currently shows "Added Docs Rate" as ~1.70k/sec for the server that is receiving updates.

Once the problem starts happening, multiple replicas go down (not necessarily the one receiving the update request from the client) and the cluster starts returning errors to update requests.

We saw entries like the following in the ZooKeeper logs, which is why we thought it might be related to the zkClientTimeout value.

2017-04-03 09:13:03,040 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /192.168.30.32:36420 which had sessionid 0x25ad61c4507008c
2017-04-03 09:27:02,078 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x15b14ba8a8e0026, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:745)
2017-04-03 09:27:02,079 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /192.168.30.32:35636 which had sessionid 0x15b14ba8a8e0026
2017-04-03 09:35:19,362 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.30.32:37970

*Salih Şen*
M: +90 533 131 17 07
E: sa...@dilisim.com
W: www.dilisim.com
Skype: slhsen

On 3 April 2017 at 18:01:15, Erick Erickson (erickerick...@gmail.com) wrote:

bq: We set Auto hardcommit time to 15sec and 10000 docs, and soft commit to 60000 sec and 5000 seconds

Just a sanity check: the commit intervals are in milliseconds, and your units look mixed up above. I'm guessing it's just a typo though.

I usually don't use maxDocs because it's unpredictable. Say you're indexing at a furious rate. If you are indexing at 5,000 docs a second (and assuming the above was supposed to be soft committing every 60 seconds or 5,000 docs) you'll still be autocommitting every second.

While that could be related, it's not particularly germane to your timeout. My guess is that you're getting these errors on the leader? What do you have in solr.xml for:

distribUpdateConnTimeout
and
distribUpdateSoTimeout

Those are likely the timeouts that matter. And how big are your documents? The scenario I'm thinking of is that the leader sends the update to the replica and the time the replica takes to respond exceeds the timeouts above.

BTW, it can be useful on startup to look at your solr.log. The _actual_ values for all the timeouts are printed out, including any sysvars you've used.

And how are you indexing? Mostly I'm wondering how fast you're sending docs to each leader and how.

Best,
Erick

On Mon, Apr 3, 2017 at 6:52 AM, Salih Sen <sa...@dilisim.com> wrote:
> Hi,
>
> We have a three-server setup, with each server having 756G RAM, 48 cores,
> 4 SSDs (each having three Solr instances on them) and a dedicated mechanical
> disk for ZooKeeper (3 ZK instances total). Each Solr instance has 31G of
> heap space allocated to it. In total we have 36 Solr instances and 3
> ZooKeeper instances (with 1G heap space). The servers also have a 10Gig
> network between them.
>
> We set Auto hardcommit time to 15sec and 10000 docs, and soft commit to
> 60000 sec and 5000 seconds in order to avoid soft committing too much and
> avoiding indexing bottlenecks. We also set -DzkClientTimeout=90000.
>
> But it seems replicas still randomly go down while indexing. Do you have any
> suggestions to prevent this situation?
>
> ERROR - 2017-04-03 12:24:02.503; [ ] org.apache.solr.cloud.OverseerCollectionMessageHandler; Error from shard: http://192.168.30.33:9132/solr
> org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://192.168.30.33:9132/solr
>     at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:621)
>     at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:279)
>     at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:268)
>     at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
>     at org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:163)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
>     at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.SocketTimeoutException: Read timed out
>     at java.net.SocketInputStream.socketRead0(Native Method)
>     at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>     at java.net.SocketInputStream.read(SocketInputStream.java:171)
>     at java.net.SocketInputStream.read(SocketInputStream.java:141)
>     at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
>     at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
>     at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
>     at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
>     at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
>     at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
>     at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
>     at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
>     at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
>     at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
>     at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
>     at org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:114)
>     at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
>     at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
>     at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
>     at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
>     at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
>     at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:515)
>     ... 12 more
> ERROR - 2017-04-03 12:27:11.631; [c:doc s:shard3 r:core_node22 x:doc_shard3_replica3] org.apache.solr.update.StreamingSolrClients$ErrorReportingConcurrentUpdateSolrClient; error
> java.net.SocketTimeoutException: Read timed out
>     at java.net.SocketInputStream.socketRead0(Native Method)
>     at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>     at java.net.SocketInputStream.read(SocketInputStream.java:171)
>     at java.net.SocketInputStream.read(SocketInputStream.java:141)
>     at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
>     at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
>     at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
>     at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
>     at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
>     at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
>     at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
>     at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
>     at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
>     at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
>     at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
>     at org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:114)
>     at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
>     at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
>     at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
>     at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
>     at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
>     at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:311)
>     at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:184)
>     at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
>     at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> ERROR - 2017-04-03 12:27:11.633; [c:doc s:shard3 r:core_node22 x:doc_shard3_replica3] org.apache.solr.update.StreamingSolrClients$ErrorReportingConcurrentUpdateSolrClient; error
> java.net.SocketTimeoutException: Read timed out
>     at java.net.SocketInputStream.socketRead0(Native Method)
>     at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>     at java.net.SocketInputStream.read(SocketInputStream.java:171)
>     at java.net.SocketInputStream.read(SocketInputStream.java:141)
>     at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
>     at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
>     at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
>     at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
>     at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
>     at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
>     at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
>     at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
>     at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
>     at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
>     at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
>     at org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:114)
>     at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
>     at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
>     at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
>     at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
>     at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
>     at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:311)
>     at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:184)
>     at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
>     at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> ERROR - 2017-04-03 12:27:11.645; [c:doc s:shard3 r:core_node22 x:doc_shard3_replica3] org.apache.solr.update.StreamingSolrClients$ErrorReportingConcurrentUpdateSolrClient; error
> java.net.SocketTimeoutException: Read timed out
>     at java.net.SocketInputStream.socketRead0(Native Method)
>     at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>     at java.net.SocketInputStream.read(SocketInputStream.java:171)
>     at java.net.SocketInputStream.read(SocketInputStream.java:141)
>     at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
>     at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
>     at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
>     at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
>     at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
>     at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
>     at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
>     at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
>     at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
>     at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
>     at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
>     at org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:114)
>     at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
>     at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
>     at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
>     at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
>     at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
>     at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:311)
>     at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:184)
>     at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
>     at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
>
> Salih Şen
> M: +90 533 131 17 07
> E: sa...@dilisim.com
> W: www.dilisim.com
> Skype: slhsen
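
For anyone reading this thread later: Erick's remark that maxDocs makes the commit cadence unpredictable can be checked with a little arithmetic. The sketch below is not from Solr; the function name and its parameters are made up for illustration, and it assumes a steady indexing rate and that an automatic commit fires as soon as either the maxTime or the maxDocs threshold is reached.

```python
def effective_commit_interval_s(docs_per_sec, max_time_ms=None, max_docs=None):
    """Approximate seconds between automatic commits at a steady indexing rate.

    Hypothetical helper, assuming a commit fires when the first of the two
    configured thresholds (elapsed time or pending document count) is hit.
    """
    candidates = []
    if max_time_ms is not None:
        # Time-based trigger: fires every max_time_ms regardless of load.
        candidates.append(max_time_ms / 1000.0)
    if max_docs is not None and docs_per_sec > 0:
        # Count-based trigger: fires once max_docs documents have arrived.
        candidates.append(max_docs / docs_per_sec)
    if not candidates:
        raise ValueError("need max_time_ms, or max_docs with a positive rate")
    return min(candidates)


if __name__ == "__main__":
    # Erick's example: 5,000 docs/sec, soft commit every 60 s or 5,000 docs.
    print(effective_commit_interval_s(5000, max_time_ms=60000, max_docs=5000))
    # Same load with maxDocs removed, as in the later configuration.
    print(effective_commit_interval_s(5000, max_time_ms=60000))
    # The ~1.70k docs/sec rate reported by SPM, with maxDocs still at 5,000.
    print(effective_commit_interval_s(1700, max_time_ms=60000, max_docs=5000))
```

With maxDocs in place the count-based trigger dominates at these rates, so the 60-second maxTime never gets a chance to apply; dropping maxDocs leaves the time-based interval as the only trigger.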