[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jan Høydahl resolved SOLR-3274. ------------------------------- Resolution: Abandoned I'm resolving this mainly due to age, due to its vaugeness and lack of recent attention. If this was a major current issue with 8.x we'd see much more activity on this JIRA. I encourage folks who believe they found a new cloud/zk bug in 8.x to open a new Jira instead of re-opening this one. > ZooKeeper related SolrCloud problems > ------------------------------------ > > Key: SOLR-3274 > URL: https://issues.apache.org/jira/browse/SOLR-3274 > Project: Solr > Issue Type: Bug > Components: SolrCloud > Affects Versions: 4.0-ALPHA > Environment: Any > Reporter: Per Steffensen > Priority: Major > > Same setup as in SOLR-3273. Well if I have to tell the entire truth we have 7 > Solr servers, running 28 slices of the same collection (collA) - all slices > have one replica (two shards all in all - leader + replica) - 56 cores all in > all (8 shards on each solr instance). But anyways... > Besides the problem reported in SOLR-3273, the system seems to run fine under > high load for several hours, but eventually errors like the ones shown below > start to occur. I might be wrong, but they all seem to indicate some kind of > unstability in the collaboration between Solr and ZooKeeper. I have to say > that I havnt been there to check ZooKeeper "at the moment where those > exception occur", but basically I dont believe the exceptions occur because > ZooKeeper is not running stable - at least when I go and check ZooKeeper > through other "channels" (e.g. my eclipse ZK plugin) it is always accepting > my connection and generally seems to be doing fine. > Exception 1) Often the first error we see in solr.log is something like this > {code} > Mar 22, 2012 5:06:43 AM org.apache.solr.common.SolrException log > SEVERE: org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - > Updates are disabled. > at > org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:678) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:250) > at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140) > at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:80) > at > org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540) > at > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:407) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) > at > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) > at > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) > at > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) > at > org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) > at > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) > at > org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) > at > org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) > at org.mortbay.jetty.Server.handle(Server.java:326) > at > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) > at > org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945) > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756) > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218) > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) > at > org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) > at > org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) > {code} > I believe this error basically occurs because SolrZkClient.isConnected > reports false, which means that its internal "keeper.getState" does not > return ZooKeeper.States.CONNECTED. Im pretty sure that it has been CONNECTED > for a long time, since this error starts occuring after several hours of > processing without this problem showing. But why is it suddenly not connected > anymore?! > Exception 2) We also see errors like the following, and if Im not mistaken, > they start occuring shortly after "Exception 1)" (above) shows for the fist > time > {code} > Mar 22, 2012 5:07:26 AM org.apache.solr.common.SolrException log > SEVERE: org.apache.solr.common.SolrException: no servers hosting shard: > at > org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:149) > at > org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:123) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > {code} > Please note that the exception says "no servers hosting shard: <blank>". > Looking at the code a "shard"-string was actually supposed to be written at > <blank>. Basically this means that HttpShardHandler.submit was called with > an empty "shard"-string parameter. But who does this? > CoreAdminHandler.handleDistribUrlAction or SearchHandler.handleRequestBody or > SyncStrategy or PeerSync or... I dont know, and maybe it is not that > relevant, because I guess they all get the "shard"-string from ZooKeeper. > Again something pointing in the direction of unstable collaboration between > Solr and ZooKeeper. > Exception 3) We also see exceptions like this > {code} > Mar 25, 2012 3:05:38 PM org.apache.solr.common.cloud.ZkStateReader$3 process > WARNING: ZooKeeper watch triggered, but Solr cannot talk to ZK > Mar 25, 2012 3:05:38 PM org.apache.solr.cloud.LeaderElector$1 process > WARNING: > org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode > = Session expired for /collections/collA/leader_elect/slice26/election > at > org.apache.zookeeper.KeeperException.create(KeeperException.java:118) > at > org.apache.zookeeper.KeeperException.create(KeeperException.java:42) > at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1249) > at > org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:266) > at > org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:263) > at > org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:65) > at > org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:263) > at > org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:92) > at > org.apache.solr.cloud.LeaderElector.access$000(LeaderElector.java:57) > at > org.apache.solr.cloud.LeaderElector$1.process(LeaderElector.java:121) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:531) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:507) > {code} > Maybe this will we usable for some bug-fixing or for making the code more > stable. I know 4.0 is not stable/released yet, and that we therefore should > expect this kind of errors at the moment. So this is not negative criticism - > just reporting of issues observed when using SolrCloud features under high > load for several days. Any feedback is more than welcome. > Regards, Per Steffensen -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org