Hi all, I have exactly the same problem described in this thread. I would have assumed that the autoAddReplicas feature handles the stale write lock automatically. Can anyone suggest what is missing (in configuration or otherwise) for autoAddReplicas to work?
Thanks!

On Fri, Mar 31, 2017 at 11:49 AM, Tseng, Danny <dts...@informatica.com> wrote:

> More details about the error...
>
> state.json:
>
> {"collection1": {
>   "replicationFactor": "1",
>   "shards": {
>     "shard1": {
>       "range": "80000000-ffffffff",
>       "state": "active",
>       "replicas": {"core_node1": {
>         "core": "collection1_shard1_replica1",
>         "dataDir": "hdfs://psvrlxcdh5mmdev1.somewhere.com:8020/Test/LDM/psvrlxbdecdh1Cluster/solr/collection1/core_node1/data/",
>         "base_url": "http://psvrlxcdh5mmdev3.somewhere.com:48193/solr",
>         "node_name": "psvrlxcdh5mmdev3.somewhere.com:48193_solr",
>         "state": "active",
>         "ulogDir": "hdfs://psvrlxcdh5mmdev1.somewhere.com:8020/Test/LDM/psvrlxbdecdh1Cluster/solr/collection1/core_node1/data/tlog",
>         "leader": "true"}}},
>     "shard2": {
>       "range": "0-7fffffff",
>       "state": "active",
>       "replicas": {"core_node2": {
>         "core": "collection1_shard2_replica1",
>         "base_url": "http://psvrlxcdh5mmdev3.somewhere.com:48193/solr",
>         "node_name": "psvrlxcdh5mmdev3.somewhere.com:48193_solr",
>         "state": "down",
>         "leader": "true"}}}},
>   "router": {
>     "field": "_root_uid_",
>     "name": "compositeId"},
>   "maxShardsPerNode": "2",
>   "autoAddReplicas": "true"}}
>
> solr.log:
>
> ERROR - 2017-03-31 06:00:54.382; [c:collection1 s:shard2 r:core_node2 x:collection1_shard2_replica1] org.apache.solr.core.CoreContainer; Error creating core [collection1_shard2_replica1]: Index dir 'hdfs://psvrlxcdh5mmdev1.somewhere.com:8020/Test/LDM/psvrlxbdecdh1Cluster/solr/collection1/core_node2/data/index/' of core 'collection1_shard2_replica1' is already locked. The most likely cause is another Solr server (or another solr core in this server) also configured to use this directory; other possible causes may be specific to lockType: hdfs
> org.apache.solr.common.SolrException: Index dir 'hdfs://psvrlxcdh5mmdev1.somewhere.com:8020/Test/LDM/psvrlxbdecdh1Cluster/solr/collection1/core_node2/data/index/' of core 'collection1_shard2_replica1' is already locked.
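For anyone comparing their own cluster state against the one above: the symptom is visible in state.json itself, where shard2's replica is "down" and has no dataDir assigned. A minimal sketch of how to spot that programmatically, using an abbreviated copy of the quoted state.json (non-essential fields omitted; paths as in the thread):

```python
import json

# Abbreviated copy of the state.json quoted above (non-essential fields omitted).
STATE_JSON = """
{"collection1": {
  "shards": {
    "shard1": {"replicas": {"core_node1": {
      "core": "collection1_shard1_replica1",
      "state": "active",
      "dataDir": "hdfs://psvrlxcdh5mmdev1.somewhere.com:8020/Test/LDM/psvrlxbdecdh1Cluster/solr/collection1/core_node1/data/"}}},
    "shard2": {"replicas": {"core_node2": {
      "core": "collection1_shard2_replica1",
      "state": "down"}}}
  }
}}
"""

state = json.loads(STATE_JSON)

# Map each replica to (state, whether a dataDir is assigned).
report = {}
for shard, info in state["collection1"]["shards"].items():
    for name, replica in info["replicas"].items():
        report[f"{shard}/{name}"] = (replica["state"], "dataDir" in replica)

for key, (st, has_dir) in sorted(report.items()):
    print(f"{key}: state={st}, dataDir={'present' if has_dir else 'MISSING'}")
```

Run against the full state.json from ZooKeeper, this flags shard2/core_node2 as down with no dataDir, matching the screenshot described later in the thread.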
> The most likely cause is another Solr server (or another solr core in this server) also configured to use this directory; other possible causes may be specific to lockType: hdfs
>     at org.apache.solr.core.SolrCore.<init>(SolrCore.java:903)
>     at org.apache.solr.core.SolrCore.<init>(SolrCore.java:776)
>     at org.apache.solr.core.CoreContainer.create(CoreContainer.java:842)
>     at org.apache.solr.core.CoreContainer.create(CoreContainer.java:779)
>     at org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$0(CoreAdminOperation.java:88)
>     at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:377)
>     at org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:365)
>     at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:156)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:153)
>     at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:660)
>     at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:441)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:303)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)
>     at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
>     at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>     at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>     at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>     at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)
>     at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
>     at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>     at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)
>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>     at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>     at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
>     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>     at org.eclipse.jetty.server.Server.handle(Server.java:518)
>     at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)
>     at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)
>     at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
>     at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
>     at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>     at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)
>     at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)
>     at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
>     at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.lucene.store.LockObtainFailedException: Index dir 'hdfs://psvrlxcdh5mmdev1.somewhere.com:8020/Test/LDM/psvrlxbdecdh1Cluster/solr/collection1/core_node2/data/index/' of core 'collection1_shard2_replica1' is already locked. The most likely cause is another Solr server (or another solr core in this server) also configured to use this directory; other possible causes may be specific to lockType: hdfs
>     at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:658)
>     at org.apache.solr.core.SolrCore.<init>(SolrCore.java:850)
>     ...
> 36 more
>
>
> From: Tseng, Danny [mailto:dts...@informatica.com]
> Sent: Thursday, March 30, 2017 9:35 PM
> To: solr-user@lucene.apache.org
> Subject: Question about autoAddReplicas
>
> Hi,
>
> I created a collection with 2 shards, a replication factor of 1, and autoAddReplicas enabled. Then I killed the shard2 node with 'kill -9'. The overseer asked the other Solr node to create a new core pointing at shard2's dataDir. Unfortunately, the new core failed to come up because of a pre-existing write lock. This is the new cluster state after failover. Notice that shard2 doesn't have a dataDir assigned. Am I missing something?
>
> [cid:image001.png@01D2A99B.712BB300]
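Not an answer to why autoAddReplicas doesn't clear the lock itself, but as a manual workaround: with lockType hdfs the lock is a write.lock file in the core's index directory, and after a 'kill -9' it is left behind. A small sketch that derives the lock path from the dataDir shown in this thread (the write.lock location assumes the default HDFS lock factory behavior; verify no live core is still using the index before removing anything):

```python
def stale_lock_path(data_dir: str) -> str:
    """Return the expected write.lock location for a core's dataDir,
    assuming the default HDFS lock factory places write.lock in the
    index subdirectory of the data dir."""
    return data_dir.rstrip("/") + "/index/write.lock"

# dataDir that the overseer pointed the replacement core at (from state.json above)
data_dir = ("hdfs://psvrlxcdh5mmdev1.somewhere.com:8020/"
            "Test/LDM/psvrlxbdecdh1Cluster/solr/collection1/core_node2/data/")

lock = stale_lock_path(data_dir)

# After confirming no live Solr core still holds this index, the stale lock
# can be removed manually with the HDFS shell, e.g.:
print(f"hdfs dfs -rm {lock}")
```

Once the stale write.lock is gone, reloading the core (or letting the overseer retry) should let the replacement replica come up.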