I've simplified things from my previous email, and I'm still seeing errors.

Using Solr 4.4.0 with two nodes, starting with a single shard.  The collection
is named "marin"; the host names are dumbo and solrcloud1.  I bring up an empty
cloud and index 50 documents.  I can query them and everything looks fine.
This is clusterstate.json at that point:

{"marin":{
    "shards":{"shard1":{
        "range":"80000000-7fffffff",
        "state":"active",
        "replicas":{
          "dumbo:8983_solr_marin":{
            "state":"active",
            "core":"marin",
            "node_name":"dumbo:8983_solr",
            "base_url":"http://dumbo:8983/solr",
            "leader":"true"},
          "solrcloud1:8983_solr_marin":{
            "state":"active",
            "core":"marin",
            "node_name":"solrcloud1:8983_solr",
            "base_url":"http://solrcloud1:8983/solr"}}}},
    "router":"compositeId"}}

I attempt to split with
http://dumbo:8983/solr/admin/collections?action=SPLITSHARD&collection=marin&shard=shard1
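
For anyone who wants to script the reproduction, here is a minimal sketch
that only builds the same request URL.  The helper name build_split_url is
made up for illustration; it does not send anything.  If you do send it from
a script, note that the call took over two minutes to return for me, so the
client timeout would need to be longer than that.

```python
from urllib.parse import urlencode


def build_split_url(base_url, collection, shard):
    """Build a Collections API SPLITSHARD URL (hypothetical helper).

    Only constructs the URL string; no request is sent.
    """
    params = {"action": "SPLITSHARD", "collection": collection, "shard": shard}
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Same host, collection and shard as in the request above.
url = build_split_url("http://dumbo:8983/solr", "marin", "shard1")
print(url)
```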

After 127559ms, that call returns with
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:I was
asked to wait on state active for solrcloud1:8983_solr but I still do not
see the requested state. I see state: recovering live:true

clusterstate.json at this point:

{"marin":{
    "shards":{
      "shard1":{
        "range":"80000000-7fffffff",
        "state":"active",
        "replicas":{
          "dumbo:8983_solr_marin":{
            "state":"active",
            "core":"marin",
            "node_name":"dumbo:8983_solr",
            "base_url":"http://dumbo:8983/solr",
            "leader":"true"},
          "solrcloud1:8983_solr_marin":{
            "state":"active",
            "core":"marin",
            "node_name":"solrcloud1:8983_solr",
            "base_url":"http://solrcloud1:8983/solr"}}},
      "shard1_0":{
        "range":"80000000-ffffffff",
        "state":"construction",
        "replicas":{
          "dumbo:8983_solr_marin_shard1_0_replica1":{
            "state":"active",
            "core":"marin_shard1_0_replica1",
            "node_name":"dumbo:8983_solr",
            "base_url":"http://dumbo:8983/solr",
            "leader":"true"},
          "solrcloud1:8983_solr_marin_shard1_0_replica2":{
            "state":"active",
            "core":"marin_shard1_0_replica2",
            "node_name":"solrcloud1:8983_solr",
            "base_url":"http://solrcloud1:8983/solr"}}},
      "shard1_1":{
        "range":"0-7fffffff",
        "state":"construction",
        "replicas":{
          "dumbo:8983_solr_marin_shard1_1_replica1":{
            "state":"active",
            "core":"marin_shard1_1_replica1",
            "node_name":"dumbo:8983_solr",
            "base_url":"http://dumbo:8983/solr",
            "leader":"true"},
          "solrcloud1:8983_solr_marin_shard1_1_replica2":{
            "state":"recovering",
            "core":"marin_shard1_1_replica2",
            "node_name":"solrcloud1:8983_solr",
            "base_url":"http://solrcloud1:8983/solr"}}}},
    "router":"compositeId"}}
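
To make the stuck pieces easier to spot, here is a rough sketch that walks a
clusterstate.json document and lists every shard or replica whose state is
not "active".  The function name non_active and the trimmed sample are my own
for illustration; the sample keeps only shard1 and shard1_1 from the state
above, with field values copied from it.

```python
import json


def non_active(clusterstate_text, collection):
    """Return (shard, replica_or_None, state) for everything not 'active'."""
    state = json.loads(clusterstate_text)
    problems = []
    for shard_name, shard in state[collection]["shards"].items():
        # A shard-level entry uses None in the replica slot.
        if shard["state"] != "active":
            problems.append((shard_name, None, shard["state"]))
        for replica_name, replica in shard["replicas"].items():
            if replica["state"] != "active":
                problems.append((shard_name, replica_name, replica["state"]))
    return problems


# Trimmed sample: only the fields the check needs, two of the three shards.
sample = """
{"marin":{
    "shards":{
      "shard1":{
        "range":"80000000-7fffffff",
        "state":"active",
        "replicas":{
          "dumbo:8983_solr_marin":{"state":"active"},
          "solrcloud1:8983_solr_marin":{"state":"active"}}},
      "shard1_1":{
        "range":"0-7fffffff",
        "state":"construction",
        "replicas":{
          "dumbo:8983_solr_marin_shard1_1_replica1":{"state":"active"},
          "solrcloud1:8983_solr_marin_shard1_1_replica2":{"state":"recovering"}}}},
    "router":"compositeId"}}
"""

for shard, replica, status in non_active(sample, "marin"):
    print(shard, replica or "(shard itself)", status)
```

Run against the full state above, this would also report shard1_0 as being
in "construction".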


In the logs on dumbo, I see several of these:

290391 [qtp243983770-60] INFO
 org.apache.solr.update.processor.LogUpdateProcessor  –
[marin_shard1_1_replica1] webapp=/solr path=/update
params={waitSearcher=true&openSearcher=false&commit=true&wt=javabin&commit_end_point=true&version=2&softCommit=false}
{} 0 2
290392 [qtp243983770-60] ERROR org.apache.solr.core.SolrCore  –
java.io.IOException: cannot uncache file="_1.nvm": it was separately also
created in the delegate directory
        at
org.apache.lucene.store.NRTCachingDirectory.unCache(NRTCachingDirectory.java:297)
        at
org.apache.lucene.store.NRTCachingDirectory.sync(NRTCachingDirectory.java:216)
        at
org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:4109)
        at
org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2809)
        at
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2897)
        at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2872)
        at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:549)
        at
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95)
        at
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)

and then finally this:

406671 [qtp243983770-22] ERROR org.apache.solr.core.SolrCore  –
org.apache.solr.common.SolrException: I was asked to wait on state active
for solrcloud1:8983_solr but I still do not see the requested state. I see
state: recovering live:true
        at
org.apache.solr.handler.admin.CoreAdminHandler.handleWaitForStateAction(CoreAdminHandler.java:966)
        at
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:191)
        at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
        at
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:611)
        at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:209)
        at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)

On solrcloud1 I see several of these:

259170 [RecoveryThread] INFO  org.apache.solr.update.PeerSync  – PeerSync:
core=marin_shard1_0_replica2 url=http://solrcloud1:8983/solr START
replicas=[http://dumbo:8983/solr/marin_shard1_0_replica1/] nUpdates=100
259192 [RecoveryThread] WARN  org.apache.solr.update.PeerSync  – PeerSync:
core=marin_shard1_0_replica2 url=http://solrcloud1:8983/solr  exception
talking to http://dumbo:8983/solr/marin_shard1_0_replica1/, failed
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
Server at http://dumbo:8983/solr/marin_shard1_0_replica1 returned non ok
status:404, message:Not Found
        at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:385)
        at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
        at
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:156)
        at
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:119)
        at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
        at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

and the same messages for marin_shard1_1_replica1.

I can post my solrconfig.xml or schema.xml if that would be useful.  Maybe
I'll try switching to the example configs and see if I can reproduce the
issue.  Otherwise, I'm kinda stumped here.

Any suggestions?  I can reproduce this consistently in under 5 minutes, so
I'm happy to try ideas.

Thanks.

-Greg
