On 01/31/2014 10:35 AM, Mark Miller wrote:



On Jan 31, 2014, at 10:31 AM, Mark Miller <markrmil...@gmail.com> wrote:

Seems unlikely by the way. Sounds like what probably happened is that, for some
reason, when you restarted the shard it thought you were creating it with
numShards=2 instead of 1.

No, that’s not right. Sorry.

It must have gotten assigned a new core node name. numShards would still have to
be seen as 1 for it to try to be a replica. Brain lapse.

Are you using a custom coreNodeName or taking the default? Can you post your 
solr.xml so we can see your genericCoreNodeNames and coreNodeName settings?

One possibility is that you got assigned a coreNodeName, but for some reason it 
was not persisted in solr.xml.

- Mark

http://about.me/markrmiller


There is nothing of note in the ZooKeeper logs. My solr.xml (sanitized for privacy) is below and is identical on all 4 nodes:

<solr persistent="false" zkHost="xx.xx.xx.xx:2181,xx.xx.xx.xx:2181,xx.xx.xx.xx:2181,xx.xx.xx.xx:2181,xx.xx.xx.xx:2181">
  <cores adminPath="/admin/cores"
         host="${host:}"
         hostPort="8080"
         hostContext="${hostContext:/x}"
         zkClientTimeout="${zkClientTimeout:15000}"
         defaultCoreName="c1"
         shareSchema="true" >

     <core name="c1"
           collection="col1"
           instanceDir="/dir/x"
           config="solrconfig.xml"
           dataDir="/dir/x/data/y"
     />
  </cores>
</solr>

I don't specify coreNodeName or a genericCoreNodeNames default value ... should I?
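
For reference, if I were to set them, I'm assuming it would look roughly like the
sketch below (the coreNodeName value is just a placeholder, and since my solr.xml
has persistent="false" I'm guessing Solr could not have written an assigned
coreNodeName back into the file anyway):

<!-- sketch only, not my live config: persistent flipped to "true", placeholder coreNodeName -->
<solr persistent="true" zkHost="xx.xx.xx.xx:2181,xx.xx.xx.xx:2181,xx.xx.xx.xx:2181,xx.xx.xx.xx:2181,xx.xx.xx.xx:2181">
  <cores adminPath="/admin/cores"
         host="${host:}"
         hostPort="8080"
         hostContext="${hostContext:/x}"
         zkClientTimeout="${zkClientTimeout:15000}"
         defaultCoreName="c1"
         genericCoreNodeNames="${genericCoreNodeNames:true}"
         shareSchema="true" >

     <core name="c1"
           collection="col1"
           instanceDir="/dir/x"
           config="solrconfig.xml"
           dataDir="/dir/x/data/y"
           coreNodeName="core_node1" />  <!-- placeholder value -->
  </cores>
</solr>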

The Tomcat log is basically just a replay of what happened:

16443 [coreLoadExecutor-4-thread-2] INFO org.apache.solr.core.CoreContainer – registering core: ...

# this is, I think, what you are talking about above with the new coreNodeName
16444 [coreLoadExecutor-4-thread-2] INFO org.apache.solr.cloud.ZkController – Register replica - core:c1 address:http://xx.xx.xx.xx:8080/x collection: col1 shard:shard4

16453 [coreLoadExecutor-4-thread-2] INFO org.apache.solr.client.solrj.impl.HttpClientUtil – Creating new http client, config:maxConnections=10000&maxConnectionsPerHost=20&connTimeout=30000&socketTimeout=30000&retry=false

16505 [coreLoadExecutor-4-thread-2] INFO org.apache.solr.cloud.ZkController – We are http://node1:8080/x and leader is http://node2:8080/x

Then it just starts replicating.

If there is anything specific I should be grepping for in these logs, let me know.

Also, given that my clusterstate.json now looks like this:

assume:
  node1=xx.xx.xx.1
  node2=xx.xx.xx.2

"shard4":{
        "range":"20000000-3fffffff",
        "state":"active",
        "replicas":{
          "node2:8080_x_col1":{
            "state":"active",
            "core":"c1",
            "node_name":"node2:8080_x",
            "base_url":"http://node2:8080/x";,
            "leader":"true"},
**** this should not be a replica of shard4 but its own shard1
          "node1:8080_x_col1":{
            "state":"recovering",
            "core":"c1",
            "node_name":"node1:8080_x",
            "base_url":"http://node1:8080/x"}},

Can I just recreate shard1 ...

"shard1":{
***** NOTE: range is assumed based on the ranges of the other shards
        "range":"0-1fffffff",
        "state":"active",
        "replicas":{
          "node1:8080_x_col1":{
            "state":"active",
            "core":"c1",
            "node_name":"node1:8080_x",
            "base_url":"http://node1:8080/x";,
            "leader":"true"}},

... and then remove the node1 replica from shard4 ...
"shard4":{
        "range":"20000000-3fffffff",
        "state":"active",
        "replicas":{
          "node2:8080_x_col1":{
            "state":"active",
            "core":"c1",
            "node_name":"node2:8080_x",
            "base_url":"http://node2:8080/x";,
            "leader":"true"}},

That would be great...
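
To be concrete about what I have in mind, something along these lines with the
zkcli.sh that ships in Solr's cloud-scripts (a sketch of the idea only, not a
procedure I've verified; I'm assuming getfile/putfile take the ZooKeeper path
followed by a local file, and that the node1 core should be down while the state
is edited):

# pull the current cluster state out of ZooKeeper
./zkcli.sh -zkhost xx.xx.xx.xx:2181 -cmd getfile /clusterstate.json clusterstate.json

# hand-edit clusterstate.json: add shard1 back with node1 as its leader (as above)
# and drop the node1 entry from shard4's replicas

# push the edited file back
./zkcli.sh -zkhost xx.xx.xx.xx:2181 -cmd putfile /clusterstate.json clusterstate.json

# then restart node1 so it re-registers against the corrected state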

thanks for your help

David
