On 01/31/2014 10:35 AM, Mark Miller wrote:
On Jan 31, 2014, at 10:31 AM, Mark Miller <markrmil...@gmail.com> wrote:
Seems unlikely by the way. Sounds like what probably happened is that for some
reason it thought when you restarted the shard that you were creating it with
numShards=2 instead of 1.
No, that’s not right. Sorry.
It must have got assigned a new core node name. numShards would still have to
be seen as 1 for it to try and be a replica. Brain lapse.
Are you using a custom coreNodeName or taking the default? Can you post your
solr.xml so we can see your genericCoreNodeNames and coreNodeName settings?
One possibility is that you got assigned a coreNodeName, but for some reason it
was not persisted in solr.xml.
- Mark
http://about.me/markrmiller
There is nothing of note in the zookeeper logs. My solr.xml (sanitized
for privacy) is below, and it is identical on all 4 nodes.
<solr persistent="false"
      zkHost="xx.xx.xx.xx:2181,xx.xx.xx.xx:2181,xx.xx.xx.xx:2181,xx.xx.xx.xx:2181,xx.xx.xx.xx:2181">
  <cores adminPath="/admin/cores"
         host="${host:}"
         hostPort="8080"
         hostContext="${hostContext:/x}"
         zkClientTimeout="${zkClientTimeout:15000}"
         defaultCoreName="c1"
         shareSchema="true">
    <core name="c1"
          collection="col1"
          instanceDir="/dir/x"
          config="solrconfig.xml"
          dataDir="/dir/x/data/y"/>
  </cores>
</solr>
I don't specify coreNodeName or a genericCoreNodeNames default value
... should I?
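For anyone following along, my understanding (please correct me if this is wrong) is that pinning these explicitly in the old-style solr.xml would look roughly like the fragment below; the "core_node1" value is just an illustrative placeholder, not something from my cluster:

```xml
<cores adminPath="/admin/cores"
       genericCoreNodeNames="${genericCoreNodeNames:true}">
  <!-- coreNodeName pins this core's identity in ZooKeeper so a restart
       is not mistaken for a brand-new node; "core_node1" is a placeholder -->
  <core name="c1"
        collection="col1"
        coreNodeName="core_node1"
        instanceDir="/dir/x"/>
</cores>
```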
The tomcat log is basically just a replay of what happened.
16443 [coreLoadExecutor-4-thread-2] INFO org.apache.solr.core.CoreContainer – registering core: ...
# this is, I think, what you are talking about above with the new coreNodeName
16444 [coreLoadExecutor-4-thread-2] INFO org.apache.solr.cloud.ZkController – Register replica - core:c1 address:http://xx.xx.xx.xx:8080/x collection: col1 shard:shard4
16453 [coreLoadExecutor-4-thread-2] INFO org.apache.solr.client.solrj.impl.HttpClientUtil – Creating new http client, config:maxConnections=10000&maxConnectionsPerHost=20&connTimeout=30000&socketTimeout=30000&retry=false
16505 [coreLoadExecutor-4-thread-2] INFO org.apache.solr.cloud.ZkController – We are http://node1:8080/x and leader is http://node2:8080/x
Then it just starts replicating.
If there is anything specific I should be grokking for in these logs, let
me know.
Also, given that my clusterstate.json now looks like this:
assume:
node1=xx.xx.xx.1
node2=xx.xx.xx.2
"shard4":{
  "range":"20000000-3fffffff",
  "state":"active",
  "replicas":{
    "node2:8080_x_col1":{
      "state":"active",
      "core":"c1",
      "node_name":"node2:8080_x",
      "base_url":"http://node2:8080/x",
      "leader":"true"},
    **** this should not be a replica of shard4 but its own shard1
    "node1:8080_x_col1":{
      "state":"recovering",
      "core":"c1",
      "node_name":"node1:8080_x",
      "base_url":"http://node1:8080/x"}},
Can I just recreate shard1 like this:
"shard1":{
  ***** NOTE: range is assumed, based on the ranges of the other shards
  "range":"0-1fffffff",
  "state":"active",
  "replicas":{
    "node1:8080_x_col1":{
      "state":"active",
      "core":"c1",
      "node_name":"node1:8080_x",
      "base_url":"http://node1:8080/x",
      "leader":"true"}},
... and then remove the now-stale replica from shard4, leaving:
"shard4":{
  "range":"20000000-3fffffff",
  "state":"active",
  "replicas":{
    "node2:8080_x_col1":{
      "state":"active",
      "core":"c1",
      "node_name":"node2:8080_x",
      "base_url":"http://node2:8080/x",
      "leader":"true"}},
That would be great...
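To make the intended transformation concrete, here is a rough sketch in plain Python, operating on a local stand-in for the relevant slice of clusterstate.json (the real file would have to be fetched from and written back to ZooKeeper out of band; the shard1 range is the assumption noted above, and hand-editing clusterstate.json is of course at-your-own-risk):

```python
import json

# Local stand-in for the relevant part of clusterstate.json;
# node names and cores mirror the sanitized excerpts above.
state = {"col1": {"shards": {
    "shard4": {
        "range": "20000000-3fffffff",
        "state": "active",
        "replicas": {
            "node2:8080_x_col1": {"state": "active", "core": "c1",
                                  "node_name": "node2:8080_x",
                                  "base_url": "http://node2:8080/x",
                                  "leader": "true"},
            "node1:8080_x_col1": {"state": "recovering", "core": "c1",
                                  "node_name": "node1:8080_x",
                                  "base_url": "http://node1:8080/x"}}}}}}

shards = state["col1"]["shards"]

# Move node1's entry out of shard4 and promote it to leader of a new shard1.
replica = shards["shard4"]["replicas"].pop("node1:8080_x_col1")
replica["state"] = "active"
replica["leader"] = "true"
shards["shard1"] = {
    "range": "0-1fffffff",  # assumed: fills the gap just below shard4's range
    "state": "active",
    "replicas": {"node1:8080_x_col1": replica},
}

# Sanity check: the assumed shard1 range abuts shard4's with no gap or overlap.
assert 0x1fffffff + 1 == 0x20000000

print(json.dumps(state, indent=2))
```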
thanks for your help
David