Hi,
I'm playing around with the autoscaling feature of Solr 7.7.1 and have the 
following scenario to solve:

- One collection with two shards
- I want to activate autoscaling to achieve the following behavior:
  * Every time a new node comes up, it should get a new replica automatically 
through the autoscaling feature of Solr
  * Each node should contain only 1 replica (either from shard1 or shard2)
  * Solr should try to balance the number of replicas per shard across the
cluster if possible, meaning:
     * if I have an even number of nodes in the cluster, half of the nodes
should have a replica of shard1 and the other half a replica of shard2
     * if I have an odd number of nodes, one of the two shards has one replica
more than the other shard

The scaling should look like this:
1) node a comes up and gets a shard1 replica
2) node b comes up and gets a shard2 replica
3) node c comes up and gets a shard1 replica
4) node d comes up and gets a shard2 replica
5) etc.
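
To make the intended placement rule concrete, here is a minimal bash sketch. It is not how Solr decides anything; the function names pick_shard and simulate are hypothetical and only illustrate the behavior I am after: each new node gets a replica of whichever shard currently has fewer replicas, with ties going to shard1.

```shell
#!/bin/bash
# Hypothetical helper (not part of Solr): given the current replica counts
# of shard1 and shard2, print the shard that should get the next replica.
pick_shard() {
  local shard1_count=$1 shard2_count=$2
  if [ "$shard1_count" -le "$shard2_count" ]; then
    echo shard1
  else
    echo shard2
  fi
}

# Hypothetical helper: simulate nodes joining one at a time and print
# the shard each node would receive, followed by the final counts.
simulate() {
  local nodes=$1 s1=0 s2=0 i target
  for ((i = 1; i <= nodes; i++)); do
    target=$(pick_shard "$s1" "$s2")
    if [ "$target" = shard1 ]; then
      s1=$((s1 + 1))
    else
      s2=$((s2 + 1))
    fi
    echo "node $i -> $target"
  done
  echo "shard1=$s1 shard2=$s2"
}

simulate 5
```

With five nodes the shards alternate and the last line reads shard1=3 shard2=2, which is the odd-node case described above.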

The problem I have is that whenever a new node comes up, it always gets a
replica of the second shard.
The number of replicas is never balanced between the two shards.

I have created the following autoscaling policies:

curl -X POST -H 'Content-Type: application/json' \
"${SOLR_LOCAL_URL}/api/cluster/autoscaling" --data-binary '{
"set-cluster-policy": [
{"replica": "<2", "shard": "#ANY", "node": "#ANY"},
{"replica": "#EQUAL", "shard": "#ANY", "node": "#ANY"}
]
}'

It seems that all policies are always evaluated from the point of view of a 
single node and not from the point of view of the whole cluster (it seems as if 
node:#ANY means every single node on its own and not the whole cluster).
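
One way to check this would be to ask Solr how it currently evaluates the policy. The sketch below assumes that the diagnostics and suggestions endpoints documented for the 7.x autoscaling API are available on 7.7.1 under the same path prefix I use for the triggers; the helper names autoscaling_url and fetch_autoscaling are hypothetical, and I have not included any output because it depends on the cluster state.

```shell
#!/bin/bash
# Same base URL as in my setup script; the default is only a fallback.
SOLR_LOCAL_URL="${SOLR_LOCAL_URL:-http://localhost:9000}"

# Hypothetical helper: build the URL of an autoscaling sub-resource.
autoscaling_url() {
  echo "${SOLR_LOCAL_URL}/solr/admin/autoscaling/$1"
}

# Hypothetical helper: fetch one sub-resource (requires a running Solr node).
fetch_autoscaling() {
  curl --silent "$(autoscaling_url "$1")"
}

# Usage against a live cluster (assumed endpoints, see note above):
#   fetch_autoscaling diagnostics   # how nodes are sorted, policy violations
#   fetch_autoscaling suggestions   # operations Solr would propose
```

Nothing is sent to the cluster until fetch_autoscaling is actually called, so the snippet can be sourced safely.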

Can my desired scaling behavior be achieved with the current autoscaling 
implementation in Solr?

My cluster setup looks like this:

#!/bin/bash
SOLR_LOCAL_URL=http://localhost:9000

# node added trigger
curl --header "Content-Type: application/json" \
   --request POST \
   --data '{ "set-trigger": { "name": "node_added_trigger", "event": "nodeAdded", "waitFor": "10s", "enabled": "true", "preferredOperation": "ADDREPLICA" } }' \
   "$SOLR_LOCAL_URL/solr/admin/autoscaling"

# node lost trigger
curl --header "Content-Type: application/json" \
   --request POST \
   --data '{ "set-trigger": { "name": "node_lost_trigger", "event": "nodeLost", "waitFor": "10s", "enabled": "true", "preferredOperation": "DELETENODE" } }' \
   "$SOLR_LOCAL_URL/solr/admin/autoscaling"

activeCollection=products
shards=2
replicationFactor=2
maxShardsPerNode=1

curl "$SOLR_LOCAL_URL/solr/admin/collections?action=CREATE&name=${activeCollection}&numShards=${shards}&replicationFactor=${replicationFactor}&collection.configName=products&wt=json&maxShardsPerNode=${maxShardsPerNode}"
# &autoAddReplicas=true

# needed to not get into add-replica loop:
# https://lucene.472066.n3.nabble.com/Autoscaling-using-triggers-to-create-new-replicas-td4415260.html
curl -X POST -H 'Content-Type: application/json' \
"${SOLR_LOCAL_URL}/api/cluster/autoscaling" --data-binary '{
"set-cluster-policy": [
{"replica": "<2", "shard": "#ANY", "node": "#ANY"},
{"replica": "#EQUAL", "shard": "#ANY", "node": "#ANY"}
]
}'

Best regards,
Christian
