Hi Houston,

Thanks a lot for putting this together! I'd like to help with the Solr
Operator. Though I have limited availability over the next two months,
maybe I can still be useful with a few things.

Some comments regarding the SIP:
- I think that in general it sounds like a good plan. I want to help, not
get in the way :)
- I think it tackles one of the three common use-cases that I've seen for
autoscaling:
1) *AutoAddReplicas*: mostly for enterprise search, where people want to
scale out for query throughput. Combining that with autoscaling sounds
very appealing.
2) *Rotate indices on autoscaling events*, which should work well for
time-series data. This is what we presented last year at BBuzz and KubeCon
for Elasticsearch/OpenSearch
<https://sematext.com/blog/kubernetes-elasticsearch-autoscaling/>. The
gzip -9 version of it is that you'll probably want to create a new index
with the right number of shards after scaling out (or back in), to ensure
that the write workload (which tends to be dominant) stays evenly
balanced. You may or may not want to rebalance the older shards, depending
on how often you go back and forth. There's a small sketch of this right
after the list.
3) *Rebalance existing shards as you add/remove nodes*, which is what this
SIP tackles, if I'm reading it right.
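
To make use-case 2 a bit more concrete, here's a minimal sketch (my own
illustration, not something the SIP prescribes) of what "rotate on scale"
could look like against the Collections API. The base URL, the "logs"
collection/alias names and the configset name are all assumptions:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class RotateOnScale {
        // Assumed Solr base URL; adjust for your cluster.
        static final String SOLR = "http://localhost:8983/solr";

        public static void main(String[] args) throws Exception {
            int newNodeCount = 6; // node count after the autoscaling event
            String newCollection = "logs-" + System.currentTimeMillis();
            HttpClient http = HttpClient.newHttpClient();

            // 1) Create a fresh collection sized for the new node count,
            //    so the (usually dominant) write workload spreads evenly.
            call(http, SOLR + "/admin/collections?action=CREATE"
                    + "&name=" + newCollection
                    + "&numShards=" + newNodeCount
                    + "&replicationFactor=1"
                    + "&collection.configName=logs_config"); // assumed configset

            // 2) Flip the write alias to the new collection; older
            //    collections stay queryable until they age out.
            call(http, SOLR + "/admin/collections?action=CREATEALIAS"
                    + "&name=logs_write&collections=" + newCollection);
        }

        static void call(HttpClient http, String url) throws Exception {
            HttpRequest req =
                    HttpRequest.newBuilder(URI.create(url)).GET().build();
            HttpResponse<String> resp =
                    http.send(req, HttpResponse.BodyHandlers.ofString());
            System.out.println(resp.statusCode() + " " + url);
        }
    }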

If I understand correctly, these three aren't mutually exclusive, so I
wouldn't change this SIP to account for the other use-cases, but I think
it's nice to keep them in mind and discuss them in case anyone has ideas.

With regard to UTILIZENODE & REPLACENODE, I think they will work OK, but I
wonder if a general REBALANCESHARDS command would work better? Maybe that's
just because I'm thinking of Elasticsearch/OpenSearch, but it seems like a
more "general" approach. Here's a sketch of the difference:

If REBALANCESHARDS sounds like a good idea, I'm thinking it could work per
collection or for the whole cluster; I'm not sure what's best. My initial
thought is that per cluster is what we need, but on the other hand per
collection is easier to implement: just assign the shards of that
collection, and if the number of shards isn't evenly divisible by the
number of nodes, assign to the nodes with fewer replicas, or maybe
piggyback on the replica placement plugins? It's also easier to
stop/recover when something goes wrong. Plus, it's more opinionated, in
the sense that you'll want the current (and future) number of nodes to be
a divisor of your number of shards. And then maybe the Operator could have
some config options for the scaling steps you'd want to take. For example,
if I know I have 12 shards in total per collection, I'd want to allow 2-,
3-, 4-, 6- and 12-node configurations (little sketch below).
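
Just to illustrate that last point, a tiny sketch of how the Operator
could derive the allowed node counts from the shard count (the method name
and the choice to skip 1-node setups are my assumptions):

    import java.util.ArrayList;
    import java.util.List;

    public class NodeSteps {
        /** Node counts that divide the shard count evenly (skipping 1). */
        static List<Integer> validNodeCounts(int totalShards) {
            List<Integer> steps = new ArrayList<>();
            for (int nodes = 2; nodes <= totalShards; nodes++) {
                if (totalShards % nodes == 0) {
                    steps.add(nodes);
                }
            }
            return steps;
        }

        public static void main(String[] args) {
            // 12 shards per collection -> [2, 3, 4, 6, 12]
            System.out.println(validNodeCounts(12));
        }
    }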

Please let me know if you have any thoughts/questions/reactions :)

Best regards,
Radu
--
Elasticsearch/OpenSearch & Solr Consulting, Production Support & Training
Sematext Cloud - Full Stack Observability
https://sematext.com/


On Thu, Mar 30, 2023 at 10:54 PM Houston Putman <hous...@apache.org> wrote:

> Hello everyone,
>
> This is kind of a long-time coming, but I've finally created a SIP for
> autoscaling Solr Nodes on Kubernetes using the Solr Operator.
>
>
> https://cwiki.apache.org/confluence/display/SOLR/SIP-17%3A+Node+Autoscaling+via+Kubernetes
>
> There are still some details that need to be ironed out, but hopefully we
> can finalize everything relatively soon and try to get this out in Solr
> 9.3/9.4 and the Solr Operator v0.8.0.
>
> I've talked with quite a few people about this, so hopefully we can get a
> good amount of turn-out to get this implemented! And if anyone is
> interested in helping with the Solr Operator parts, I'd be very happy to
> mentor. It's not going to be the most straightforward code, but you will
> definitely be ramped up on contributing to the operator by the end!
>
> Please let me know if I can answer any questions!
>
> - Houston
>
