Hi Houston,

Thanks a lot for putting this together! I'd like to help with the Solr Operator. Though I have limited availability over the next two months, maybe I can still be useful with a few things.
Some comments regarding the SIP:

- I think that in general it sounds like a good plan. I don't want to get in the way instead of helping :)
- I think it tackles one of the three common use-cases that I've seen for autoscaling:
  1) *AutoAddReplicas*: mostly for enterprise search, where people want to expand query throughput. Combining that with autoscaling sounds very appealing.
  2) *Rotate indices on autoscaling events*, which should work well for time-series data. This is what we presented last year at BBuzz and KubeCon for Elasticsearch/OpenSearch <https://sematext.com/blog/kubernetes-elasticsearch-autoscaling/>. The gzip -9 version of it is that you'll probably want to create a new index with the right number of shards after scaling out (or back in), to ensure that the write workload (which tends to be dominant) is evenly balanced. You may or may not want to rebalance older shards, depending on how often you scale back and forth.
  3) *Rebalance existing shards as you add/remove nodes*, which is what this SIP tackles, if I'm reading it right.

If I understand correctly, these three don't exclude each other, so I wouldn't bother changing this SIP to account for the other use-cases, but I think it's nice to have them in mind or discuss them in case anyone has any ideas.

With regards to UTILIZENODE & REPLACENODE, I think they will work OK, but I wonder if a general REBALANCESHARDS command would work better? Or maybe it's just because I'm thinking of Elasticsearch/OpenSearch. But it seems like a more "general" approach.

If REBALANCESHARDS sounds like a good idea, I'm thinking it could be per collection or for the whole cluster; I'm not sure what's best. My initial thought is that per cluster is what we need, but on the other hand per collection is easier to implement (just assign the shards of that collection, and if the number of shards doesn't divide evenly by the number of nodes, assign the remainder to the nodes with fewer replicas, or maybe piggyback on the replica placement plugins?)
and it's easier to stop/recover when something goes wrong. Plus, it's more opinionated, in the sense that you'll want the current (and future) number of nodes to be a divisor of your number of shards. And then maybe the Operator could have some config options for the steps you'd want to take. For example, if I know I have 12 shards in total per collection, I'd want 2-, 3-, 4-, 6-, and 12-node configurations.

Please let me know if you have any thoughts/questions/reactions :)

Best regards,
Radu
--
Elasticsearch/OpenSearch & Solr Consulting, Production Support & Training
Sematext Cloud - Full Stack Observability
https://sematext.com/

On Thu, Mar 30, 2023 at 10:54 PM Houston Putman <hous...@apache.org> wrote:

> Hello everyone,
>
> This is kind of a long-time coming, but I've finally created a SIP for
> autoscaling Solr Nodes on Kubernetes using the Solr Operator.
>
> https://cwiki.apache.org/confluence/display/SOLR/SIP-17%3A+Node+Autoscaling+via+Kubernetes
>
> There are still some details that need to be ironed out, but hopefully we
> can finalize everything relatively soon and try to get this out in Solr
> 9.3/9.4 and the Solr Operator v0.8.0.
>
> I've talked with quite a few people about this, so hopefully we can get a
> good amount of turn-out to get this implemented! And if anyone is
> interested in helping with the Solr Operator parts, I'd be very happy to
> mentor. It's not going to be the most straightforward code, but you will
> definitely be ramped up on contributing to the operator by the end!
>
> Please let me know if I can answer any questions!
>
> - Houston
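P.S. To make the "scale steps" idea above concrete, here is a quick Python sketch. The helper name and the idea of precomputing allowed node counts are just my illustration, not anything that exists in the SIP or the Operator; the point is simply that the valid node counts for an evenly-balanced collection are the divisors of its shard count.

```python
# Hypothetical sketch (not a real Solr Operator option): given a collection's
# shard count, list the node counts at which shards divide evenly across
# nodes. 1 is trivially included; an operator config would likely set a floor.

def even_node_counts(num_shards, max_nodes=None):
    """Return node counts n (up to max_nodes) where num_shards % n == 0."""
    limit = max_nodes if max_nodes is not None else num_shards
    return [n for n in range(1, limit + 1) if num_shards % n == 0]

print(even_node_counts(12))  # [1, 2, 3, 4, 6, 12]
```

So for the 12-shard example, an autoscaler constrained to these steps would only ever scale to 2, 3, 4, 6, or 12 nodes, keeping the write workload evenly spread without fractional shard placement.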