Re: SolrCloud separating compute from storage

2023-07-12 Thread Mikhail Khludnev
keeping from scratch speculation mode on..

On Wed, Jul 12, 2023 at 3:45 PM Ilan Ginzburg wrote:
> I think mandating the use of Kafka to switch to this mode of separating
> compute and storage would make adoption harder. One would also need a
> deployment of Kafka that is resilient to an AZ going

Re: SolrCloud separating compute from storage

2023-07-12 Thread Ilan Ginzburg
Thanks for your feedback Mikhail. Your comment makes a lot of sense.

> When discussing these (pretty cool) architectures I'm missing the point of
> implementing transaction log in Solr codebase.
> I think Kafka is the best fit for such a pre-indexer buffer. WYDT?

Starting from scratch with a desig

Re: SolrCloud separating compute from storage

2023-07-12 Thread Mikhail Khludnev
Hello Ilan, Late comment, though.

On Fri, Apr 28, 2023 at 8:33 PM Ilan Ginzburg wrote:
> ...
> We're considering improving this approach by making the transaction log a
> shard level abstraction (rather than a replica/node abstraction), and store
> it in S3 as well with a transaction log per sha

Re: SolrCloud separating compute from storage

2023-05-01 Thread Jason Gerlowski
+1 - sounds very promising!

On Sat, Apr 29, 2023 at 1:06 PM Wei wrote:
> This is an awesome feature for solr cloud! Currently for our read
> heavy/write heavy use case, we exclude all query requests from the leader
> in each shard to avoid becoming the load bottleneck. Also each solr cloud
> ha

Re: SolrCloud separating compute from storage

2023-04-29 Thread Wei
This is an awesome feature for solr cloud! Currently for our read heavy/write heavy use case, we exclude all query requests from the leader in each shard to avoid becoming the load bottleneck. Also each solr cloud has its own pipeline for NRT updates. With stateless replica and persistent storage

Re: SolrCloud separating compute from storage

2023-04-29 Thread Ilan Ginzburg
Changing/overlapping leaders were the main challenge in the implementation. Logic such as: if (iAmLeader()) { doThings(); } can have multiple participants running doThings() at the same time, as iAmLeader() could change just after it was checked. The only way out in such an approach is to do barriers
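
The check-then-act hazard described above can be illustrated with a small sketch. It is not the implementation discussed in this thread: the class `FencedStore`, the epoch token, and the method names are all hypothetical, and epoch-based fencing is only one common way to close the gap between checking leadership and acting on it. The point it shows is that the leadership check has to happen atomically with the write, inside the shared store, rather than in the caller.

```java
public class LeaderFencingSketch {

    /** Hypothetical shared store that only accepts writes from the current leader epoch. */
    static final class FencedStore {
        private long currentEpoch = 0;
        private String value = "";

        /** Each election bumps the epoch and hands the new leader its token. */
        synchronized long electNewLeader() {
            return ++currentEpoch;
        }

        /** The leadership check and the write happen under one lock, so they cannot be interleaved. */
        synchronized boolean writeIfLeader(long epoch, String newValue) {
            if (epoch != currentEpoch) {
                return false; // caller was leader once, but no longer is: reject
            }
            value = newValue;
            return true;
        }

        synchronized String read() {
            return value;
        }
    }

    public static void main(String[] args) {
        FencedStore store = new FencedStore();

        long oldLeader = store.electNewLeader();
        // The old leader has just passed its own iAmLeader() check here,
        // and then leadership changes before it gets to act.
        long newLeader = store.electNewLeader();

        boolean staleAccepted = store.writeIfLeader(oldLeader, "from old leader");
        boolean freshAccepted = store.writeIfLeader(newLeader, "from new leader");

        System.out.println("stale write accepted: " + staleAccepted);
        System.out.println("fresh write accepted: " + freshAccepted);
        System.out.println("stored value: " + store.read());
    }
}
```

With a plain `if (iAmLeader()) { doThings(); }`, the old leader's write would have gone through; here the store itself rejects it because the token is stale.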

Re: SolrCloud separating compute from storage

2023-04-28 Thread Shawn Heisey
On 4/28/23 11:33, Ilan Ginzburg wrote: Salesforce has been working for a while on separating compute from storage in SolrCloud, see presentation at Activate 2019, "SolrCloud in Public Cloud: Scaling Compute Independently from Storage". In a nutshell, the idea is that

Re: SolrCloud separating compute from storage

2023-04-28 Thread Justin Sweeney
This definitely sounds very interesting and if we could abstract it away from AWS specifically then even better. I think there are a lot of advantages with an approach like this as you've mentioned. At FullStory we are planning to get into some experiments using GCP Local SSDs and Google Cloud Stor

Re: SolrCloud separating compute from storage

2023-04-28 Thread Joel Bernstein
I mentioned NIO providers for S3, GCS and Azure in a different email thread. This could be used to abstract away the S3 specific code and provide support for GCS and Azure without much more effort. This would make unit tests much easier to write because you can simply unit test to local disk by cha
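
As a rough illustration of the idea above, the following sketch writes storage code purely against `java.nio.file.Path` and `Files`, so the same class can be exercised in a unit test against a local temporary directory. The class name `PathBackedStoreSketch` and the file names are hypothetical. How a third-party S3, GCS, or Azure NIO provider would be wired in, and whether it supports every call used here (directory creation in particular), is an assumption and is not shown.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class PathBackedStoreSketch {

    // Root of the store; any Path works, regardless of which FileSystem provider backs it.
    private final Path root;

    public PathBackedStoreSketch(Path root) {
        this.root = root;
    }

    /** Writes content under root/name, creating parent directories as needed. */
    public void put(String name, String content) throws IOException {
        Path target = root.resolve(name);
        Files.createDirectories(target.getParent());
        Files.write(target, content.getBytes(StandardCharsets.UTF_8));
    }

    /** Reads back the content stored under root/name. */
    public String get(String name) throws IOException {
        return new String(Files.readAllBytes(root.resolve(name)), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // In a unit test the root is simply a local temporary directory.
        Path localRoot = Files.createTempDirectory("store-sketch");
        PathBackedStoreSketch store = new PathBackedStoreSketch(localRoot);
        store.put("shard1/example.txt", "hello");
        System.out.println(store.get("shard1/example.txt"));
    }
}
```

Because the class never names a concrete filesystem, the choice of backend is confined to whoever constructs the root `Path`.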

Re: SolrCloud separating compute from storage

2023-04-28 Thread David Smiley
To clarify the point to everyone: "separation of compute from storage" allows infrastructure cost savings when you have both large scale (many shards in the cluster) and highly diverse collection/index utilization. The vision of our contribution is that an unused shard can scale down to as litt