We create “search feeds”, which are S3 files with one JSON object per line. Documents going to Solr go into a feed file first. Periodically, the files are fetched and loaded into Solr.
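
A loader along these lines might look like the sketch below. It is illustrative only: the function names, the batch size, and the update URL are assumptions, not the actual loader, and it expects the feed lines to have been fetched from S3 already (any iterable of text lines works). It relies on Solr's JSON update handler accepting an array of documents; commits are assumed to be handled by autoCommit on the cluster.

```python
import json
import urllib.request
from itertools import islice


def parse_feed(lines):
    """Yield one document (dict) per non-blank line of a JSON-lines feed."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def batches(docs, size=500):
    """Group an iterable of documents into lists of at most `size` items."""
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def post_batch(update_url, batch, timeout=60):
    """POST one batch to a Solr update handler as a JSON array of documents."""
    req = urllib.request.Request(
        update_url,  # hypothetical, e.g. http://host:8983/solr/<collection>/update
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def load_feed(lines, update_url, size=500, post=post_batch):
    """Load a whole feed into one cluster; return the number of documents sent."""
    sent = 0
    for batch in batches(parse_feed(lines), size):
        post(update_url, batch)
        sent += len(batch)
    return sent
```

Since the feed file is the only input, the same file can be replayed against another cluster by calling `load_feed` again with a different URL.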
S3 is cross-region, so we could easily use this for multiple hot search clusters. More often, we’ve used it for major version upgrades: make a new cluster with version 8, then feed both the Solr 6 and Solr 8 clusters independently from the feed files. After traffic is moved over, stop feeding the Solr 6 cluster and recycle the machines.

For disaster recovery, we’d rebuild the cluster (Terraform), then run the loader.

wunder
Walter Underwood
[email protected]
http://observer.wunderwood.org/ (my blog)

> On Mar 1, 2022, at 3:54 PM, Matt Kuiper <[email protected]> wrote:
>
> Thanks Anshum, Dima! Yes, I figure this approach will be quite challenging
> to implement, and may not be worth the cost.
>
> Anshum,
>
> I had not thought of versioning (
> https://solr.apache.org/guide/8_2/updating-parts-of-documents.html#document-centric-versioning-constraints),
> but will consider it. Yes, some of our updates are Atomic updates.
>
> Yes, initial thinking is using a single "queue" of updates where multiple
> instances (associated to a particular SolrCloud instance) of the same
> indexing service will consume from the queue and index to their associated
> SolrCloud instance.
>
> I will take a look at your proposal!
>
> Thanks again,
>
> Matt
>
> On Tue, Mar 1, 2022 at 4:12 PM Anshum Gupta <[email protected]> wrote:
>
>> Hi Matt,
>>
>> I'll start by saying that this has been long due at my end.
>>
>> There are a multitude of challenges with a hot-hot architecture involving
>> multiple SolrCloud clusters. An important question here is if you are going
>> to manage the versioning yourself. Also, if your updates would ever
>> overwrite data. Here's an initial proposal for something along those lines
>> (but doesn't support an unversioned hot-hot setup w/ document edits) -
>>
>> https://cwiki.apache.org/confluence/display/SOLR/SIP-13%3A+Cross+Data+Center+Replication
>>
>> Hot-Hot setups are really complex and there are a few ways I've handled (or
>> seen them being handled).
>> 1. The best way here is to either have externally versioned documents sent
>> to Solr clusters or
>> 2. rely on a single point of entry i.e. updates always go to a queuing
>> service for instance and then have an application that's responsible for
>> consuming from this (queue?).
>>
>> -Anshum
>>
>> On Tue, Mar 1, 2022 at 3:02 PM mtn search <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> My team is looking to deploy Solr 8 SolrCloud on two on-prem datacenters
>>> via EKS. We are considering a HOT | HOT HA architecture between the data
>>> centers where data would be indexed (duplicated) to SolrCloud instances in
>>> both datacenters. Then via service (to be worked out) queries could go to
>>> either datacenter.
>>>
>>> I believe one of the challenges will be keeping the SolrCloud instances
>>> (holding the same data) in sync.
>>>
>>> I am curious if others have tried this and are willing to share any tips,
>>> lessons learned, or things we should consider.
>>>
>>> Thanks,
>>> Matt
>>>
>>
>>
>> --
>> Anshum Gupta
>>
