We create “search feeds”, which are S3 files with one JSON object per line. 
Documents going to Solr go into a feed file first. Periodically, the files are 
fetched and loaded into Solr.
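
A minimal sketch of what such a loader could look like, assuming the feed lines have already been fetched from S3 and that documents are posted to Solr's JSON update handler. The base URL, collection name, batch size, and the `send` hook are all hypothetical; this is not our actual loader.

```python
import json
import urllib.request
from typing import Callable, Iterable, Iterator, List


def parse_feed(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one document per non-blank line of a JSON-lines feed."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def batches(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Group documents into lists of at most `size` items."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def build_update_request(solr_url: str, collection: str,
                         docs: List[dict]) -> urllib.request.Request:
    """Build a POST to the collection's /update handler for one batch."""
    return urllib.request.Request(
        url=f"{solr_url}/solr/{collection}/update",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post(request: urllib.request.Request) -> None:
    """Send the request and drain the response (raises on HTTP errors)."""
    with urllib.request.urlopen(request) as response:
        response.read()


def load_feed(lines: Iterable[str], solr_url: str, collection: str,
              batch_size: int = 500,
              send: Callable[[urllib.request.Request], object] = post) -> int:
    """Send every document in the feed to one cluster; return the count sent."""
    sent = 0
    for batch in batches(parse_feed(lines), batch_size):
        send(build_update_request(solr_url, collection, batch))
        sent += len(batch)
    return sent
```

The sketch does not issue commits; it assumes the cluster's autoCommit settings handle that. Passing a different `send` makes it easy to dry-run a feed without touching a cluster.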

S3 is cross-region, so we could easily use this for multiple hot search
clusters. More often, we’ve used it for major version upgrades: make a new
cluster with version 8, then feed both the Solr 6 and Solr 8 clusters
independently from the same feed files. After traffic is moved over, stop
feeding the Solr 6 cluster and recycle the machines.
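
One way to keep the clusters independent is for each to track its own position in the feed. The sketch below assumes feed file keys sort chronologically (for example, a timestamp in the name) and that each cluster records the last key it loaded. The key names and the checkpoint mapping are hypothetical, not a description of our setup.

```python
from typing import Dict, List, Optional


def pending_feeds(feed_keys: List[str], last_loaded: Optional[str]) -> List[str]:
    """Return feed files newer than a cluster's checkpoint, oldest first.

    A checkpoint of None means the cluster is new and needs every file.
    """
    ordered = sorted(feed_keys)
    if last_loaded is None:
        return ordered
    return [key for key in ordered if key > last_loaded]


def plan_loads(feed_keys: List[str],
               checkpoints: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Compute, per cluster, the files it still has to load."""
    return {
        cluster: pending_feeds(feed_keys, checkpoint)
        for cluster, checkpoint in checkpoints.items()
    }


# Example: the Solr 6 cluster is nearly caught up, the Solr 8 cluster is new.
keys = [
    "feeds/2022-03-01T09.jsonl",
    "feeds/2022-03-01T10.jsonl",
    "feeds/2022-03-01T11.jsonl",
]
plan = plan_loads(keys, {"solr6": "feeds/2022-03-01T10.jsonl", "solr8": None})
```

Because each cluster has its own checkpoint, a slow or freshly built cluster never holds the other one back; a cluster with no checkpoint simply replays the whole feed.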

For disaster recovery, we’d rebuild the cluster with Terraform, then run the
loader against the existing feed files.

wunder
Walter Underwood
[email protected]
http://observer.wunderwood.org/  (my blog)

> On Mar 1, 2022, at 3:54 PM, Matt Kuiper <[email protected]> wrote:
> 
> Thanks Anshum, Dima!  Yes, I figure this approach will be quite challenging
> to implement, and may not be worth the cost.
> 
> Anshum,
> 
> I had not thought of versioning (
> https://solr.apache.org/guide/8_2/updating-parts-of-documents.html#document-centric-versioning-constraints),
> but will consider it.  Yes, some of our updates are Atomic updates.
> 
> Yes, initial thinking is using a single "queue" of updates where multiple
> instances (associated with a particular SolrCloud instance) of the same
> indexing service will consume from the queue and index to their associated
> SolrCloud instance.
> 
> I will take a look at your proposal!
> 
> Thanks again,
> 
> Matt
> 
> On Tue, Mar 1, 2022 at 4:12 PM Anshum Gupta <[email protected]> wrote:
> 
>> Hi Matt,
>> 
>> I'll start by saying that this has been long overdue on my end.
>> 
>> There are a multitude of challenges with a hot-hot architecture involving
>> multiple SolrCloud clusters. An important question here is if you are going
>> to manage the versioning yourself. Also, if your updates would ever
>> overwrite data. Here's an initial proposal for something along those lines
>> (but doesn't support an unversioned hot-hot setup w/ document edits) -
>> 
>> https://cwiki.apache.org/confluence/display/SOLR/SIP-13%3A+Cross+Data+Center+Replication
>> 
>> Hot-Hot setups are really complex, and there are a few ways I've handled
>> them (or seen them handled):
>> 1. The best way is to have externally versioned documents sent to the
>> Solr clusters, or
>> 2. rely on a single point of entry, i.e. updates always go to a queuing
>> service, for instance, and an application is responsible for consuming
>> from that queue.
>> 
>> -Anshum
>> 
>> On Tue, Mar 1, 2022 at 3:02 PM mtn search <[email protected]> wrote:
>> 
>>> Hello,
>>> 
>>> My team is looking to deploy Solr 8 SolrCloud on two on-prem datacenters
>>> via EKS.  We are considering a HOT | HOT HA architecture between the data
>>> centers where data would be indexed (duplicated) to SolrCloud instances in
>>> both datacenters. Then via service (to be worked out) queries could go to
>>> either datacenter.
>>> 
>>> I believe one of the challenges will be keeping the SolrCloud instances
>>> (holding the same data) in sync.
>>> 
>>> I am curious if others have tried this and are willing to share any tips,
>>> lessons learned, or things we should consider.
>>> 
>>> Thanks,
>>> Matt
>>> 
>> 
>> 
>> --
>> Anshum Gupta
>> 
