If you can satisfy this statement then it seems possible (it's the same restriction as with "atomic updates"): the SolrEntityProcessor can only copy fields that are stored in the source index.
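If all your fields are stored, the data-config is pretty small. An untested sketch of what it might look like is below (the source URL, collection name, timestamp field and cut-over time are all made up; point the url at the DC that stayed online):

    <dataConfig>
      <document>
        <!-- pulls docs back from the surviving DC; only stored fields come across -->
        <entity name="recoverFromOtherDC"
                processor="SolrEntityProcessor"
                url="http://surviving-dc-host:8983/solr/collection1"
                query="timestamp:[2013-08-28T00:00:00Z TO *]"
                rows="500"/>
      </document>
    </dataConfig>

Wire that up behind a /dataimport handler in solrconfig.xml as usual and kick it off with command=full-import, and remember to pass clean=false so the full-import doesn't wipe the existing index before it starts copying.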
On Wed, Aug 28, 2013 at 9:41 AM, Timothy Potter <thelabd...@gmail.com> wrote:

> I've been thinking about this one too and was curious about using the Solr
> Entity support in the DIH to do the import from one DC to another (for the
> lost docs). In my mind, one configures the DIH to use the
> SolrEntityProcessor with a query to capture the docs in the DC that stayed
> online, most likely using a timestamp in the query (see:
> http://wiki.apache.org/solr/DataImportHandler#SolrEntityProcessor).
>
> Would that work? If so, any downsides? I've only used DIH /
> SolrEntityProcessor to populate a staging / dev environment from prod but
> have had good success with it.
>
> Thanks.
> Tim
>
>
> On Wed, Aug 28, 2013 at 6:59 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>
> > The separate DC problem has been lurking for a while. But your
> > understanding is a little off. When a replica discovers that
> > it's "too far" out of date, it does an old-style replication. IOW, the
> > tlog doesn't contain the entire delta. Eventually, the old-style
> > replications catch up to "close enough" and _then_ the remaining
> > docs in the tlog are replayed. The target number of updates in the
> > tlog is 100, so it's a pretty small window that's actually replayed in
> > the normal case.
> >
> > None of which helps your problem. The simplest way (and on the
> > expectation that DC outages were pretty rare!) would be to have your
> > indexing process fire the missed updates at the DC after it came
> > back up.
> >
> > Copying from one DC to another is tricky. You'd have to be very,
> > very sure that you copied indexes to the right shard. Ditto for any
> > process that tried to have, say, a single node from the recovering
> > DC temporarily join the good DC, at least long enough to synch.
> >
> > Not a pretty problem, we don't really have any best practices yet
> > that I know of.
> >
> > FWIW,
> > Erick
> >
> >
> > On Wed, Aug 28, 2013 at 8:13 AM, Daniel Collins <danwcoll...@gmail.com> wrote:
> >
> > > We have 2 separate data centers in our organisation, and in order to
> > > maintain the ZK quorum during any DC outage, we have 2 separate Solr
> > > clouds, one in each DC with separate ZK ensembles, but both are fed
> > > with the same indexing data.
> > >
> > > Now in the event of a DC outage, all our Solr instances go down, and
> > > when they come back up, we need some way to recover the "lost" data.
> > >
> > > Our thought was to replicate from the working DC, but is there a way
> > > to do that whilst still maintaining an "online" presence for indexing
> > > purposes?
> > >
> > > In essence, we want to do what happens within Solr cloud's recovery,
> > > so (as I understand cloud recovery) a node starts up, (I'm assuming
> > > worst case and peer sync has failed) then buffers all updates into the
> > > transaction log, replicates from the leader, and replays the
> > > transaction log to get everything in sync.
> > >
> > > Is it conceivable to do the same by extending Solr, so that on the
> > > activation of some handler (user triggered), we initiate a "replicate
> > > from other DC", which puts all the leaders into buffering updates,
> > > replicates from some other set of servers and then replays?
> > >
> > > Our goal is to try to minimize the downtime (beyond the initial
> > > outage), so we would ideally like to be able to start up indexing
> > > before this "replicate/clone" has finished; that's why I thought to
> > > enable buffering on the transaction log. Searches shouldn't be sent
> > > here, but if they are, we have a valid (albeit old) index to serve
> > > those until the new one swaps in.
> > >
> > > Just curious how any other DC-aware setups handle this kind of
> > > scenario? Or other concerns, issues with this type of approach.