bq. To answer your question about index size on disk, it is 3 TB on every node. As mentioned, it's a 32 GB machine and I allocated 24 GB to the Java heap.
This is massively undersized in terms of RAM, in my experience. You're trying to cram 3 TB of index into 32 GB of memory. Frankly, I don't think there's much you can do to increase stability in this situation; too many things are going on. In particular, you're indexing during node restart. That means that:

1> you'll almost inevitably get a full sync on start, given your update rate.
2> while the full sync is running, all new updates are sent to the recovering replica and put in its tlog.
3> when the initial replication is done, the documents accumulated in the tlog while recovering are indexed. That's 7 hours of accumulated updates.
4> if anything goes wrong at that point, you're looking at another full sync.
5> rinse, repeat.

There are no magic tweaks here. You really have to rethink your architecture.

I'm actually surprised that your queries are performant. I expect you're getting a _lot_ of I/O; that is, the relevant parts of your index are constantly being paged in and out of the OS memory space. A _lot_. Or you're only using a _very_ small bit of your index.

Sorry to be so negative, but this is not a situation that's amenable to a quick fix.

Best,
Erick

On Mon, Feb 11, 2019 at 4:10 PM Rahul Goswami <rahul196...@gmail.com> wrote:
>
> Thanks for the response, Erick. To answer your question about index size on
> disk, it is 3 TB on every node. As mentioned, it's a 32 GB machine and I
> allocated 24 GB to the Java heap.
>
> Monitoring the recovery further, I see that while the follower node is
> recovering, the leader node (which is NOT recovering) almost freezes, with
> 100% CPU usage and 80%+ memory usage. The follower node's memory usage is
> also 80%+, but its CPU is very healthy. The follower node's log is also
> filled with updates forwarded from the leader ("...PRE_UPDATE FINISH
> {update.distrib=FROMLEADER&distrib.from=...") and replication starts much
> later.
> There have been instances when complete recovery took 10+ hours. We have
> upgraded to a 4 Gbps NIC between the nodes to see if it helps.
>
> Also, a few follow-up questions:
>
> 1) Is there a configuration which would start throttling update requests
> if the replica falls behind by a certain number of updates, so as to not
> trigger an index replication later? If not, would it be a worthy
> enhancement?
> 2) What would be a recommended hard commit interval for this kind of
> setup?
> 3) What are some of the improvements in 7.5 with respect to recovery as
> compared to 7.2.1?
> 4) What do the peersync failure log lines below mean? This would help me
> better understand the reasons for peersync failure, and maybe devise an
> alert mechanism to start throttling update requests from the application
> program if feasible.
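Regarding question 2) above: the hard commit interval is set on the updateHandler in solrconfig.xml, alongside a soft commit interval that controls when new documents become searchable. A minimal sketch for reference only; the intervals shown are illustrative values, not a recommendation from this thread:

    <!-- solrconfig.xml: a hard commit writes segments to disk and starts a new
         transaction log; with openSearcher=false it does not open a new
         searcher. Soft commits control document visibility. -->
    <autoCommit>
      <maxTime>60000</maxTime>            <!-- illustrative: hard commit every 60 s -->
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>300000</maxTime>           <!-- illustrative: new searcher every 5 min -->
    </autoSoftCommit>

The trade-off relevant to this thread is that shorter hard commit intervals keep the current transaction log small, which limits how much has to be replayed when a node restarts.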
>
> *PeerSync Failure type 1*:
> ----------------------------------
> 2019-02-04 20:43:50.018 INFO
> (recoveryExecutor-4-thread-2-processing-n:indexnode1:20000_solr
> x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42
> s:shard11 c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 r:core_node45)
> [c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 s:shard11 r:core_node45
> x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42]
> org.apache.solr.update.PeerSync Fingerprint comparison: 1
>
> 2019-02-04 20:43:50.018 INFO
> (recoveryExecutor-4-thread-2-processing-n:indexnode1:20000_solr
> x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42
> s:shard11 c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 r:core_node45)
> [c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 s:shard11 r:core_node45
> x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42]
> org.apache.solr.update.PeerSync Other fingerprint:
> {maxVersionSpecified=1624579878580912128,
> maxVersionEncountered=1624579893816721408, maxInHash=1624579878580912128,
> versionsHash=-8308981502886241345, numVersions=32966082, numDocs=32966165,
> maxDoc=1828452}, Our fingerprint: {maxVersionSpecified=1624579878580912128,
> maxVersionEncountered=1624579975760838656, maxInHash=1624579878580912128,
> versionsHash=4017509388564167234, numVersions=32966066, numDocs=32966165,
> maxDoc=1828452}
>
> 2019-02-04 20:43:50.018 INFO
> (recoveryExecutor-4-thread-2-processing-n:indexnode1:20000_solr
> x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42
> s:shard11 c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 r:core_node45)
> [c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 s:shard11 r:core_node45
> x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42]
> org.apache.solr.update.PeerSync PeerSync:
> core=DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42 url=
> http://indexnode1:8983/solr DONE. sync failed
>
> 2019-02-04 20:43:50.018 INFO
> (recoveryExecutor-4-thread-2-processing-n:indexnode1:8983_solr
> x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42
> s:shard11 c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 r:core_node45)
> [c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 s:shard11 r:core_node45
> x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42]
> org.apache.solr.cloud.RecoveryStrategy PeerSync Recovery was not successful
> - trying replication.
>
>
> *PeerSync Failure type 2*:
> ---------------------------------
> 2019-02-02 20:26:56.256 WARN
> (recoveryExecutor-4-thread-11-processing-n:indexnode1:20000_solr
> x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard12_replica_n46
> s:shard12 c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 r:core_node49)
> [c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 s:shard12 r:core_node49
> x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard12_replica_n46]
> org.apache.solr.update.PeerSync PeerSync:
> core=DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard12_replica_n46 url=
> http://indexnode1:20000/solr too many updates received since start -
> startingUpdates no longer overlaps with our currentUpdates
>
>
> Regards,
> Rahul
>
> On Thu, Feb 7, 2019 at 12:59 PM Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> > bq. We have a heavy indexing load of about 10,000 documents every 150
> > seconds. Not so heavy query load.
> >
> > It's unlikely that changing numRecordsToKeep will help all that much if
> > your maintenance window is very large.
> > Rather, that number would have to be _very_ high.
> >
> > 7 hours is huge. How big are your indexes on disk? You're essentially
> > going to get a full copy from the leader for each replica, so network
> > bandwidth may be the bottleneck. Plus, every doc that gets indexed to the
> > leader during sync will be stored away in the replica's tlog (not limited
> > by numRecordsToKeep) and replayed after the full index replication is
> > accomplished.
> >
> > Much of the retry logic for replication has been improved starting with
> > Solr 7.3 and, in particular, Solr 7.5. That might address the replicas
> > that just fail to replicate at all, but it won't help with the fact that
> > replicas need a full sync in the first place.
> >
> > That said, by far the simplest thing would be to stop indexing during
> > your maintenance window, if at all possible.
> >
> > Best,
> > Erick
> >
> > On Tue, Feb 5, 2019 at 9:11 PM Rahul Goswami <rahul196...@gmail.com>
> > wrote:
> > >
> > > Hello Solr gurus,
> > >
> > > I have a scenario where, on Solr cluster restart, the replica node goes
> > > into full index replication for about 7 hours. Both replica nodes are
> > > restarted around the same time for maintenance. Also, during usual
> > > times, if one node goes down for whatever reason, upon restart it again
> > > does index replication. In certain instances, some replicas just fail
> > > to recover.
> > >
> > > *SolrCloud 7.2.1 cluster configuration:*
> > > ============================
> > > 16 shards - replication factor=2
> > >
> > > Per-server configuration:
> > > ======================
> > > 32 GB machine - 16 GB heap space for Solr
> > > Index size: 3 TB per server
> > >
> > > autoCommit (openSearcher=false) of 3 minutes
> > >
> > > We have a heavy indexing load of about 10,000 documents every 150
> > > seconds. Not so heavy query load.
> > >
> > > Reading through some of the threads on similar topics, I suspect it is
> > > the disparity in the number of updates (>100) between the replicas that
> > > is causing this (courtesy of our indexing load). One of the suggestions
> > > I saw was using numRecordsToKeep.
> > > However, as Erick mentioned in one of those threads, that's a band-aid
> > > measure, and I am trying to eliminate some of the fundamental issues
> > > that might exist.
> > >
> > > 1) Is the heap too small for that index size? If yes, what would be a
> > > recommended max heap size?
> > > 2) Is there a general guideline to estimate the required max heap based
> > > on index size on disk?
> > > 3) What would be a recommended autoCommit and autoSoftCommit interval?
> > > 4) Are there any configurations that would help improve the restart
> > > time and avoid full replication?
> > > 5) Does Solr retain "numRecordsToKeep" number of documents in the tlog
> > > *per replica*?
> > > 6) The reasons for peersync failure in the logs below are not completely
> > > clear to me. Can someone please elaborate?
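On question 5) just above and on Erick's numRecordsToKeep remarks: numRecordsToKeep is configured per core on the <updateLog> in solrconfig.xml, so each replica keeps that many recent updates in its own tlog for peer sync; a replica that has missed more updates than that falls back to full index replication. A minimal sketch, with an illustrative (not recommended) value:

    <!-- solrconfig.xml: each replica's transaction log, used for peer sync.
         numRecordsToKeep defaults to 100; a replica behind by more updates
         than this cannot peer-sync and must do a full replication. -->
    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
      <int name="numRecordsToKeep">10000</int>  <!-- illustrative value only -->
    </updateLog>

As Erick says, raising this only widens the window in which peer sync can succeed: at 10,000 documents every 150 seconds, even 10,000 records covers only about two and a half minutes of indexing.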
> > >
> > > *PeerSync fails with*:
> > >
> > > Failure type 1:
> > > -----------------
> > > 2019-02-04 20:43:50.018 INFO
> > > (recoveryExecutor-4-thread-2-processing-n:indexnode1:20000_solr
> > > x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42
> > > s:shard11 c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 r:core_node45)
> > > [c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 s:shard11 r:core_node45
> > > x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42]
> > > org.apache.solr.update.PeerSync Fingerprint comparison: 1
> > >
> > > 2019-02-04 20:43:50.018 INFO
> > > (recoveryExecutor-4-thread-2-processing-n:indexnode1:20000_solr
> > > x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42
> > > s:shard11 c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 r:core_node45)
> > > [c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 s:shard11 r:core_node45
> > > x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42]
> > > org.apache.solr.update.PeerSync Other fingerprint:
> > > {maxVersionSpecified=1624579878580912128,
> > > maxVersionEncountered=1624579893816721408, maxInHash=1624579878580912128,
> > > versionsHash=-8308981502886241345, numVersions=32966082, numDocs=32966165,
> > > maxDoc=1828452}, Our fingerprint: {maxVersionSpecified=1624579878580912128,
> > > maxVersionEncountered=1624579975760838656, maxInHash=1624579878580912128,
> > > versionsHash=4017509388564167234, numVersions=32966066, numDocs=32966165,
> > > maxDoc=1828452}
> > >
> > > 2019-02-04 20:43:50.018 INFO
> > > (recoveryExecutor-4-thread-2-processing-n:indexnode1:20000_solr
> > > x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42
> > > s:shard11 c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 r:core_node45)
> > > [c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 s:shard11 r:core_node45
> > > x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42]
> > > org.apache.solr.update.PeerSync PeerSync:
> > > core=DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42
> > > url= http://indexnode1:8983/solr DONE. sync failed
> > >
> > > 2019-02-04 20:43:50.018 INFO
> > > (recoveryExecutor-4-thread-2-processing-n:indexnode1:8983_solr
> > > x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42
> > > s:shard11 c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 r:core_node45)
> > > [c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 s:shard11 r:core_node45
> > > x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42]
> > > org.apache.solr.cloud.RecoveryStrategy PeerSync Recovery was not
> > > successful - trying replication.
> > >
> > >
> > > Failure type 2:
> > > ------------------
> > > 2019-02-02 20:26:56.256 WARN
> > > (recoveryExecutor-4-thread-11-processing-n:indexnode1:20000_solr
> > > x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard12_replica_n46
> > > s:shard12 c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 r:core_node49)
> > > [c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 s:shard12 r:core_node49
> > > x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard12_replica_n46]
> > > org.apache.solr.update.PeerSync PeerSync:
> > > core=DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard12_replica_n46
> > > url= http://indexnode1:20000/solr too many updates received since start -
> > > startingUpdates no longer overlaps with our currentUpdates
> > >
> > >
> > > Thanks,
> > > Rahul
> >
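On the alert/throttling idea from the Feb 11 mail (question 4): Solr has no built-in back-pressure for lagging replicas, but replica state is exposed through the Collections API, so an external monitor could poll it and pause the indexing application while any replica is recovering rather than piling more updates into its tlog. A rough sketch against the hostname and collection name that appear in the logs above (assumed reachable from the monitoring host):

    curl "http://indexnode1:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66&wt=json"

In the JSON response each replica reports a state of active, recovering, down, or recovery_failed; holding updates while anything is not active at least keeps the tlog that must be replayed from growing during a 7+ hour recovery.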