bq: are you simply flagging the fact that we wouldn't direct the queries to A v. B v. C since SolrCloud will make the decisions itself as to which part of the distro gets hit for the operation?
Yep. SolrCloud takes care of it all itself. I should also add that there are about a zillion metrics now available in Solr that you can use to make the best use of hardware, including things like CPU usage, I/O, GC, etc. SolrCloud doesn't _yet_ make use of these but will in the future. The current software LB does a pretty simple round-robin distribution.

Best,
Erick

On Wed, Apr 11, 2018 at 5:57 AM, John Blythe <johnbly...@gmail.com> wrote:
> thanks, erick. great info.
>
>> although you can't (yet) direct queries to one or the other. So just making them all NRT and forgetting about it is reasonable.
>
> are you simply flagging the fact that we wouldn't direct the queries to A v. B v. C since SolrCloud will make the decisions itself as to which part of the distro gets hit for the operation? if not, can you expound on this a bit more?
>
>> The very nature of merging is such that you will _always_ get large merges until you have 5G segments (by default)
>
> bummer
>
>> Quite possible, but you have to route things yourself. But in that case you're limited to one machine to handle all your NRT traffic. I skimmed your post so don't know whether your NRT traffic load is high enough to worry about.
>
> ok. i think we'll take a two-pronged approach. for the immediate purpose of trying to solve an issue we've begun encountering, we will begin thoroughly testing the load across various operations in the master-slave setup we've built. pending the results, we can roll forward with a temporary patch in which all end-user touch points route through the primary box for read/write, while the large-scale operations/processing we do in the background will point to the ELB the slaves sit behind. we'll also begin setting up a simple solrcloud instance to toy with per your suggestion above. inb4 tons more questions on my part :)
>
> thanks!
>
> --
> John Blythe
>
> On Tue, Apr 10, 2018 at 11:14 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>
>> bq: should we try to bite the solrcloud bullet and be done with it
>>
>> that's what I'd do. As of 7.0 there are different "flavors", TLOG, PULL, and NRT, so that's also a possibility, although you can't (yet) direct queries to one or the other. So just making them all NRT and forgetting about it is reasonable.
>>
>> bq: is there some more config work we could put in place to avoid ... commit issue and the ultra-large merge dangers
>>
>> No. The very nature of merging is such that you will _always_ get large merges until you have 5G segments (by default). The max segment size (outside "optimize/forceMerge/expungeDeletes", which you shouldn't do) is 5G, so the steady-state worst-case segment pull is limited to that.
>>
>> bq: maybe for our initial need we use Master for writing and user access in NRT events, but slaves for the heavier backend
>>
>> Quite possible, but you have to route things yourself. But in that case you're limited to one machine to handle all your NRT traffic. I skimmed your post so don't know whether your NRT traffic load is high enough to worry about.
>>
>> The very first thing I'd do is set up a simple SolrCloud setup and give it a spin. Unless your indexing load is quite heavy, the added work the NRT replicas have in SolrCloud isn't a problem, so worrying about that is premature optimization unless you have a heavy load...
>>
>> Best,
>> Erick
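As an aside for anyone wiring up the interim plan John describes: that routing pattern is the classic ReplicationHandler setup. A minimal solrconfig.xml sketch follows; the hostname, core name, and the 2-second pollInterval are illustrative assumptions, not details from the thread.

  On the master:

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <!-- publish a new index version after every hard commit, and on startup -->
      <str name="replicateAfter">commit</str>
      <str name="replicateAfter">startup</str>
    </lst>
  </requestHandler>

  On each slave:

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <!-- hypothetical master URL; points at the core's /replication endpoint -->
      <str name="masterUrl">http://master.example.com:8983/solr/core1/replication</str>
      <!-- poll for new index versions every 2 seconds (HH:MM:SS) -->
      <str name="pollInterval">00:00:02</str>
    </lst>
  </requestHandler>

Note that with replicateAfter=commit the slaves only see segments after a hard commit on the master, which connects to Shawn's soft-commit caveat further down the thread.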
>>
>> On Mon, Apr 9, 2018 at 4:36 PM, John Blythe <johnbly...@gmail.com> wrote:
>> > Thanks a bunch for the thorough reply, Shawn.
>> >
>> > Phew. We'd chosen to go with master-slave replication instead of SolrCloud per the sudden need we had encountered and the desire to avoid the nuances and changes related to moving to SolrCloud. But so much for this being a more straightforward solution, huh?
>> >
>> > Few questions:
>> > - should we try to bite the solrcloud bullet and be done with it?
>> > - is there some more config work we could put in place to avoid the soft commit issue and the ultra-large merge dangers, keeping the replications happening quickly?
>> > - maybe for our initial need we use the master for writing and user access in NRT events, but the slaves for the heavier backend processing. Thoughts?
>> > - anyone do consulting on this that would be interested in chatting?
>> >
>> > Thanks again!
>> >
>> > On Mon, Apr 9, 2018 at 18:18 Shawn Heisey <apa...@elyograg.org> wrote:
>> >
>> >> On 4/9/2018 12:15 PM, John Blythe wrote:
>> >> > we're starting to dive into master/slave replication architecture. we'll have 1 master with 4 slaves behind it. our app is NRT. if a user performs an action in section A's data they may choose to jump to section B, which will be dependent on having the updates from their action in section A. as such, we're thinking that the replication time should be set to 1-2s (the chance of them arriving at section B quickly enough to catch the 2s gap is slim at best).
>> >>
>> >> Once you start talking about master-slave replication, my assumption is that you're not running SolrCloud. You would NOT want to try and mix SolrCloud with replication. The features do not play well together. SolrCloud with NRT replicas (this is the only replica type that exists in 6.x and earlier) may be a better option than master-slave replication.
>> >>
>> >> > since the replicas will simply be looking for new files it seems like this would be a lightweight operation even every couple seconds for 4 replicas. that said, i'm going *entirely* off of assumption at this point and wanted to check in with you all to see any nuances, gotchas, hidden landmines, etc. that we should be considering before rolling things out.
>> >>
>> >> Most of the time, you'd be correct to think that indexing is going to create a new small segment and replication will have little work to do. But as you create more and more segments, eventually Lucene is going to start merging those segments. For discussion purposes, I'm going to describe a situation where each new segment during indexing is about 100KB in size, and the merge policy is left at the default settings. I'm also going to assume that no documents are getting deleted or reindexed (which will delete the old version). Deleted documents can have an impact on merging, but it will usually only be a dramatic impact if there are a LOT of deleted documents.
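To ground the numbers Shawn uses next: the defaults he refers to come from Lucene's TieredMergePolicy. Written out explicitly in solrconfig.xml they would look roughly like the sketch below, a statement of the defaults as I understand them rather than settings you need to add:

  <indexConfig>
    <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
      <!-- roughly ten similarly sized segments get merged into one -->
      <int name="maxMergeAtOnce">10</int>
      <int name="segmentsPerTier">10</int>
      <!-- normal merging never grows a segment past ~5GB -->
      <double name="maxMergedSegmentMB">5120</double>
    </mergePolicyFactory>
  </indexConfig>

The ten-at-a-time behavior is what produces the 100KB, 1MB, 10MB, ... cascade described below.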
>> >> The first ten segments created will be this 100KB size. Then Lucene is going to see that there are enough segments to trigger the merge policy - it's going to combine ten of those segments into one that's approximately one megabyte. Repeat this ten times, and ten of those 1 megabyte segments will be combined into one ten megabyte segment. Repeat all of THAT ten times, and there will be a 100 megabyte segment. And there will eventually be another level creating 1 gigabyte segments. If the index is below 5GB in size, the entire thing *could* be merged into one segment by this process.
>> >>
>> >> The end result of all this: Replication is not always going to be super-quick. If merging creates a 1 gigabyte segment, then the amount of time to transfer that new segment is going to depend on how fast your disks are, and how fast your network is. If you're using commodity SATA drives in the 4 to 10 terabyte range and a gigabit network, the network is probably going to be the bottleneck -- assuming that the system has plenty of memory and isn't under a high load. If the network is the bottleneck in that situation, it's probably going to take close to ten seconds to transfer a 1GB segment, and the greater part of a minute to transfer a 5GB segment, which is the biggest one that the default merge policy configuration will create without an optimize operation.
>> >>
>> >> Also, you should understand something that has come to my attention recently (and is backed up by documentation): If the master does a soft commit and the segment that was committed remains in memory (not flushed to disk), that segment will NOT be replicated to the slaves. It has to get flushed to disk before it can be replicated.
>> >>
>> >> Thanks,
>> >> Shawn
>> >
>> > --
>> > John Blythe
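A practical footnote on Shawn's soft-commit caveat: in a master-slave setup, the hard-commit settings control when segments reach disk and therefore when they become replicable. A hedged solrconfig.xml sketch; the intervals are illustrative, not recommendations from the thread:

  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- hard commit: flushes segments to disk (making them replicable)
         without opening a new searcher, so it stays cheap -->
    <autoCommit>
      <maxTime>60000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <!-- soft commit: gives NRT visibility on the master, but a segment
         that exists only in memory is not yet visible to replication -->
    <autoSoftCommit>
      <maxTime>2000</maxTime>
    </autoSoftCommit>
  </updateHandler>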