Re: Always use leader for searching queries

Novin Novin Tue, 09 Jan 2018 10:27:07 -0800

Thank you very much for all your help.

On Tue 9 Jan 2018, 16:32 Erick Erickson, <erickerick...@gmail.com> wrote:


> One thing to be aware of is that the commit points on the replicas in a
> shard may (will) fire at different times. So when you're comparing the
> number of docs on the replicas in a shard, you have to restrict the
> comparison to documents indexed before the last commit interval. Say you
> have a soft commit interval of 1 minute: when comparing the docs on each
> replica, you need to restrict the query to documents older than 1 minute,
> or stop indexing and wait for 1 minute (i.e. until after the autocommit
> fires).
>
> Glad things worked out!
> Erick
>
> On Tue, Jan 9, 2018 at 4:08 AM, Novin Novin <toe.al...@gmail.com> wrote:
>
> > Hi Erick,
> >
> > Apologies for the delay.
> >
> > [This isn't what I meant. I meant to query each replica directly
> > _within_ the same shard. Your problem statement is that the leader and
> > replicas (I use "followers") have different document counts. How are
> > you verifying this? Through the admin UI? Using &distrib=false is
> > useful when you want to query each core directly (and you have to use
> > the core name) in some automated fashion.]
> >
> > I might have been wrong here, because now I can't reproduce it with
> > distrib=false.
> >
> > I also did as you suggested:
> > [OK, I'm assuming then that you issue a manual commit sometime, right?
> > Here's what I'd do:
> > 1> turn off indexing
> > 2> issue a commit (soft or hard-with-opensearcher-true)
> > 3> now look at your doc counts on each replica.]
> >
> > Everything seems OK now; I must have been doing something wrong before.
> >
> > Thanks to you and Walter for all your help.
> > Best,
> > Navin
> >
> >
> > On Wed, 3 Jan 2018 at 17:09 Walter Underwood <wun...@wunderwood.org> wrote:
> >
> > > If you have a field for the indexed datetime, you can use a filter query
> > > to get rid of recent updates that might be in transit. I’d use double the
> > > autocommit time, to leave time for the followers to index.
> > >
> > > If the autocommit interval is one minute:
> > >
> > > fq=indexed_datetime:[* TO NOW-2MINUTES]
> > >
> > > wunder
> > > Walter Underwood
> > > wun...@wunderwood.org
> > > http://observer.wunderwood.org/  (my blog)
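The filter query above can be generated from the autocommit interval instead of being hard-coded. This is a minimal, JDK-only Java sketch; the field name `indexed_datetime` comes from the message above and assumes the schema really records index time, while the class and method names are hypothetical. The window is written with the spelled-out `SECONDS` unit of Solr date math.

```java
import java.time.Duration;

/** Hypothetical helper: builds an fq that hides documents younger than a safety window. */
final class RecentDocFilter {

    private RecentDocFilter() {}

    /**
     * Returns "field:[* TO NOW-<n>SECONDS]" with n = autocommit * factor,
     * so documents indexed within the last n seconds are excluded.
     */
    static String olderThan(String field, Duration autocommit, long factor) {
        if (factor < 1 || autocommit.isNegative() || autocommit.isZero()) {
            throw new IllegalArgumentException("need a positive interval and factor >= 1");
        }
        long seconds = autocommit.multipliedBy(factor).getSeconds();
        return field + ":[* TO NOW-" + seconds + "SECONDS]";
    }

    public static void main(String[] args) {
        // One-minute autocommit, doubled to leave the followers time to index.
        System.out.println(olderThan("indexed_datetime", Duration.ofMinutes(1), 2));
    }
}
```

Erick's variant elsewhere in the thread (the soft commit interval plus a 10-second fudge factor) corresponds to `olderThan(field, Duration.ofSeconds(70), 1)` for a one-minute soft commit.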
> > >
> > >
> > > > On Jan 3, 2018, at 8:58 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> > > >
> > > > [I probably don't need to do this because I have only one shard, but I did
> > > > it anyway and the count was different.]
> > > >
> > > > This isn't what I meant. I meant to query each replica directly
> > > > _within_ the same shard. Your problem statement is that the leader and
> > > > replicas (I use "followers") have different document counts. How are
> > > > you verifying this? Through the admin UI? Using &distrib=false is
> > > > useful when you want to query each core directly (and you have to use
> > > > the core name) in some automated fashion.
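A minimal JDK-only sketch of the per-core check: it builds a count-only request (`rows=0`) with `distrib=false` for a single core, so `numFound` reflects that core alone. The base URL and core names (`main_shard1_replica1` and so on) are assumptions for illustration; the real core names are shown in the admin UI, and each replica may live on a different node. With SolrJ, the same parameters can be set on a `SolrQuery` (`setRows(0)`, `set("distrib", false)`) and sent through an `HttpSolrClient` pointed at the core.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: per-core, non-distributed count query URLs. */
final class PerCoreQuery {

    private PerCoreQuery() {}

    /**
     * Builds a select URL that asks a single core for its document count.
     * filterQuery may be null; otherwise it is URL-encoded and sent as fq.
     */
    static String countUrl(String baseUrl, String coreName, String filterQuery) {
        StringBuilder url = new StringBuilder()
                .append(baseUrl).append('/').append(coreName).append("/select")
                .append("?q=").append(enc("*:*"))
                .append("&rows=0")          // only numFound is needed
                .append("&distrib=false");  // answer from this core only
        if (filterQuery != null) {
            url.append("&fq=").append(enc(filterQuery));
        }
        return url.toString();
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        for (String core : new String[] {"main_shard1_replica1", "main_shard1_replica2"}) {
            System.out.println(countUrl("http://localhost:8983/solr", core, null));
        }
    }
}
```

Passing a time-based filter as `filterQuery` restricts the comparison to documents that should already be committed on every replica.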
> > > >
> > > > [I have actually turned off auto soft commit for the time being, but
> > > > nothing changed.]
> > > >
> > > > OK, I'm assuming then that you issue a manual commit sometime, right?
> > > > Here's what I'd do:
> > > > 1> turn off indexing
> > > > 2> issue a commit (soft or hard-with-opensearcher-true)
> > > > 3> now look at your doc counts on each replica.
> > > >
> > > > If the counts are different, then something's not right. Solr tries
> > > > very hard not to lose data, so it's concerning if the leader and
> > > > replicas have different counts.
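Step 3 of that procedure comes down to comparing each replica's `numFound` with the leader's. A minimal JDK-only sketch, assuming the counts have already been collected per core (for example with `distrib=false` queries after the commit); the core names and counts in `main` are made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: reports replicas whose count disagrees with the leader. */
final class ReplicaCountCheck {

    private ReplicaCountCheck() {}

    /**
     * Maps each core whose numFound differs from the leader's to
     * (its count minus the leader's count). An empty result means the
     * replicas agree.
     */
    static Map<String, Long> diffFromLeader(String leaderCore, Map<String, Long> numFoundByCore) {
        Long leaderCount = numFoundByCore.get(leaderCore);
        if (leaderCount == null) {
            throw new IllegalArgumentException("no count for leader core " + leaderCore);
        }
        Map<String, Long> diffs = new TreeMap<>(); // sorted for stable output
        for (Map.Entry<String, Long> e : numFoundByCore.entrySet()) {
            long delta = e.getValue() - leaderCount;
            if (delta != 0) {
                diffs.put(e.getKey(), delta);
            }
        }
        return diffs;
    }

    public static void main(String[] args) {
        // Counts gathered after indexing was stopped and a commit was issued.
        Map<String, Long> counts = Map.of(
                "main_shard1_replica1", 1000L,   // leader (hypothetical)
                "main_shard1_replica2", 1003L);  // follower (hypothetical)
        System.out.println(diffFromLeader("main_shard1_replica1", counts));
    }
}
```

A non-empty result after indexing has stopped and a commit has fired is the case Erick describes as concerning; during active indexing, some difference is expected.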
> > > >
> > > > Best,
> > > > Erick
> > > >
> > > > On Wed, Jan 3, 2018 at 1:51 AM, Novin Novin <toe.al...@gmail.com> wrote:
> > > >> Hi Erick,
> > > >>
> > > >> Thanks for your reply.
> > > >>
> > > >> [First of all, replicas can be off in terms of counts for the soft
> > > >> commit interval. The commits don't all happen on the replicas at the
> > > >> same wall-clock time. Solr promises eventual consistency, in this case
> > > >> NOW-autocommit time.]
> > > >>
> > > >> I realized that, and to rule it out I turned off auto soft commit for
> > > >> the time being, but nothing changed: the non-leader replica still had
> > > >> extra documents.
> > > >>
> > > >> [So my first question is whether the replicas in the shard are
> > > >> inconsistent as of, say, NOW-your_soft_commit_time. I'd add a fudge
> > > >> factor of 10 seconds earlier just to be sure I was past autowarming.
> > > >> This does require that there be a time stamp. Absent a timestamp, you
> > > >> could suspend indexing for a few minutes and run the test like below.]
> > > >>
> > > >> While data was being indexed, I was checking the counts on both
> > > >> replicas. What I found is that the leader replica always has 3 fewer
> > > >> documents than the other replica. I don't think they were off by
> > > >> NOW-soft_commit_time. CloudSolrClient adds something like
> > > >> "_stateVer_=main:114" to the query, which I assume is there to keep the
> > > >> results consistent between searches on the two replicas.
> > > >>
> > > >> [Adding &distrib=false to your command and directing it at a specific
> > > >> _core_ (something like collection1_shard1_replica1) will only return
> > > >> data from that core.]
> > > >> I probably don't need to do this because I have only one shard, but I
> > > >> did it anyway and the count was different.
> > > >>
> > > >> [When you say you index every minute, I'm guessing you only index for
> > > >> part of that minute, is that true? In that case you might get more
> > > >> consistency if, instead of relying totally on your autoconfig
> > > >> settings, you specify commitWithin on your update command. That should
> > > >> force the commits to happen more closely in-sync, although still not
> > > >> perfect.]
> > > >>
> > > >> We receive data every minute, so whenever we have new data we send it
> > > >> to SolrCloud using a queue. You said not to rely on the autoconfig
> > > >> settings. Do you mean I should turn off autoCommit and use commitWithin
> > > >> from SolrJ, or leave autoCommit as it is and also use commitWithin from
> > > >> the SolrJ client?
> > > >>
> > > >> I apologize if I am not being clear; thanks again for your help.
> > > >>
> > > >> Thanks in advance,
> > > >> Navin
> > > >>
> > > >>
> > > >> On Tue, 2 Jan 2018 at 18:05 Erick Erickson <erickerick...@gmail.com> wrote:
> > > >>
> > > >>> First of all, replicas can be off in terms of counts for the soft
> > > >>> commit interval. The commits don't all happen on the replicas at the
> > > >>> same wall-clock time. Solr promises eventual consistency, in this case
> > > >>> NOW-autocommit time.
> > > >>>
> > > >>> So my first question is whether the replicas in the shard are
> > > >>> inconsistent as of, say, NOW-your_soft_commit_time. I'd add a fudge
> > > >>> factor of 10 seconds earlier just to be sure I was past autowarming.
> > > >>> This does require that there be a time stamp. Absent a timestamp, you
> > > >>> could suspend indexing for a few minutes and run the test like below.
> > > >>>
> > > >>> Adding &distrib=false to your command and directing it at a specific
> > > >>> _core_ (something like collection1_shard1_replica1) will only return
> > > >>> data from that core.
> > > >>>
> > > >>> When you say you index every minute, I'm guessing you only index for
> > > >>> part of that minute, is that true? In that case you might get more
> > > >>> consistency if, instead of relying totally on your autoconfig
> > > >>> settings, you specify commitWithin on your update command. That should
> > > >>> force the commits to happen more closely in-sync, although still not
> > > >>> perfect.
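A minimal JDK-only sketch of passing `commitWithin` on the update request itself, assuming Java 11+ (`java.net.http`) and a JSON array of documents posted to the collection's `/update` handler. The base URL, collection name and document are placeholders, and the request is only built here, not sent. In SolrJ, the `add` overloads on `SolrClient` that take a `commitWithinMs` argument serve the same purpose.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper: builds (but does not send) an update request with commitWithin. */
final class CommitWithinUpdate {

    private CommitWithinUpdate() {}

    static HttpRequest addDocs(String baseUrl, String collection, String jsonDocs, int commitWithinMs) {
        if (commitWithinMs <= 0) {
            throw new IllegalArgumentException("commitWithinMs must be positive");
        }
        // commitWithin asks Solr to make these documents searchable within the given ms.
        URI uri = URI.create(baseUrl + "/" + collection + "/update?commitWithin=" + commitWithinMs);
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonDocs))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = addDocs("http://localhost:8983/solr", "main", "[{\"id\":\"doc-1\"}]", 10_000);
        System.out.println(req.method() + " " + req.uri());
        // To send it: HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString())
    }
}
```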
> > > >>>
> > > >>> Another option, if you're totally and completely sure that your commits
> > > >>> happen _only_ from your indexing program, is to fire the commit at the
> > > >>> end of the run from your SolrJ program.
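A minimal sketch of committing once at the end of an indexing run. `IndexClient` is a hypothetical stand-in, not the SolrJ API: a real `SolrClient` throws checked exceptions from its `add` and `commit` methods, which are omitted here. As the message says, this pattern only makes sense if nothing else issues commits.

```java
import java.util.List;

/** Hypothetical minimal view of an indexing client; not the SolrJ API. */
interface IndexClient {
    void add(List<String> batch);
    void commit();
}

/** Sends all batches of one indexing run, then commits exactly once. */
final class IndexRun {

    private IndexRun() {}

    static void run(IndexClient client, List<List<String>> batches) {
        for (List<String> batch : batches) {
            client.add(batch); // no per-batch commit
        }
        client.commit();       // one commit after the whole run has been sent
    }

    public static void main(String[] args) {
        IndexClient printing = new IndexClient() {
            @Override public void add(List<String> batch) { System.out.println("add " + batch); }
            @Override public void commit() { System.out.println("commit"); }
        };
        run(printing, List.of(List.of("doc-1", "doc-2"), List.of("doc-3")));
    }
}
```

If an `add` throws, the exception propagates and the explicit commit is never reached; with autocommit still enabled, Solr may commit the partial run on its own schedule.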
> > > >>>
> > > >>> Let us know,
> > > >>> Erick
> > > >>>
> > > >>> On Tue, Jan 2, 2018 at 9:33 AM, Novin Novin <toe.al...@gmail.com> wrote:
> > > >>>> Hi Erick,
> > > >>>>
> > > >>>> You are right, it is an XY problem.
> > > >>>>
> > > >>>> Allow me to explain as best I can. I have two replicas of one
> > > >>>> collection called "Main". When I was using the search feature in my
> > > >>>> application, I got two different numFound counts. So I started digging,
> > > >>>> and after spending 2-3 hours I found that one replica has a higher
> > > >>>> numFound count than the other (the replica with the higher count was
> > > >>>> not the leader). I am not sure how it ended up like that. This count
> > > >>>> difference affects paging on my application side, not on the Solr side.
> > > >>>>
> > > >>>> Some extra info that might be useful:
> > > >>>> Same query, not a single letter of difference.
> > > >>>> auto soft commit: 20000
> > > >>>> soft commit: 60000
> > > >>>> Indexing data every minute.
> > > >>>>
> > > >>>> Let me know if you need to know anything else. Any help would be highly
> > > >>>> appreciated.
> > > >>>>
> > > >>>> Thanks in advance,
> > > >>>> Navin
> > > >>>>
> > > >>>> On Tue, 2 Jan 2018 at 15:14 Erick Erickson <erickerick...@gmail.com> wrote:
> > > >>>>
> > > >>>>> This seems like an XY problem. You're asking how to do X
> > > >>>>> because you think it will solve problem Y without telling
> > > >>>>> us what Y is.
> > > >>>>>
> > > >>>>> I say this because on the surface this seems to defeat the
> > > >>>>> purpose behind SolrCloud. Why would you want to only make
> > > >>>>> use of one piece of hardware? That will limit your throughput,
> > > >>>>> so why bother to have replicas in the first place?
> > > >>>>>
> > > >>>>> Or is this some kind of diagnostic you're trying to implement?
> > > >>>>>
> > > >>>>> Best,
> > > >>>>> Erick
> > > >>>>>
> > > >>>>> On Tue, Jan 2, 2018 at 5:08 AM, Novin Novin <toe.al...@gmail.com> wrote:
> > > >>>>>> Hi guys,
> > > >>>>>>
> > > >>>>>> I am using Solr 5.5.4 and the same version of SolrJ. My question: is
> > > >>>>>> there any way I can tell CloudSolrClient to use only the leader for
> > > >>>>>> queries?
> > > >>>>>>
> > > >>>>>> Thanks in advance.
> > > >>>>>> Navin
> > > >>>>>
> > > >>>
> > >
> > >
> >
>
