Re: Always use leader for searching queries

Novin Novin Tue, 09 Jan 2018 04:09:08 -0800

Hi Erick,

Apologies for the delay.


[This isn't what I meant. I meant to query each replica directly
_within_ the same shard. Your problem statement is that the leader and
replicas (I use "followers") have different document counts. How are
you verifying this? Through the admin UI? Using &distrib=false is
useful when you want to query each core directly (and you have to use
the core name) in some automated fashion.]

I might have been wrong here, because now I can't reproduce it with distrib=false.

I also did as you said:
[OK, I'm assuming then that you issue a manual commit sometime, right?
Here's what I'd do:
1> turn off indexing
2> issue a commit (soft or hard-with-opensearcher-true)
3> now look at your doc counts on each replica.]

Everything seems OK now; I must have been doing something wrong before.

Thanks for all your and Walter's help.
Best,
Navin
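
For anyone repeating the check above, here is a minimal Python sketch of querying each core directly with distrib=false and comparing numFound. It uses only the standard library. The host names and core names (main_shard1_replica1, main_shard1_replica2) are assumptions for illustration, not taken from my cluster, so substitute whatever the admin UI shows. It is meant to be run only after indexing is stopped and a commit has been issued.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def core_count_url(solr_base, core_name, query="*:*"):
    """Build a /select URL that asks a single core for its own match count."""
    params = urlencode({
        "q": query,
        "rows": 0,           # only numFound is needed, not the documents
        "wt": "json",
        "distrib": "false",  # do not fan the query out to other replicas
    })
    return f"{solr_base.rstrip('/')}/{core_name}/select?{params}"


def fetch_num_found(url, timeout=10):
    """Run the query and return numFound from Solr's JSON response."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)["response"]["numFound"]


def counts_agree(counts):
    """True when every core reported the same count."""
    return len(set(counts.values())) <= 1


# Hypothetical replica locations: core name -> Solr base URL.
REPLICAS = {
    "main_shard1_replica1": "http://host1:8983/solr",
    "main_shard1_replica2": "http://host2:8983/solr",
}


def check_replicas(replicas=REPLICAS):
    """Query every core and return (per-core counts, whether they all match)."""
    counts = {
        core: fetch_num_found(core_count_url(base, core))
        for core, base in replicas.items()
    }
    return counts, counts_agree(counts)
```

Calling check_replicas() against a live cluster returns the per-core counts plus a flag saying whether they match. If the flag is false after indexing has stopped and a commit has gone through, that is the case Erick describes as concerning.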


On Wed, 3 Jan 2018 at 17:09 Walter Underwood <wun...@wunderwood.org> wrote:

> If you have a field for the indexed datetime, you can use a filter query
> to get rid of recent updates that might be in transit. I’d use double the
> autocommit time, to leave time for the followers to index.
>
> If the autocommit interval is one minute:
>
> fq=indexed_datetime:[* TO NOW-2MINUTES]
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Jan 3, 2018, at 8:58 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> >
> > [I probably don't need to do this because I have only one shard, but I
> > did it anyway and the count was different.]
> >
> > This isn't what I meant. I meant to query each replica directly
> > _within_ the same shard. Your problem statement is that the leader and
> > replicas (I use "followers") have different document counts. How are
> > you verifying this? Through the admin UI? Using &distrib=false is
> > useful when you want to query each core directly (and you have to use
> > the core name) in some automated fashion.
> >
> > [I have actually turned off auto soft commit for the time being, but
> > nothing changed]
> >
> > OK, I'm assuming then that you issue a manual commit sometime, right?
> > Here's what I'd do:
> > 1> turn off indexing
> > 2> issue a commit (soft or hard-with-opensearcher-true)
> > 3> now look at your doc counts on each replica.
> >
> > If the counts are different then something's not right, Solr tries
> > very hard to not lose data, it's concerning if the leader and replicas
> > have different counts.
> >
> > Best,
> > Erick
> >
> > On Wed, Jan 3, 2018 at 1:51 AM, Novin Novin <toe.al...@gmail.com> wrote:
> >> Hi Erick,
> >>
> >> Thanks for your reply.
> >>
> >> [ First of all, replicas can be off in terms of counts for the soft
> >> commit interval. The commits don't all happen on the replicas at the
> >> same wall-clock time. Solr promises eventual consistency, in this case
> >> NOW-autocommit time.]
> >>
> >> I realized that. To rule it out, I turned off auto soft commit for the
> >> time being, but nothing changed. The non-leader replica still had extra
> >> documents.
> >>
> >> [ So my first question is whether the replicas in the shard are
> >> inconsistent as of, say, NOW-your_soft_commit_time. I'd add a fudge
> >> factor of 10 seconds earlier just to be sure I was past autowarming.
> >> This does require that there be a time stamp. Absent a timestamp, you
> >> could suspend indexing for a few minutes and run the test like below.]
> >>
> >> While data was being indexed, I checked the counts on both replicas and
> >> found that the leader replica always had 3 docs fewer than the other
> >> replica. I don't think they were off by NOW-soft_commit_time.
> >> CloudSolrClient adds something like "_stateVer_=main:114" to the query,
> >> which I assume is meant to keep results consistent between the two
> >> replicas.
> >>
> >> [Adding &distrib=false to your command and directing it at a specific
> >> _core_ (something like collection1_shard1_replica1) will only return
> >> data from that core.]
> >> I probably don't need to do this because I have only one shard, but I
> >> did it anyway and the count was different.
> >>
> >> [When you say you index every minute, I'm guessing you only index for
> >> part of that minute, is that true? In that case you might get more
> >> consistency if, instead of relying totally on your autoconfig
> >> settings, specify commitWithin on your update command. That should
> >> force the commits to happen more closely in-sync, although still not
> >> perfect.]
> >>
> >> We receive data every minute, so whenever we have new data we send it to
> >> SolrCloud using a queue. You said not to rely on the auto config. Do you
> >> mean I should turn off autocommit and use commitWithin from SolrJ, or
> >> leave autoCommit as it is and also use commitWithin from the SolrJ client?
> >>
> >> I apologize if I am not clear. Thanks for your help again.
> >>
> >> Thanks in advance,
> >> Navin
> >>
> >>
> >>
> >>
> >>
> >> On Tue, 2 Jan 2018 at 18:05 Erick Erickson <erickerick...@gmail.com> wrote:
> >>
> >>> First of all, replicas can be off in terms of counts for the soft
> >>> commit interval. The commits don't all happen on the replicas at the
> >>> same wall-clock time. Solr promises eventual consistency, in this case
> >>> NOW-autocommit time.
> >>>
> >>> So my first question is whether the replicas in the shard are
> >>> inconsistent as of, say, NOW-your_soft_commit_time. I'd add a fudge
> >>> factor of 10 seconds earlier just to be sure I was past autowarming.
> >>> This does require that there be a time stamp. Absent a timestamp, you
> >>> could suspend indexing for a few minutes and run the test like below.
> >>>
> >>> Adding &distrib=false to your command and directing it at a specific
> >>> _core_ (something like collection1_shard1_replica1) will only return
> >>> data from that core.
> >>>
> >>> When you say you index every minute, I'm guessing you only index for
> >>> part of that minute, is that true? In that case you might get more
> >>> consistency if, instead of relying totally on your autoconfig
> >>> settings, specify commitWithin on your update command. That should
> >>> force the commits to happen more closely in-sync, although still not
> >>> perfect.
> >>>
> >>> Another option if you're totally and completely sure that your commits
> >>> happen _only_ from your indexing program is to fire the commit at the
> >>> end of the run from your SolrJ program.
> >>>
> >>> Let us know,
> >>> Erick
> >>>
> >>> On Tue, Jan 2, 2018 at 9:33 AM, Novin Novin <toe.al...@gmail.com> wrote:
> >>>> Hi Erick,
> >>>>
> >>>> You are right, it is an XY problem.
> >>>>
> >>>> Allow me to explain as best I can. I have two replicas of one collection
> >>>> called "Main". When I used the search feature in my application, I got
> >>>> two different numFound counts. I started digging, and after 2-3 hours I
> >>>> found that one replica had a higher numFound count than the other (the
> >>>> one with the higher count was not the leader). I am not sure how it
> >>>> ended up like that. This count difference affects paging on my
> >>>> application side, not on the Solr side.
> >>>>
> >>>> Extra info that might be useful:
> >>>> Same query, not a single letter of difference.
> >>>> auto soft commit 20000
> >>>> soft commit 60000
> >>>> Indexing data every minute.
> >>>>
> >>>> Let me know if you need to know anything else. Any help would be highly
> >>>> appreciated.
> >>>>
> >>>> Thanks in advance,
> >>>> Navin
> >>>>
> >>>>
> >>>>
> >>>> On Tue, 2 Jan 2018 at 15:14 Erick Erickson <erickerick...@gmail.com> wrote:
> >>>>
> >>>>> This seems like an XY problem. You're asking how to do X
> >>>>> because you think it will solve problem Y without telling
> >>>>> us what Y is.
> >>>>>
> >>>>> I say this because on the surface this seems to defeat the
> >>>>> purpose behind SolrCloud. Why would you want to only make
> >>>>> use of one piece of hardware? That will limit your throughput,
> >>>>> so why bother to have replicas in the first place?
> >>>>>
> >>>>> Or is this some kind of diagnostic you're trying to implement?
> >>>>>
> >>>>> Best,
> >>>>> Erick
> >>>>>
> >>>>> On Tue, Jan 2, 2018 at 5:08 AM, Novin Novin <toe.al...@gmail.com> wrote:
> >>>>>> Hi guys,
> >>>>>>
> >>>>>> I am using Solr 5.5.4 and the same version of SolrJ. My question: is
> >>>>>> there any way I can tell the cloud Solr client to use only the leader
> >>>>>> for queries?
> >>>>>>
> >>>>>> Thanks in advance.
> >>>>>> Navin
> >>>>>
> >>>
>
>
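
Walter's filter-query suggestion from earlier in the thread can be sketched roughly as below. This is only an illustration in Python, assuming the documents carry an indexed timestamp field (indexed_datetime, as in his example) and that the autocommit interval is known to the client. The function names and the safety factor of 2 are illustrative choices. The lag is expressed in SECONDS so that sub-minute intervals also work.

```python
import math
from urllib.parse import urlencode


def settled_filter(field, autocommit_ms, safety_factor=2):
    """Build an fq value that hides documents newer than a multiple of the
    autocommit interval, so in-flight updates are not counted."""
    lag_seconds = math.ceil(autocommit_ms * safety_factor / 1000)
    return f"{field}:[* TO NOW-{lag_seconds}SECONDS]"


def select_params(query, field, autocommit_ms):
    """URL-encode q plus the settled-documents filter for a /select request."""
    return urlencode({"q": query, "fq": settled_filter(field, autocommit_ms)})
```

With a one-minute autocommit, settled_filter("indexed_datetime", 60000) yields indexed_datetime:[* TO NOW-120SECONDS], which matches the "double the autocommit time" rule of thumb.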
