Erick,
Thanks a lot for the detailed explanation. That clarified things for me.
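
The model Erick describes below (the slave compares its own segment list with the master's closed segments, and the master's searcher never enters into it) can be sketched roughly as follows. This is only a toy illustration in Python, not how Solr's ReplicationHandler actually works; the function names and segment names are made up.

```python
def needs_replication(slave_segments, master_closed_segments):
    """Return True when the slave's segments differ from the master's closed segments."""
    return set(slave_segments) != set(master_closed_segments)


def poll(slave_segments, master_closed_segments):
    """One polling cycle: take over the master's segment list if anything differs.

    Note that the state of the master's searcher is never consulted.
    """
    if needs_replication(slave_segments, master_closed_segments):
        return sorted(master_closed_segments), True
    return sorted(slave_segments), False


# Hypothetical segment names, for illustration only.
before_import = ["_a", "_b", "_c"]
# After a clean=true full import has deleted everything and the first
# auto commit has closed one small segment, only a partial index is on disk.
mid_import = ["_d"]

slave, replicated = poll(before_import, mid_import)
print(slave, replicated)
```

With clean=true, the first poll that lands in the middle of the import sees a different segment list and pulls the partial index, whether or not the master has opened a new searcher.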


On Sun, Mar 2, 2014 at 10:04 AM, Erick Erickson <erickerick...@gmail.com> wrote:

> Well, in M/S setups the master shouldn't be searching at all,
> but that's a nit.
>
> That aside, whether the master has opened a new
> searcher or not is irrelevant to what the slave replicates.
> What _is_ relevant is whether any of the files on disk that
> comprise the index (i.e. the segment files) have been
> changed. More precisely, whether any of them have been closed, merged,
> or otherwise rewritten since the last sync. Imagine it like this (this isn't
> quite what happens, but it's a useful model). The slave
> says "here's a list of my segments, is it the same as the
> list of closed segments on the master?" If the answer
> is no, a replication is performed. Actually, this is done
> much more efficiently, but that's the idea.
>
> You seem to be really asking about the whole issue of whether
> searches on the various nodes (master + slaves) are
> consistent. This is one of the problems with M/S setups: the nodes
> can differ by whatever has happened within the polling interval.
>
> The state of the master's searchers just doesn't enter the picture.
>
> Glad the problem is solved no matter what.
>
> Erick
>
> On Sat, Mar 1, 2014 at 10:26 PM, Arun Rangarajan
> <arunrangara...@gmail.com> wrote:
> >> The slave is polling the master after the interval specified in
> >> solrconfig.xml. The slave essentially asks "has anything changed?" If so,
> >> the changes are brought down to the slave.
> > Yes, I understand this, but if master does not open a new searcher after
> > auto commits (which would indicate that the new index is not quite ready
> > yet) and if master is still using the old index to serve search requests,
> > I would expect the slave to do the same as well. Or the slave should at
> > least not replicate or not open a new searcher, until the master opened a
> > new searcher. But that is just the way I see it and it may be wrong.
> >
> >> What's your polling interval on the slave anyway? Sounds like it's quite
> >> frequent if you notice this immediately after the DIH starts.
> > No, polling interval is set to 1 hour, but the full import was set to run
> > at 1 AM. I believe a delete followed by a few docs got replicated after the
> > first few auto commits when the slave probably polled around 1:10 AM, and
> > the slave index had only a few docs for an hour before the next polling
> > happened, which is why the date query was returning empty results for
> > exactly that one hour. (The full index takes about 1.5 hours to finish.)
> >
> > Anyway the problem is now solved by specifying "clean=false" in the DIH
> > full import command.
> >
> >
> > On Sat, Mar 1, 2014 at 9:12 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> >
> >> bq: the slave anyway replicates the index after auto commits! (Is this
> >> desired behavior?)
> >>
> >> Absolutely it's desired behavior. The slave is polling the master after
> >> the interval specified in solrconfig.xml. The slave essentially asks "has
> >> anything changed?" If so, the changes are brought down to the slave. And by
> >> definition, commits change the index, especially if all docs have been
> >> deleted....
> >>
> >> What's your polling interval on the slave anyway? Sounds like it's quite
> >> frequent if you notice this immediately after the DIH starts.
> >>
> >> Best,
> >> Erick
> >>
> >> On Fri, Feb 28, 2014 at 9:04 PM, Arun Rangarajan
> >> <arunrangara...@gmail.com> wrote:
> >> > I believe I figured out what the issue is. Even though we do not open a
> >> > new searcher on master during full import, the slave anyway replicates
> >> > the index after auto commits! (Is this desired behavior?) Since
> >> > "clean=true" was in effect, this meant all the docs were deleted on slave
> >> > and a partial index got replicated! The reason only the date query did
> >> > not return any results is because recently created docs have higher doc
> >> > IDs and we index by ascending order of IDs!
> >> >
> >> > I believe I have two options:
> >> > - as Chris suggested I have to use "clean=false" so the existing docs are
> >> > not deleted first on the slave. Since we have primary keys, newly added
> >> > docs will overwrite old docs as they get added.
> >> > - disable replication after commits. Replicate only after optimize.
> >> >
> >> > Thx all for your help.
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > On Fri, Feb 28, 2014 at 8:06 PM, Arun Rangarajan
> >> > <arunrangara...@gmail.com> wrote:
> >> >
> >> >> Thx, Erick and Chris.
> >> >>
> >> >> This is indeed very strange. Other queries which do not restrict by the
> >> >> date field are returning results, so the index is definitely not empty.
> >> >> Has it got something to do with the date query part, with NOW/DAY or
> >> >> something in here?
> >> >> first_publish_date:[NOW/DAY-33DAYS TO NOW/DAY-3DAYS]
> >> >>
> >> >> For now, I have set up a script to just log the number of docs on the
> >> >> slave every minute. Will monitor and report the findings.
> >> >>
> >> >>
> >> >> On Fri, Feb 28, 2014 at 6:49 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote:
> >> >>
> >> >>>
> >> >>> : This is odd. The full import, I think, deletes the
> >> >>> : docs in the index when it starts.
> >> >>>
> >> >>> Yeah, if you are doing a full-import every day, and you don't want it to
> >> >>> delete all docs when it starts, you need to specify "clean=false"
> >> >>>
> >> >>>
> >> >>>
> >> >>> https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler#UploadingStructuredDataStoreDatawiththeDataImportHandler-Parametersforthefull-importCommand
> >> >>>
> >> >>>
> >> >>>
> >> >>> -Hoss
> >> >>> http://www.lucidworks.com/
> >> >>>
> >> >>
> >> >>
> >>
>
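
For reference, the "clean=false" fix discussed in the thread amounts to passing one extra parameter to the DIH full-import command. A minimal sketch of building that request URL is below. The host, port, core name and the "/dataimport" handler path are assumptions for illustration (the path is whatever the DIH request handler is registered as in solrconfig.xml); "command", "clean" and "commit" are the DIH parameters documented on the wiki page Hoss linked.

```python
from urllib.parse import urlencode


def full_import_url(base_url, clean=False, commit=True):
    """Build a DIH full-import URL; clean=False keeps existing docs in place."""
    params = {
        "command": "full-import",
        "clean": str(clean).lower(),
        "commit": str(commit).lower(),
    }
    # "/dataimport" is an assumed handler path; adjust to your solrconfig.xml.
    return f"{base_url}/dataimport?{urlencode(params)}"


# Assumed master location and core name, for illustration only.
url = full_import_url("http://master:8983/solr/mycore")
print(url)
```

Because the docs have primary keys, re-imported docs overwrite the old ones as they arrive, so a slave that polls mid-import still sees a complete (if partly stale) index.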
