Well, in M/S setups the master shouldn't be searching at all,
but that's a nit.

That aside, whether or not the master has opened a new
searcher is irrelevant to what the slave replicates.
What _is_ relevant is whether any of the files on disk that
comprise the index (i.e. the segment files) have been
changed. Really, what matters is whether any of them have been
closed, merged, or otherwise changed since the last sync.
Imagine it like this (this isn't
quite what happens, but it's a useful model). The slave
says "here's a list of my segments, is it the same as the
list of closed segments on the master?" If the answer
is no, a replication is performed. Actually, this is done
much more efficiently, but that's the idea.
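
To make that mental model concrete, here is a toy sketch in Python.
It is only an illustration of the "compare segment lists" idea, not
how Solr actually implements replication; the function names and the
segment names are made up for the example.

```python
def needs_replication(slave_segments, master_closed_segments):
    """Toy model: replicate when the two segment lists differ."""
    return set(slave_segments) != set(master_closed_segments)


def files_to_fetch(slave_segments, master_closed_segments):
    """Segments the slave is missing and would have to copy down."""
    return sorted(set(master_closed_segments) - set(slave_segments))


# Hypothetical state after a few autocommits during an import.
# Whether the master has opened a new searcher is not an input here:
# only the closed segments on disk are compared.
master = ["_0", "_1", "_2"]
slave = ["_0"]

print(needs_replication(slave, master))
print(files_to_fetch(slave, master))
```

Note that nothing in the comparison takes the state of a searcher as
an argument, which is the whole point.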

You really seem to be asking about the whole issue of whether
searches on the various nodes (master + slaves) are
consistent. This is one of the problems with M/S setups: the nodes
can differ by whatever has happened in the polling interval.

The state of the master's searchers just doesn't enter the picture.

Glad the problem is solved no matter what.

Erick

On Sat, Mar 1, 2014 at 10:26 PM, Arun Rangarajan
<arunrangara...@gmail.com> wrote:
>> The slave is polling the master after the interval specified in
> solrconfig.xml. The slave essentially asks "has anything changed?" If so, the
> changes are brought down to the slave.
> Yes, I understand this, but if master does not open a new searcher after
> auto commits (which would indicate that the new index is not quite ready
> yet) and if master is still using the old index to serve search requests, I
> would expect the slave to do the same as well. Or the slave should at least
> not replicate or not open a new searcher, until the master opened a new
> searcher. But that is just the way I see it and it may be wrong.
>
>> What's your polling interval on the slave anyway? Sounds like it's quite
> frequent if you notice this immediately after the DIH starts.
> No, polling interval is set to 1 hour, but the full import was set to run
> at 1 AM. I believe a delete followed by few docs got replicated after the
> first few auto commits when the slave probably polled around 1:10 AM and
> slave index had few docs for an hour before the next polling happened,
> which is why the date query was returning empty results for exactly that
> one hour. (The full index takes about 1.5 hours to finish.)
>
> Anyway the problem is now solved by specifying "clean=false" in the DIH
> full import command.
>
>
> On Sat, Mar 1, 2014 at 9:12 AM, Erick Erickson <erickerick...@gmail.com>wrote:
>
>> bq: the slave anyway replicates the index after auto commits! (Is this
>> desired behavior?)
>>
>> Absolutely it's desired behavior. The slave is polling the master
>> after the interval
>> specified in solrconfig.xml. The slave essentially asks "has anything
>> changed?" If so,
>> the changes are brought down to the slave. And by definition, commits
>> change the index,
>> especially if all docs have been deleted....
>>
>> What's your polling interval on the slave anyway? Sounds like it's
>> quite frequent if you
>> notice this immediately after the DIH starts.
>>
>> Best,
>> Erick
>>
>> On Fri, Feb 28, 2014 at 9:04 PM, Arun Rangarajan
>> <arunrangara...@gmail.com> wrote:
>> > I believe I figured out what the issue is. Even though we do not open a
>> > new searcher on master during full import, the slave anyway replicates the
>> > index after auto commits! (Is this desired behavior?) Since "clean=true"
>> > this meant all the docs were deleted on slave and a partial index got
>> > replicated! The reason only the date query did not return any results is
>> > because recently created docs have higher doc IDs and we index by
>> > ascending order of IDs!
>> >
>> > I believe I have two options:
>> > - as Chris suggested I have to use "clean=false" so the existing docs are
>> > not deleted first on the slave. Since we have primary keys, newly added
>> > docs will overwrite old docs as they get added.
>> > - disable replication after commits. Replicate only after optimize.
>> >
>> > Thx all for your help.
>> >
>> >
>> >
>> >
>> >
>> > On Fri, Feb 28, 2014 at 8:06 PM, Arun Rangarajan
>> > <arunrangara...@gmail.com>wrote:
>> >
>> >> Thx, Erick and Chris.
>> >>
>> >> This is indeed very strange. Other queries which do not restrict by the
>> >> date field are returning results, so the index is definitely not empty.
>> >> Has it got something to do with the date query part, with NOW/DAY or
>> >> something in here?
>> >> first_publish_date:[NOW/DAY-33DAYS TO NOW/DAY-3DAYS]
>> >>
>> >> For now, I have set up a script to just log the number of docs on the
>> >> slave every minute. Will monitor and report the findings.
>> >>
>> >>
>> >> On Fri, Feb 28, 2014 at 6:49 PM, Chris Hostetter
>> >> <hossman_luc...@fucit.org> wrote:
>> >>
>> >>>
>> >>> : This is odd. The full import, I think, deletes the
>> >>> : docs in the index when it starts.
>> >>>
>> >>> Yeah, if you are doing a full-import everyday, and you don't want it to
>> >>> delete all docs when it starts, you need to specify "clean=false"
>> >>>
>> >>>
>> >>>
>> https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler#UploadingStructuredDataStoreDatawiththeDataImportHandler-Parametersforthefull-importCommand
>> >>>
>> >>>
>> >>>
>> >>> -Hoss
>> >>> http://www.lucidworks.com/
>> >>>
>> >>
>> >>
>>
