Re: Document Scoring

Erick Erickson Fri, 17 Jun 2011 08:51:00 -0700

I think this is the way to go. When trying to minimize latency, there are two
statistics to pay particular attention to on your *searchers*:


1> What is the warmup time for your caches?
2> What is your polling interval?

Make sure your polling interval is, say, at least three times longer than
your warmup time when trying to minimize latency, so that one searcher
finishes warming before the next replication can open another. Also, set
<maxWarmingSearchers> to no more than two.
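
A minimal sketch of that rule of thumb, in Python. The function name, the
parameter names and the numbers are made up for illustration; nothing here
is a Solr API. You would plug in the warmup time you see in your logs and
the values from your own config.

```python
def check_searcher_settings(poll_interval_s, warmup_s,
                            max_warming_searchers, min_ratio=3.0):
    """Return a list of warnings for settings that break the rule of thumb.

    All arguments are assumptions supplied by the caller:
    poll_interval_s        -- slave polling interval, in seconds
    warmup_s               -- observed cache warmup time, in seconds
    max_warming_searchers  -- value configured for <maxWarmingSearchers>
    min_ratio              -- the "say, at least three times" factor
    """
    warnings = []
    if poll_interval_s < min_ratio * warmup_s:
        warnings.append(
            "polling interval %ss is less than %sx the warmup time %ss"
            % (poll_interval_s, min_ratio, warmup_s))
    if max_warming_searchers > 2:
        warnings.append(
            "maxWarmingSearchers is %s, more than two" % max_warming_searchers)
    return warnings


# Hypothetical numbers: 60s poll, 15s warmup, two warming searchers.
for message in check_searcher_settings(60, 15, 2):
    print(message)
```

With those example numbers nothing is printed, since 60 is at least 3 x 15
and the searcher limit is two.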

So the time between sending a document to the indexer and it being
available for search is at most the sum of:

1> Time to next commit on the master
2> Polling interval on the slave
3> Time to replicate the changed part of the index
4> Warmup time
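
To put numbers on that, here is a back-of-the-envelope sketch in Python.
The function and the figures are hypothetical, not measurements from any
real setup; substitute your own commit interval, polling interval,
replication time and warmup time.

```python
def worst_case_latency_s(commit_interval_s, poll_interval_s,
                         replication_s, warmup_s):
    """Upper bound, in seconds, between indexing a document and it being
    searchable on a slave: the sum of the four terms listed above."""
    return commit_interval_s + poll_interval_s + replication_s + warmup_s


# Assumed example: commit every 60s on the master, poll every 60s on the
# slave, 10s to pull the changed segments, 20s to warm the caches.
print(worst_case_latency_s(60, 60, 10, 20))  # prints 150
```

So with those assumed settings a document shows up within about two and a
half minutes in the worst case, comfortably inside a 5 minute target.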

As an aside, I've found that product managers often say they want
"real time searching", but explaining to them that "I can set up 5 minute
latency in 1 day, or program 20 second latency in XXX weeks" gives
them the information they need to decide how important "real time" really is!
Especially if you follow up with "and spending XXX weeks doing this
will mean that features A through F will not get into the release"....

Best
Erick

On Fri, Jun 17, 2011 at 11:24 AM, zarni aung <[email protected]> wrote:
> Thank you, this is something that I wanted to hear.  I knew the design was
> most likely flawed because I have never done Solr or any kind of full text
> searching, but needed an unbiased opinion.  I think that if I were to tune
> the configs and pay close attention to the logs with lots of performance
> testing I might be able to achieve close to near real time (1-5 mins).  I've
> been reading this mailing list, Hathi Trust, Lucid Imagination and other
> sites for insights.
>
> Again, thank you.
>
> Zarni
>
> On Thu, Jun 16, 2011 at 9:49 PM, Erick Erickson
> <[email protected]> wrote:
>
>> I really wouldn't go there, it sounds like there are endless
>> opportunities for errors!
>>
>> How "real-time" is "real-time"? Could you fix this entirely
>> by
>> 1> adjusting expectations for, say, 5 minutes.
>> 2> adjusting your commit (on the master) and poll (on the slave)
>> appropriately?
>>
>> Best
>> Erick
>>
>> On Thu, Jun 16, 2011 at 11:41 AM, zarni aung <[email protected]> wrote:
>> > Hi,
>> >
>> > I am designing my indexes to have 1 write-only master core, 2 read-only
>> > slave cores.  That means the read-only cores will only have snapshots pulled
>> > from the master and will not have near real time changes.  I was thinking
>> > about adding a hybrid read and write master core that will have the most
>> > recent changes from my primary data source.  I am thinking to query the
>> > hybrid master and the read-only slaves and somehow try to intersect the
>> > results in order to support near real time full text search.  Is this
>> > feasible?
>> >
>> > Thank you,
>> >
>> > Zarni
>> >
>>
>
