With time-oriented data, you can use an old trick (it goes back to Infoseek in
1995).

Make a “today” collection that is very fresh. Nightly, migrate the day's new
documents to the “not today” (archive) collection. The today collection will be
small and can be updated quickly. The archive collection will be large and slow
to update, but who cares?
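A minimal sketch of the nightly migration, in Python for illustration only. Plain dicts stand in for the two collections, and the class and method names (`TwoCollectionIndex`, `add`, `nightly_migrate`, `get`) are hypothetical, not a Solr API. In a real deployment the nightly step would be a reindex into the archive collection followed by emptying the today collection, and searches would span both collections.

```python
from dataclasses import dataclass, field


@dataclass
class TwoCollectionIndex:
    """Hypothetical model: dicts keyed by document id stand in for collections."""

    today: dict = field(default_factory=dict)    # small, fresh, updated all day
    archive: dict = field(default_factory=dict)  # "not today": large, updated nightly

    def add(self, doc_id, doc):
        # New documents only ever touch the small "today" collection.
        self.today[doc_id] = doc

    def nightly_migrate(self):
        # Copy the day's documents into the archive, then start "today" over.
        self.archive.update(self.today)
        self.today.clear()

    def get(self, doc_id):
        # Look in both collections; "today" holds the freshest copy.
        if doc_id in self.today:
            return self.today[doc_id]
        return self.archive.get(doc_id)


index = TwoCollectionIndex()
index.add("a", {"title": "old filing"})
index.nightly_migrate()
index.add("b", {"title": "new filing"})
print(sorted(index.today), sorted(index.archive))  # ['b'] ['a']
```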

You can also send all docs to both collections and de-dupe the search results.

Every night, you start over with the “today” collection.
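The send-to-both variant can be modeled the same way. This is again a hypothetical in-memory sketch, not Solr code: `DualWriteIndex`, `commit_archive`, and `nightly` are made-up names, and `archive_pending` stands in for documents already sent to the archive but not yet searchable because its commits are infrequent. Since a document can be visible in both collections at once, the search merges on `id` and keeps the copy from "today".

```python
class DualWriteIndex:
    """Hypothetical model; dicts stand in for Solr collections keyed by "id"."""

    def __init__(self):
        self.today = {}            # small collection, committed almost immediately
        self.archive = {}          # large collection, as currently searchable
        self.archive_pending = {}  # sent to the archive, not yet committed

    def add(self, doc):
        # Every document goes to BOTH collections.
        self.today[doc["id"]] = doc
        self.archive_pending[doc["id"]] = doc

    def commit_archive(self):
        # The expensive archive commit; it may run rarely (hourly, nightly, ...).
        self.archive.update(self.archive_pending)
        self.archive_pending.clear()

    def search(self, predicate):
        # Query both collections and de-dupe on "id". "today" is merged last,
        # so its copy replaces any older archive copy of the same document.
        hits = {}
        for collection in (self.archive, self.today):
            for doc_id, doc in collection.items():
                if predicate(doc):
                    hits[doc_id] = doc
        return [hits[doc_id] for doc_id in sorted(hits)]

    def nightly(self):
        # Make sure the archive has everything, then start "today" over.
        self.commit_archive()
        self.today.clear()


idx = DualWriteIndex()
idx.add({"id": "1", "court": "ca9"})
idx.commit_archive()                  # doc 1 is now visible in both collections
idx.add({"id": "2", "court": "ca9"})  # doc 2 is only visible in "today"
print([d["id"] for d in idx.search(lambda d: d["court"] == "ca9")])  # ['1', '2']
idx.nightly()
print(len(idx.today), len(idx.archive))  # 0 2
```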

Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Oct 8, 2016, at 12:18 PM, Mike Lissner <mliss...@michaeljaylissner.com> wrote:
> 
> On Fri, Oct 7, 2016 at 8:18 PM Erick Erickson <erickerick...@gmail.com> wrote:
> 
>> What you haven't mentioned is how often you add new docs. Is it once a
>> day? Steadily from 8:00 to 17:00?
>> 
> 
> Alas, it's a steady trickle during business hours. We're ingesting court
> documents as they're posted on court websites, then sending alerts as soon
> as possible.
> 
> 
>> Whatever, your soft commit interval really should be longer than the time
>> autowarming takes. Configure autowarming to reference queries (firstSearcher
>> or newSearcher events, or autowarm counts in queryResultCache and
>> filterCache; say 16 in each of these latter for a start) such that they
>> cause the external file to load. That _should_ prevent any queries from
>> being blocked, since the autowarming will happen in the background and,
>> while it's happening, incoming queries will be served by the old searcher.
>> 
> 
> I want to make sure I understand this properly and document it for future
> readers who may find this thread. Here's how I interpret your advice:
> 
> 0. Slacken my auto soft commit interval to something more like a minute.
> 
> 1. Set up a query in the newSearcher listener that uses my external file
> field.
> 1a. Do the same in firstSearcher if I want a newly started Solr to warm up
> before getting queries (this doesn't matter to me, so I'm skipping it).
> 
> and/or
> 
> 2. Set autowarmCount in queryResultCache and filterCache to 16 so that the
> 16 most recently used entries in each cache of the previous searcher are
> regenerated in the new searcher.
> 
> Doing #1 seems like a safe strategy since it's guaranteed to hit the
> external file field. #2 feels like a bonus.
> 
> I'm a bit confused about the example autowarmCount for the caches, which
> is 0. Why not set this to something higher? I guess it's a RAM utilization
> vs. speed tradeoff? A low number like 16 seems like it'd have minimal impact
> on RAM?
> 
> Thanks for all the great replies and for everything you do for Solr. I
> truly appreciate your efforts.
> 
> Mike
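For future readers, steps 0 to 2 above roughly correspond to the following solrconfig.xml settings. The sketch below uses Python's `xml.etree.ElementTree` to print them; the field name `my_eff`, the cache `size`/`initialSize` of 512, and the `{!func}my_eff` warming query are placeholders and assumptions, not values from this thread. Step 1a would be a second `<listener>` with `event="firstSearcher"`. `ET.indent` needs Python 3.9 or later.

```python
import xml.etree.ElementTree as ET


def build_fragment(soft_commit_ms=60000, autowarm_count=16, eff_field="my_eff"):
    """Return hypothetical <updateHandler> and <query> snippets as XML text."""
    # Step 0: soft commit (open a new searcher) at most about once a minute.
    update_handler = ET.Element("updateHandler", {"class": "solr.DirectUpdateHandler2"})
    soft = ET.SubElement(update_handler, "autoSoftCommit")
    ET.SubElement(soft, "maxTime").text = str(soft_commit_ms)

    query = ET.Element("query")
    # Step 2: regenerate the most recently used entries of each cache.
    for name, cls in (("filterCache", "solr.FastLRUCache"),
                      ("queryResultCache", "solr.LRUCache")):
        ET.SubElement(query, name, {
            "class": cls,
            "size": "512",         # placeholder size
            "initialSize": "512",  # placeholder size
            "autowarmCount": str(autowarm_count),
        })

    # Step 1: a newSearcher warming query that references the external file
    # field, so the file is loaded before the new searcher serves requests.
    listener = ET.SubElement(query, "listener",
                             {"event": "newSearcher", "class": "solr.QuerySenderListener"})
    queries = ET.SubElement(listener, "arr", {"name": "queries"})
    lst = ET.SubElement(queries, "lst")
    ET.SubElement(lst, "str", {"name": "q"}).text = "{!func}" + eff_field

    parts = []
    for element in (update_handler, query):
        ET.indent(element)  # pretty-print in place (Python 3.9+)
        parts.append(ET.tostring(element, encoding="unicode"))
    return "\n".join(parts)


print(build_fragment())
```

Note that autowarmCount does not raise a cache's memory ceiling, which is set by `size`. It controls how many entries from the old cache are re-executed against the new searcher, so a larger value mainly costs warming time, and with it a longer wait before each soft commit's documents become visible.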
