I actually built the before/after hooks so we can take a node out of the
cluster while it's replicating and put it back afterwards. When a machine
was copying over 20 GB while serving requests, the load spiked tremendously.
It was easy enough to create a sort of rolling replication, i.e.:
1) node 1 removes health-check file, replicates then goes back up
2) node 2 removes health-check file, replicates then goes back up,
...
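
A minimal sketch of the rolling scheme above, in Python. The three
callbacks (take_offline, replicate, bring_online) are hypothetical
placeholders for whatever the before/after hooks really do, e.g. removing
and restoring the health-check file; none of them are Solr APIs.

```python
from typing import Callable, Iterable, List


def rolling_replicate(
    nodes: Iterable[str],
    take_offline: Callable[[str], None],
    replicate: Callable[[str], None],
    bring_online: Callable[[str], None],
) -> List[str]:
    """Replicate one node at a time so the others keep serving traffic."""
    done = []
    for node in nodes:
        take_offline(node)      # hypothetical: remove the health-check file
        try:
            replicate(node)     # hypothetical: pull the new index
        finally:
            # Put the node back in rotation even if replication failed;
            # the exception still propagates and stops the rollout.
            bring_online(node)  # hypothetical: restore the health-check file
        done.append(node)
    return done
```

Only one node is out of the cluster at any moment, so capacity drops by a
single machine during each replication.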
Which listener gets called after replication? I'm guessing newSearcher.
Thanks for your help.
On 12/8/10 10:18 AM, Erick Erickson wrote:
Perhaps the tricky part here is that Solr makes its caches for #parts# of
the query. In other words, a query that sorts on field A will populate
the cache for field A. Any other query that sorts on field A will use the
same cache. So you really need just enough queries to populate, in this
case, the fields you'll sort by. One could put together multiple sorts on a
single query and populate the sort caches all at once if you wanted.
Similarly for faceting and filter queries. You might well be able to make
just a few queries that fill up all the relevant caches rather than
using 100s, but you know your schema way better than I do.
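
As a rough illustration of packing several sorts, facets and filter
queries into one warming request, here is a small Python sketch that
builds the query string. The parameter names (q, sort, facet, facet.field,
fq, rows) are standard Solr request parameters; the field names in the
usage note are made up, and the helper build_warming_query is hypothetical.

```python
from urllib.parse import urlencode


def build_warming_query(sort_fields, facet_fields, filter_queries, q="*:*"):
    """Build one query string that touches several caches at once."""
    # rows is kept above zero on the assumption that the sort must
    # actually run for the sort fields to be loaded.
    params = [("q", q), ("rows", "10")]
    if sort_fields:
        # One sort clause listing every field we later want to sort by.
        params.append(("sort", ",".join(f"{f} asc" for f in sort_fields)))
    if facet_fields:
        params.append(("facet", "true"))
        params.extend(("facet.field", f) for f in facet_fields)
    # Each fq is a separate parameter, so each gets its own cache entry.
    params.extend(("fq", fq) for fq in filter_queries)
    return urlencode(params)
```

For example, build_warming_query(["price", "updated_at"], ["category"],
["in_stock:true"]) yields a single request covering two sort fields, one
facet field and one filter query.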
What I meant about replicating work is that trying to use your after
hook to fire off the queries probably doesn't buy you anything
over firstSearcher/newSearcher lists.
All that said, though, if you really don't want to put your queries in
the config file, it would be relatively trivial to write a small Java app
that uses SolrJ to query the server, reading the queries from
anyplace you choose, and call it from the after hook. Personally, I
think this is a high-cost option when compared to having the list
in the config file due to the added complexity, but that's your
call.
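
For what it's worth, the same idea can be sketched without SolrJ. The
Python below uses only the standard library and plain HTTP rather than
the Java client mentioned above. It assumes the queries are stored one
per line as already URL-encoded query strings; load_queries and warm are
hypothetical helper names, and the select URL in the usage note is only
an example.

```python
import urllib.request


def load_queries(lines):
    """Keep non-empty lines that are not '#' comments, stripped."""
    return [
        ln.strip()
        for ln in lines
        if ln.strip() and not ln.lstrip().startswith("#")
    ]


def warm(select_url, queries, opener=urllib.request.urlopen, timeout=30):
    """Send each query to the select handler; return how many succeeded."""
    ok = 0
    for qs in queries:
        try:
            with opener(f"{select_url}?{qs}", timeout=timeout) as resp:
                resp.read()  # drain the response so the query fully runs
                ok += 1
        except OSError:
            # URLError/HTTPError are OSError subclasses. A failed warming
            # query should not stop the node from coming back online.
            continue
    return ok
```

An after hook could then call something like
warm("http://localhost:8983/solr/select", load_queries(open("warm.txt")))
before restoring the health-check file.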
Best
Erick
On Wed, Dec 8, 2010 at 12:25 PM, Mark<static.void....@gmail.com> wrote:
We only replicate twice an hour, so we are far from real-time indexing. Our
application never writes to the master; rather, we just pick up all changes
using updated_at timestamps when delta-importing with DIH.
We don't have any warming queries in firstSearcher/newSearcher event
listeners. My initial post was asking how I would go about doing this with a
large number of queries. Our queries themselves tend to have a lot of
faceting and other restrictions on them so I would rather not list them all
out using xml. I was hoping there was some sort of log replayer handler or
class that would replay a bunch of queries while the node is offline. When
it's done, it would bring the node back online, ready to serve requests.
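
A rough sketch of the extraction half of such a log replayer, in Python.
It assumes request log lines of the form
"... path=/select params={q=foo&fq=bar} hits=12 status=0 QTime=3"; that
format is an assumption and should be checked against the real logs. The
helper extract_queries is hypothetical, not part of Solr.

```python
import re

# Assumed log layout: path=/select params={...} hits=N ...
_SELECT_LINE = re.compile(r"path=/select\s+params=\{(.*)\}\s+hits=")


def extract_queries(log_lines, limit=None):
    """Return distinct /select query strings in first-seen order."""
    seen = set()
    out = []
    for line in log_lines:
        m = _SELECT_LINE.search(line)
        if not m:
            continue  # not a search request (e.g. an update or commit)
        qs = m.group(1)
        if qs in seen:
            continue  # replaying the same query twice warms nothing new
        seen.add(qs)
        out.append(qs)
        if limit is not None and len(out) >= limit:
            break
    return out
```

The returned strings could be written to a file and replayed against the
slave while it is out of the cluster.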
On 12/8/10 6:15 AM, Jonathan Rochkind wrote:
How often do you replicate? Do you know how long your warming queries take
to complete?
As others in this thread have mentioned, if your replications (or ordinary
commits, if you weren't using replication) happen quicker than warming takes
to complete, you can get overlapping indexes being warmed up, and run out of
RAM (causing garbage collection to take lots of CPU, if not an out-of-memory
error), or otherwise block on CPU with lots of new indexes being warmed at
once.
Solr is not very good at providing 'real time indexing' for this reason,
although I believe there are some features in post-1.4 trunk meant to
support 'near real time search' better.
________________________________________
From: Mark [static.void....@gmail.com]
Sent: Tuesday, December 07, 2010 10:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Warming searchers/Caching
Maybe I should explain my problem a little more in detail.
The problem we are experiencing is that after a delta-import we notice an
extremely high load on the slave machines that just replicated. It goes
away after a minute or so of production traffic, once everything is cached.
I already have before/after hooks that run before and after replication
takes place. The before hook removes the slave from the cluster and then
starts to replicate. When it's done, it calls the after hook, and I would
like to warm up the cache in that hook so no users experience extremely
long wait times.
On 12/7/10 4:22 PM, Markus Jelsma wrote:
XInclude works fine, but that's not what you're looking for, I guess. Having
the 100 top queries is overkill anyway, and it can take too long for a new
searcher to warm up.
Depending on the type of requests, I usually tend to limit warming to
popular filter queries only, as they generate a very high hit ratio and make
caching useful [1].
If there are very popular user-entered queries with a high initial latency,
I'd have them warmed up as well.
[1]: http://wiki.apache.org/solr/SolrCaching#Tradeoffs
Warning: I haven't used this personally, but XInclude looks like what
you're after, see: http://wiki.apache.org/solr/SolrConfigXml#XInclude
Best
Erick
On Tue, Dec 7, 2010 at 6:33 PM, Mark<static.void....@gmail.com> wrote:
Is there any plugin or easy way to auto-warm/cache a new searcher with a
bunch of searches read from a file? I know this can be accomplished using
the EventListeners (newSearcher, firstSearcher), but I'd rather not add
100+ queries to my solrconfig.xml.
If there is no hook/listener available, is there some sort of Handler
that performs this sort of function? Thanks!