firstSearcher cache warming with own QuerySenderListener

2015-09-25 Thread Christian Reuschling
Hey all,

we want to avoid cold start performance issues when the caches are cleared 
after a server restart.

For this, we have written a SearchComponent that saves least recently used 
queries. These are
written to a file inside a closeHook of a SolrCoreAware at server shutdown.

The plan is to perform these queries at server startup to warm up the caches. 
For this, we have
written a derivative of the QuerySenderListener and configured it as 
firstSearcher listener in
solrconfig.xml. The only difference from the original QuerySenderListener is 
that it gets its queries
from the previously dumped LRU queries rather than from the config 
file.
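
The save-and-replay idea described above can be sketched independently of Solr. The following is a minimal illustration in Python, not the actual component (which would be Java code running inside Solr); the class name RecentQueryStore, the JSON file format, and the capacity are all assumptions made for this example.

```python
import json
import os
import tempfile
from collections import OrderedDict


class RecentQueryStore:
    """Keeps the most recently used query strings, up to a fixed capacity.

    Hypothetical helper for illustration; not part of Solr.
    """

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self._queries = OrderedDict()  # insertion order doubles as recency order

    def record(self, query):
        # Re-inserting moves the query to the "most recent" end.
        self._queries.pop(query, None)
        self._queries[query] = True
        # Evict the least recently used entry once over capacity.
        if len(self._queries) > self.capacity:
            self._queries.popitem(last=False)

    def queries(self):
        # Oldest first, most recent last.
        return list(self._queries)

    def dump(self, path):
        # Would be called from the shutdown hook.
        with open(path, "w", encoding="utf-8") as fh:
            json.dump(self.queries(), fh)

    @classmethod
    def load(cls, path, capacity=1000):
        # Would be called at startup, before the queries are replayed.
        store = cls(capacity)
        if os.path.exists(path):
            with open(path, encoding="utf-8") as fh:
                for query in json.load(fh):
                    store.record(query)
        return store


# Usage: record queries, dump at shutdown, reload at startup.
store = RecentQueryStore(capacity=2)
for q in ["title:solr", "body:cache", "title:solr", "body:warm"]:
    store.record(q)
dump_path = os.path.join(tempfile.mkdtemp(), "recent_queries.json")
store.dump(dump_path)
restored = RecentQueryStore.load(dump_path, capacity=2)
```

At startup, the restored queries would then be handed to whatever listener replays them against the new searcher.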

It seems that everything is called correctly, and we have the impression that 
the query response
times for the dumped queries are sometimes slightly better than without this 
warming.

Nevertheless, there is still a huge difference compared to the times we see 
after manually running the same
queries once, e.g. from a browser. If we do this, the second execution of 
these queries
is much faster (up to 10 times) than the response times after our 
implemented warming.

It seems that not all caches are warmed up during our warming. And because of 
these huge
differences, I suspect we missed something.

The index has about 25M documents and is split into two shards in a cloud 
configuration; both
shards are on the same server instance for now, for testing purposes.

Does anybody have an idea? I tried disabling lazy field loading as a potential 
cause, but without
success.


Cheers,

Christian



Re: firstSearcher cache warming with own QuerySenderListener

2015-09-28 Thread Christian Reuschling
Erick, Walter and all,

as I wrote, I am aware of the firstSearcher event; we tried it manually before 
we chose to extend
the QuerySenderListener.

I think our usage scenario (I didn't write about it, for simplicity) is a bit 
different from yours,
which makes this necessary. We are implementing our own Solr searchHandler 
module that fires several
queries on its own, and our customer implements against this searchHandler, 
giving some 'seed
queries', which lead to further queries (dozens to hundreds).

I assume there will be a set of typical or common queries, but due to the 
more heterogeneous
nature of the final queries that arrive at the cores, this set is bigger and 
hard to determine manually.

The server is restarted so often because the searchHandler is still under 
development. And because
one customer query leads to so many executed queries, non-warmed caches make 
a big difference.

I will phrase my question differently; it's the same thing:

I have inserted a query for warming into the firstSearcher event in 
solrconfig.xml. If I then call it from the
browser once, the response time is still much higher than for 
subsequent invocations,
which gives the impression that some caches are not filled. Here is my query 
(with an mlt query parser):

http://localhost:8014/solr/etrCollection/select?q=+%28+%28dynaqCategory:brandwatch%29%29%20_query_:%27{!mlt%20qf=%22body%22%20v=%22http://www.usatoday.com/story/news/nation/2013/02/14/drought-farmers-midwest/1920577/%22}&rows=10&fl=dataEntityId,title,creator,score&wt=json



thanks again,

Christian



Walter wrote:

> Right.
>
> I chose the twenty most frequent terms from our documents and use those for 
> cache warming.
> The list of most frequent terms is pretty stable in most collections.


Erick wrote:

> That's what the firstSearcher event in solrconfig.xml is for, exactly the
> case of autowarming Solr when it's just been started. The queries you put
> in that event are fired only when the server starts.
>
> So I'd just put my queries there. And you do not have to put a zillion
> queries here. Start with one that mentions all the facets you intend to
> use, sorts by all the various sort fields you use, perhaps (if you have any
> _very_ common filter queries) put those in too.
>
> Then analyze the queries that are still slow when issued the first time
> after startup and add what you suspect are the relevant bits to the
> firstSearcher query (or queries).
>
> I suggest that this is a much easier thing to do, and focus efforts on why
> you are shutting down your Solr servers often enough that anyone notices..
>
> Best,
> Erick






result grouping on all documents

2015-10-20 Thread Christian Reuschling
Hi,

we are trying to efficiently get the number of documents for given time slots 
in the index.


For this, we query the solr index like this:

http://localhost:8014/solr/myCore/query?q=*:*&rows=1&fl=id&group=true&group.query=modified:[201103010%20TO%20201302010]&group.query=modified:[201303010%20TO%20201502010]&group.limit=1&distrib=false

For now, the modified field is a numeric field with a trie index (tlong in 
schema.xml).

We have about 30M documents in the index.

This query works fine, but if the number of group queries gets higher (e.g. 
200), the response time
gets terribly slow.
As we need only the number of documents per group and never the score or any 
other data of the
documents, we are wondering if there is a faster method to get this information.
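
Since only the counts are needed, one commonly used alternative to group.query is to send each time slot as a facet.query with rows=0; Solr then returns one count per query under facet_counts/facet_queries. Whether this is fast enough for 200 slots on this index is not something this thread confirms. A small Python sketch that builds such a request URL from the values above; the helper name facet_count_url is made up for the example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def facet_count_url(base_url, field, ranges):
    """Build a count-only request with one facet.query per time slot.

    Hypothetical helper for illustration; only the Solr parameter names
    (q, rows, facet, facet.query, distrib) are real.
    """
    params = [
        ("q", "*:*"),
        ("rows", "0"),          # no documents needed, only counts
        ("facet", "true"),
        ("distrib", "false"),   # kept from the original request
    ]
    for start, end in ranges:
        params.append(("facet.query", f"{field}:[{start} TO {end}]"))
    # urlencode percent-encodes brackets and colons and turns spaces into '+'.
    return base_url + "?" + urlencode(params)


url = facet_count_url(
    "http://localhost:8014/solr/myCore/query",
    "modified",
    [(201103010, 201302010), (201303010, 201502010)],
)
# Decode the query string again to check what the server would receive.
decoded = parse_qs(urlsplit(url).query)
print(decoded["facet.query"])
```

For equally sized slots, facet.range with facet.range.start, facet.range.end and facet.range.gap would be another option to try.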


Thanks

Christian



Generating json response in custom requestHandler (xml is working)

2015-04-02 Thread Christian Reuschling
Hi,

I managed to create a small custom requestHandler and filled the response 
parameter with some
static values in the structure I want to have later.

I can invoke the requestHandler from the browser and get nicely formatted XML 
with the data and structure I
had specified - so far so good. Here is the XML response:




0
17


13.0
14.0
15
16.0
17.0


13.0
14.0
15
16.0
17.0

id1
id2
id3
id4



13.0
14.0
15
16.0
17.0

id1
id2
id3
id4







Now I simply add &wt=json to the invocation. Sadly I get a

HTTP ERROR 404

Problem accessing /solr/etr_base_core/trends&wt=json. Reason:

Not Found
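
One thing the error message itself suggests: the path being accessed ends in `trends&wt=json`, so the parameter seems to have been appended with `&` without a preceding `?` and was treated as part of the handler path. A small Python check of how such a URL is split, as a sketch; the host and port are assumptions, only the path comes from the error above.

```python
from urllib.parse import urlsplit

# Host and port are placeholders; the path is taken from the 404 message.
with_ampersand = urlsplit("http://localhost:8014/solr/etr_base_core/trends&wt=json")
with_question_mark = urlsplit("http://localhost:8014/solr/etr_base_core/trends?wt=json")

# Without '?', everything after the last '/' is part of the path,
# so no handler named "trends" is matched and no wt parameter is seen.
print(with_ampersand.path, repr(with_ampersand.query))

# With '?', the path is the handler and wt=json is a query parameter.
print(with_question_mark.path, repr(with_question_mark.query))
```

If the request has no other parameters, the first one needs to be introduced with `?`; `&` only separates further parameters.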


I was under the impression that the response format is transparent to me when 
I write a custom
requestHandler. But it seems I've overlooked something.

Does anybody have an idea?


Regards

Christian