On 10/8/2012 4:09 PM, kevinlieb wrote:
Thanks for all the replies.
I oversimplified the problem to keep my post small and concise. I am
really trying to find the counts of documents, by a list of 10 different
authors, that match those keywords. Of course, when looking up a single
author there is no reason to do a facet query. To be clearer:
Find all documents that contain the word "dude" or "thedude" or
"anotherdude", and count how many of these were written by each of
"eldudearino", "zeedudearino", "adudearino", and "beedudearino".
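A request matching that description might be sketched as below; the host, core path, and the "text" and "author" field names are assumptions, not taken from the original post.

```python
from urllib.parse import urlencode

# Hypothetical sketch: match any of the keywords once, then get a
# per-author count via one facet.query per author of interest.
params = [
    ("q", "text:(dude OR thedude OR anotherdude)"),
    ("rows", "0"),  # only counts are needed, not the documents themselves
    ("facet", "true"),
    ("facet.query", "author:eldudearino"),
    ("facet.query", "author:zeedudearino"),
    ("facet.query", "author:adudearino"),
    ("facet.query", "author:beedudearino"),
]
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```

Each facet.query comes back as its own count in the response, so one request covers all ten authors.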
I tried facet.query as well as facet.method=fc, and neither really helped.
We are constantly adding documents to the Solr index and committing every
few seconds, which is probably why this is not working well.
Seems we need to re-architect the way we are doing this...
I would definitely consider increasing the amount of time between
commits. You can add documents at whatever interval you want, but if
you only do commits every minute or two, your caches will be much more
useful.
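One way to lengthen the interval between commits without changing how often you add documents is Solr's autoCommit setting in solrconfig.xml. This is only an illustrative fragment; the one-minute value is an example, not a recommendation for every setup.

```xml
<!-- solrconfig.xml: commit automatically, at most once per minute -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>60000</maxTime>  <!-- milliseconds -->
  </autoCommit>
</updateHandler>
```

With this in place, the client can stop issuing its own commit after every batch of adds.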
Your time slice filter query (NOW-5MINUTES) will never be cached,
because NOW is measured in milliseconds and will therefore be different
for every query. You might consider using NOW/MINUTE-5MINUTES instead,
or even [NOW/MINUTE-5MINUTES TO *] so that you are actually dealing
with a range. For the span of that minute (at least until the cache
is invalidated by a commit), the filter cache entry will remain valid.
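A minimal sketch of why the rounding matters, using Python datetimes as a stand-in for Solr's date math (the round_to_minute helper is hypothetical, not a Solr API):

```python
from datetime import datetime

def round_to_minute(dt):
    """Drop seconds and smaller units, like Solr's NOW/MINUTE date math."""
    return dt.replace(second=0, microsecond=0)

# Two queries issued a few seconds apart within the same minute:
a = round_to_minute(datetime(2012, 10, 8, 16, 9, 3, 123456))
b = round_to_minute(datetime(2012, 10, 8, 16, 9, 41, 987654))

# Unrounded, the two timestamps differ, so a NOW-based filter query is
# textually unique and never reused from the filter cache. Rounded, both
# queries produce the identical filter string, so one cache entry serves
# every query in that minute until a commit invalidates it.
assert a == b == datetime(2012, 10, 8, 16, 9)
```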
Some general questions that may matter: How big are all your index
directories on this server, how much RAM is in the server, and how much
RAM are you giving to Java? I'm also curious how big your Solr caches
are, what the autowarm counts are, and how long it is taking for your
caches to warm up after each commit. You can get the warm times from
the cache statistics in the admin interface.
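For reference, the filter cache size and autowarm count are set in solrconfig.xml; the values below are purely illustrative and would need tuning against your own index and warm times.

```xml
<!-- solrconfig.xml: example values only -->
<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="128"/>
```

A large autowarmCount makes each new searcher slower to open, which matters when commits happen every few seconds.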
Thanks,
Shawn