Yonik Seeley wrote:
> 
> *:* maps to MatchAllDocsQuery, which for each document needs to check
> if it's deleted (that's a synchronized call, and can be a bottleneck).
> 

Why does this need to check whether documents are deleted if normal queries
don't? Is there any way of disabling the check, since I can be sure there are
no deleted documents after indexing and optimizing?


Yonik Seeley wrote:
> 
> A cheap workaround is that if you know of a term that is in every
> document, (or a field in every document that has very few terms), then
> substitute a query on that for *:*
> Substituting one of your filters as the base query might also work.
> 

Would duplicating one of my filters cause any issues? That would be easy.
Otherwise I'll try the substitution and see if it helps much.
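
For reference, a minimal sketch of the request variants being discussed, written in Python purely for illustration. The parameter names (q, fq, rows, facet, facet.field) are standard Solr request parameters; the field names and values (site, status, doctype, category) are hypothetical and stand in for whatever the real schema uses. It only builds query strings and says nothing about which variant is faster.

```python
from urllib.parse import urlencode

def build_params(base_query, filters, facet_fields, rows=100):
    """Return Solr request parameters as a list of (name, value) pairs."""
    params = [("q", base_query), ("rows", str(rows)), ("facet", "true")]
    params += [("fq", fq) for fq in filters]
    params += [("facet.field", field) for field in facet_fields]
    return params

# Hypothetical filters and facet field, not taken from the thread.
filters = ["site:example", "status:live"]
facets = ["category"]

# Original form: match-all base query plus filters.
match_all = build_params("*:*", filters, facets)

# Suggested workaround: a term assumed to be present in every document.
by_term = build_params("doctype:record", filters, facets)

# One filter substituted as the base query, the rest kept as filters.
promoted = build_params(filters[0], filters[1:], facets)

# The "duplicated filter" variant: same filter sent as both q and fq.
duplicated = build_params(filters[0], filters, facets)

for variant in (match_all, by_term, promoted, duplicated):
    print(urlencode(variant))
```

Passing rows=0 to the same helper gives the facets-only form of each request.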


Yonik Seeley wrote:
> 
>> This surprises me because the filter query submitted has usually already
>> been submitted along with a normal query, and so should be cached in the
>> filter cache. Surely all Solr needs to do is return a handful of fields
>> for the first 100 records in the list from the cache - or so I thought.
> 
> To calculate the DocSet (the set of all documents matching *:* and
> your filters), Solr can just use its caches as long as *:* and the
> filters have been used before.
> 
> *But*, to retrieve the top 10 documents matching *:* and your filters,
> the query must be re-run.  That is probably where the time is being
> spent.  Since you aren't looking for relevancy scores at all, but just
> faceting, it seems like we could potentially optimize this in Solr.
> 
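
The distinction above can be pictured with a toy model. This is only an illustration under simplifying assumptions, not Solr's actual code: the cache is a plain dict of document-ID sets, and every name and value here is made up. It shows why facet counts need nothing beyond the unordered intersection of cached sets, whereas a "top N" list needs an ordering that a set does not carry.

```python
# Toy model only: names and structures are illustrative, not Solr internals.
filter_cache = {
    "*:*": {0, 1, 2, 3, 4, 5, 6, 7},
    "site:example": {1, 2, 3, 5, 7},
    "status:live": {0, 2, 3, 5, 6},
}

def doc_set(base_query, filters):
    """Intersect cached ID sets; nothing is searched if every entry is cached."""
    result = set(filter_cache[base_query])
    for fq in filters:
        result &= filter_cache[fq]
    return result

def facet_counts(docs, field_values):
    """Count field values over an unordered set of document IDs."""
    counts = {}
    for doc in docs:
        value = field_values[doc]
        counts[value] = counts.get(value, 0) + 1
    return counts

# Hypothetical single-valued facet field, keyed by document ID.
category = {0: "a", 1: "b", 2: "a", 3: "c", 4: "b", 5: "a", 6: "c", 7: "b"}

docs = doc_set("*:*", ["site:example", "status:live"])
print(sorted(docs))
print(facet_counts(docs, category))
```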

I'm actually retrieving the first 100 in my tests, which will be necessary
in one of the two scenarios we use blank queries for. The other scenario
doesn't require any docs at all - just the facets, and I've not put that in
my tests. What would the situation be if I specified a sort order for the
facets and/or retrieved no docs at all? I'd be sorting the facets
alphabetically, which is currently done by my app rather than the search
engine, since I sometimes have to merge facets from more than one field.
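
To make the app-side step concrete, here is a rough sketch of merging facet counts from two fields and sorting the result alphabetically. The field names and counts are invented for illustration, and plain dicts are assumed as input rather than any particular Solr response format.

```python
from collections import Counter

def merge_facets(*field_counts):
    """Sum facet counts from several fields and sort by value, ignoring case.

    Note: summed counts can double-count a document that carries the same
    value in more than one of the merged fields.
    """
    merged = Counter()
    for counts in field_counts:
        merged.update(counts)
    return sorted(merged.items(), key=lambda item: item[0].lower())

# Hypothetical facet counts for two fields.
author_counts = {"Smith": 3, "adams": 2}
editor_counts = {"Smith": 1, "Brown": 4}

merged = merge_facets(author_counts, editor_counts)
for value, count in merged:
    print(value, count)
```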

I had assumed that no doc would be considered more relevant than any other
without any query terms - i.e. filter query terms wouldn't affect relevance.
This seems sensible to me, but maybe that's only because our current search
engine works that way. 

Regarding optimization, I certainly think that being able to access all
facets for subsets of the indexed data (defined by the filter query) is an
incredibly useful feature. My search engine usage may not be very common
though. What it means to us is that we can drive all aspects of our sites
from the search engine, not just the obvious search forms.


Yonik Seeley wrote:
> 
> In the future, we could also do some query optimization by sometimes
> combining filters with the base query.
> 
> -Yonik
> 
> 

Sorry, that went over my head.

Thanks very much for your help. I wish I had more time during this
evaluation to delve into the code. I don't suppose there's a document with a
guided tour of the codebase anywhere, is there? ;-)


P.S. I re-ran my tests without returning facets whilst writing this and
didn't get the slowdowns with 4 or 10 threads. Does this help?


-- 
View this message in context: 
http://www.nabble.com/Slow-response-times-using-*%3A*-tp15206563p15209605.html
Sent from the Solr - User mailing list archive at Nabble.com.
