On Jan 31, 2008 10:43 AM, Andy Blower <[EMAIL PROTECTED]> wrote:
>
> I'm evaluating SOLR/Lucene for our needs and currently looking at performance
> since 99% of the functionality we're looking for is provided. The index
> contains 18.4 Million records and is 58Gb in size. Most queries are
> acceptably quick, once the filters are cached. The filters select one or
> more of three subsets of the data and then intersect from around 15 other
> subsets of data depending on a user subscription.
>
> We're returning facets on several fields, and sometimes a blank (q=*:*)
> query is run purely to get the facets for all of the data that the user can
> access. This information is turned into browse information and can be
> different for each user.
>
> Running performance tests using jMeter sequentially with a single user,
> these blank queries are slower than the normal queries, but still in the
> 1-4sec range. Unfortunately if I increase the number of test threads so that
> more than one of the blank queries is submitted while one is already being
> processed, everything grinds to a halt and the responses to these blank
> queries can take up to 125secs to be returned!

*:* maps to MatchAllDocsQuery, which needs to check whether each document
is deleted (that's a synchronized call, and can be a bottleneck).

A cheap workaround: if you know of a term that is in every document (or a
field in every document that has very few terms), substitute a query on
that for *:*.
Substituting one of your filters as the base query might also work.

> This surprises me because the filter query submitted has usually already
> been submitted along with a normal query, and so should be cached in the
> filter cache. Surely all solr needs to do is return a handful of fields for
> the first 100 records in the list from the cache - or so I thought.

To calculate the DocSet (the set of all documents matching *:* and your
filters), Solr can just use its caches as long as *:* and the filters
have been used before.

*But*, to retrieve the top 10 documents matching *:* and your filters,
the query must be re-run. That is probably where the time is being spent.
Since you aren't looking for relevancy scores at all, but just faceting, it
seems like we could potentially optimize this in Solr.

In the future, we could also do some query optimization by sometimes
combining filters with the base query.

-Yonik
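
For illustration only, here is a minimal sketch of the two workarounds
suggested above, expressed as the request parameters a client might send.
The parameter names (q, fq, rows, facet, facet.field) are standard Solr
request parameters; everything else is an assumption made up for the
example: the field names (in_index, collection, subscription, author,
subject), the filter values, and the helper names. Whether either variant
is actually faster on a given index would need to be measured.

```python
from urllib.parse import urlencode


def browse_params(base_query, filters, facet_fields, rows=100):
    """Build the query string for a facet-only "browse" request."""
    params = [("q", base_query), ("rows", str(rows)), ("facet", "true")]
    params += [("fq", fq) for fq in filters]            # one fq per cached filter
    params += [("facet.field", f) for f in facet_fields]
    return urlencode(params)


def promote_first_filter(filters):
    """Use one of the filters as the base query instead of *:*."""
    return filters[0], list(filters[1:])


# Hypothetical filters and facet fields, not taken from the original post.
filters = ["collection:journals", "subscription:42"]
facets = ["author", "subject"]

# The slow form: q=*:* maps to MatchAllDocsQuery.
blank = browse_params("*:*", filters, facets)

# Workaround 1: query a term assumed to be present in every document.
every_doc = browse_params("in_index:true", filters, facets)

# Workaround 2: promote one of the filters to be the base query.
base, remaining = promote_first_filter(filters)
promoted = browse_params(base, remaining, facets)

for query_string in (blank, every_doc, promoted):
    print(query_string)
```

All three requests should select the same documents, provided the
substituted term really does occur in every document; only the base
query that has to be re-run differs.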