RE: Search query optimization

Chris Hostetter Tue, 17 Jun 2008 12:40:13 -0700

: test. If most of requests return the same set of data, cache will
: improve the query performance. But in our usage, almost all requests
: have different data set to return. The cache hit ratio is very low.


that's why i suggested moving clauses that are likely to be common (ie: 
your "within the last week" clause) into a separate fq param where it can 
be cached independently from the main query.  if you do that *and* you 
have the filterCache turned on then after this query...
  q=account:1&fq=recordeddate_dt:[NOW/DAY-7DAYS+TO+NOW/DAY]
...these other queries will all be fairly fast because of the cache hit...
  q=account:9999&fq=recordeddate_dt:[NOW/DAY-7DAYS+TO+NOW/DAY]
  q=account:7777&fq=recordeddate_dt:[NOW/DAY-7DAYS+TO+NOW/DAY]
  q=anything+you+want&fq=recordeddate_dt:[NOW/DAY-7DAYS+TO+NOW/DAY]
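
to make the cache-hit reasoning concrete, here is a minimal sketch in Python. it is a toy model, not Solr's filterCache: the names ToyFilterCache, expensive_range_lookup and LAST_WEEK are made up for illustration, and the doc ids are placeholder values. it only shows that when every request sends the same fq string, the expensive work happens once.

```python
class ToyFilterCache:
    """Toy stand-in for a filter cache: maps an fq string to a doc id set.

    This is an illustration only, not how Solr implements its filterCache.
    """

    def __init__(self):
        self.entries = {}
        self.hits = 0
        self.misses = 0

    def get(self, fq, compute):
        # Compute the doc set only the first time a given fq string is seen.
        if fq in self.entries:
            self.hits += 1
        else:
            self.misses += 1
            self.entries[fq] = compute(fq)
        return self.entries[fq]


# NOW/DAY rounds to the start of the day, so the fq string stays identical
# across requests made on the same day and can act as a stable cache key.
LAST_WEEK = "recordeddate_dt:[NOW/DAY-7DAYS TO NOW/DAY]"


def expensive_range_lookup(fq):
    # Hypothetical placeholder for the costly range evaluation;
    # the doc ids returned here are invented.
    return frozenset({3, 5, 8, 13})


cache = ToyFilterCache()
for q in ["account:1", "account:9999", "account:7777"]:
    # The main query differs each time, but the filter is shared.
    date_docs = cache.get(LAST_WEEK, expensive_range_lookup)

print(cache.misses, cache.hits)
```

in this toy run only the first request pays for the range lookup; the other two reuse the cached doc set even though their q values differ.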

: my previous test examples, it seems lucene will not check the size of
: the subconditions (like account:1 or
: recordeddate_dt:[NOW/DAY-7DAYS+TO+NOW/DAY]). Q=account:1 will return a
: small set of data. But q=recordeddate_dt:[NOW/DAY-7DAYS+TO+NOW/DAY] will
: return a large set of data. If we combine them with "AND" like:
: q=account+AND+recordeddate_dt:[NOW/DAY-7DAYS+TO+NOW/DAY]. It should
: return the small set of data and then apply the subcondition
: "recordeddate_dt:[NOW/DAY-7DAYS+TO+NOW/DAY]". But from the response

the ConjunctionScorer will do that (as mentioned earlier in this thread) 
but even if the account:1 clause indicates that it can skip ahead to 
*document* #1234567, the ConstantScoreRangeQuery still 
needs to iterate over all of the *terms* in the specified range before it 
knows what the lowest matching doc id above #1234567 is.

that's why putting "range queries" into separate "fq" params can be a lot 
better ... that term iteration only needs to be done once and can then be 
cached and reused.
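
a rough sketch of the term-iteration point, in Python. this is a toy inverted index, not Lucene's code: date_index, range_doc_ids and first_doc_at_or_after are hypothetical names, and the terms and doc ids are invented. the assumption being illustrated is that the range clause has to visit every term in the range before it can say which matching doc comes next, while the resulting doc list can be kept and reused.

```python
import bisect

# Toy inverted index for a date field: term -> sorted doc ids.
# All values are made up for illustration.
date_index = {
    "2008-06-10": [2, 9],
    "2008-06-11": [4],
    "2008-06-12": [1, 7, 12],
    "2008-06-13": [3, 10],
}


def range_doc_ids(index, low, high):
    """Collect doc ids for every term in [low, high].

    Returns (sorted doc ids, number of terms visited). Doc ids are grouped
    per term, so every term in the range must be visited before the
    combined list is known in doc id order.
    """
    docs = set()
    terms_visited = 0
    for term in sorted(index):
        if low <= term <= high:
            terms_visited += 1
            docs.update(index[term])
    return sorted(docs), terms_visited


def first_doc_at_or_after(sorted_docs, target):
    """Return the lowest doc id >= target, or None if there is none."""
    i = bisect.bisect_left(sorted_docs, target)
    return sorted_docs[i] if i < len(sorted_docs) else None


# Pay the term iteration once...
docs, visited = range_doc_ids(date_index, "2008-06-11", "2008-06-13")

# ...then any number of "skip ahead" requests are cheap lookups on the
# already-built list, which is the part a cached filter lets you reuse.
print(visited, docs)
print(first_doc_at_or_after(docs, 8))
```

skipping ahead on the doc id side does not reduce the number of terms visited in this model; only reusing the built doc list does.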



-Hoss
