On Dec 11, 2009, at 8:17 PM, Fer-Bj wrote:

>
> We're running a 14M documents index. For each document we have:
> <field name="id" type="sint" indexed="true" stored="true" required="true" />
> <field name="title" type="text_ngram" indexed="true" stored="true" omitNorms="true"/>
> <field name="cat_id" type="sint" indexed="true" stored="true"/>
> <field name="geo_id" type="sint" indexed="true" stored="true"/>
> <field name="body" type="text" indexed="true" stored="false" omitNorms="true"/>
> <field name="modified_datetime" type="date" indexed="true" stored="true"/>
> (and a few other fields).
>
> Our most usual query is something like this:
> q=cat_id:xxx AND geo_id:yyyy&sort=id desc
> where cat_id = which "category" (cars,sports,toys,etc) the item belongs to,
> and geo_id = which city/district the item belongs to.
> So this query will return a list of documents posted in category xxx, region yyy.
> Sorted by ID DESC, to get the newest first.
>
> There are 2 questions I'd like to ask:
>
> 1) adding something like: q=cat_id:xxx&fq=geo_id=yyyy would boost
> performance?

For the n > 1 query (that is, once the filter has been cached by an earlier request), yes, adding filters should improve performance, assuming the filter is selective enough. The tradeoff is memory.

>
> 2) we do find problems when we ask for a page=large offset! ie:
> q=cat_id:xxx and geo_id:yyy&start=544545
> (note that we limit docs to 50 max per resultset).
> When start is 500 or more, Qtime is >=5 seconds.... while the avg qtime is
> <100 ms

Yes, this is likely the case. Deep paging is not the typical use case, so what happens is that you get more and more disk accesses as the offset grows, plus there is a whole bunch of priority queue work going on. See http://issues.apache.org/jira/browse/LUCENE-2127

>
> Any help or tips would be appreciated!

Do you really need "sortable ints" for all those fields? Are you doing range queries against them? The name "sortable" X is a bit of a misnomer. It doesn't mean sortable in the sense of the &sort parameter; it means sortable in the range query sense, as in cat_id:[55 TO 1005].

-Grant

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
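
For reference, the two request forms discussed in question 1 can be sketched as below. This is only an illustration: the base URL (host, port, path) and the helper name build_query are assumptions rather than anything from the original setup, and the filter is written with the usual field:value colon syntax (fq=geo_id:yyyy).

```python
from urllib.parse import urlencode

# Hypothetical base URL; adjust host, port, and path for your own install.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_query(cat_id, geo_id, start=0, rows=50, use_filter=True):
    """Build a select URL for one category/region listing, newest id first.

    build_query is an illustrative helper, not part of any Solr client.
    """
    if use_filter:
        # Main query on the category, region moved into a filter query (fq)
        # so that the region's document set can be cached and reused.
        params = [
            ("q", f"cat_id:{cat_id}"),
            ("fq", f"geo_id:{geo_id}"),
        ]
    else:
        # Original form: both clauses in the main query.
        params = [("q", f"cat_id:{cat_id} AND geo_id:{geo_id}")]

    params += [("sort", "id desc"), ("start", start), ("rows", rows)]
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_query(12, 3456))                    # q + fq form
    print(build_query(12, 3456, use_filter=False))  # single q form
```

Whether the fq form actually helps depends on how often the same geo_id value repeats across requests and on how much memory is available for cached filters, so it is worth measuring both forms against the real index.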