On Dec 11, 2009, at 8:17 PM, Fer-Bj wrote:

> 
> We're running a 14M documents index. For each document we have:
>   <field name="id"                type="sint"       indexed="true" stored="true" required="true"/>
>   <field name="title"             type="text_ngram" indexed="true" stored="true" omitNorms="true"/>
>   <field name="cat_id"            type="sint"       indexed="true" stored="true"/>
>   <field name="geo_id"            type="sint"       indexed="true" stored="true"/>
>   <field name="body"              type="text"       indexed="true" stored="false" omitNorms="true"/>
>   <field name="modified_datetime" type="date"       indexed="true" stored="true"/>
> (and a few other fields).
> 
> Our most usual query is something like this:
> q=cat_id:xxx AND geo_id:yyyy&sort=id desc   where cat_id = which "category"
> (cars,sports,toys,etc) the item belongs to, and geo_id = which city/district
> the item belongs to.
> So this query will return a list of documents posted in category xxx, region
> yyy. 
> Sorted by ID DESC, to get the newest first.
> 
> There are 2 questions I'd like to ask:
> 
> 1) Would adding something like q=cat_id:xxx&fq=geo_id:yyyy boost
> performance?


For the second and subsequent queries (n > 1), yes: adding a filter should 
improve performance, assuming it is selective enough, because Solr caches the 
filter's document set and reuses it.  The tradeoff is memory.
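
As a rough illustration of the filter form of that request, here is a minimal Python sketch that only builds the query string. The helper name build_query and the example ids are made up for illustration; the q, fq, sort, start and rows parameters are the standard Solr ones. Note that the filter uses a colon (geo_id:yyyy), the same syntax as q.

```python
from urllib.parse import urlencode


def build_query(cat_id, geo_id, start=0, rows=50):
    """Build Solr request parameters with the geo restriction as a filter.

    build_query is a hypothetical helper, not part of any Solr client.
    """
    params = [
        ("q", f"cat_id:{cat_id}"),    # main query
        ("fq", f"geo_id:{geo_id}"),   # filter query, cacheable on its own
        ("sort", "id desc"),          # newest first
        ("start", start),
        ("rows", rows),
    ]
    return urlencode(params)


# Example ids are arbitrary placeholders.
print(build_query(12, 3456))
```

Because the filter is cached separately from the main query, the benefit shows up when the same geo_id filter is reused across many different cat_id queries.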

> 
> 2) we do find problems when we ask for a page=large offset!  ie: 
> q=cat_id:xxx and geo_id:yyy&start=544545
> (note that we limit docs to 50 max per resultset).
> When start is 500 or more, Qtime is >=5 seconds.... while the avg qtime is
> <100 ms

Yes, that is expected.  Deep paging is not the typical use case.  To return a 
page at a large offset, Lucene has to collect the top start + rows hits in a 
priority queue, so both the queue work and the number of disk accesses grow 
with the offset.

See http://issues.apache.org/jira/browse/LUCENE-2127
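
To make the priority queue cost concrete, here is a simplified Python model of collecting one page of results sorted by id descending. It is a sketch of the general top-N technique, not Lucene's actual collector code; top_page and its arguments are names invented for this example.

```python
import heapq


def top_page(doc_ids, start, rows):
    """Return (page, queue_size) for results sorted by id descending.

    Simplified model: to serve the page at offset `start`, the collector
    must keep the best (start + rows) ids, not just `rows` of them.
    """
    n = start + rows                  # queue size grows with the offset
    heap = []                         # min-heap holding the n largest ids seen
    for doc_id in doc_ids:
        if len(heap) < n:
            heapq.heappush(heap, doc_id)
        elif doc_id > heap[0]:
            heapq.heapreplace(heap, doc_id)
    ranked = sorted(heap, reverse=True)   # newest (largest id) first
    return ranked[start:start + rows], n


matches = range(1, 1001)              # pretend 1000 documents matched
first_page, q1 = top_page(matches, 0, 50)
deep_page, q2 = top_page(matches, 500, 50)
print(q1, q2)                         # queue sizes for the two requests
```

The page size is 50 in both calls, but the second call has to maintain a queue eleven times larger just to throw most of it away.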


> 
> Any help or tips would be appreciated!

Do you really need "sortable ints" for all those fields?  Are you doing range 
queries against them?  The name "sortable" X is a bit of a misnomer.  It does 
not mean sortable in the sense of the &sort parameter; it means sortable in 
the range query sense, as in cat_id:[55 TO 1005].
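
A small Python sketch of why the encoding matters for range queries, assuming the index compares terms as strings. Zero-padding is used here only as a stand-in for an order-preserving encoding; it is not the encoding Solr's sint type actually uses, and the helper names are invented for this example.

```python
def in_range_lexical(term, low, high):
    """Range test on raw strings, as a string-ordered term index would do."""
    return low <= term <= high


def pad(n, width=10):
    """Hypothetical order-preserving encoding for non-negative ints."""
    return str(n).zfill(width)


ids = [7, 55, 100, 1005, 2000]

# Plain decimal strings: "55" sorts after "1005", so the range is empty.
plain = [i for i in ids if in_range_lexical(str(i), "55", "1005")]

# Padded strings: string order now matches numeric order.
padded = [i for i in ids if in_range_lexical(pad(i), pad(55), pad(1005))]

print(plain, padded)
```

If a field is only ever matched exactly, as cat_id and geo_id are in the query above, this ordering property is not needed.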

-Grant

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using 
Solr/Lucene:
http://www.lucidimagination.com/search
