On Wed, Jan 18, 2012 at 10:15 PM, gabriel shen <xshco...@gmail.com> wrote:
> Hi Yonik,
>
> The index I am querying against is 20 GB and contains 200,000 documents. Some
> of the documents are quite big, and the schema contains more than 50 fields.
> The main content fields are defined as both stored and indexed, with
> HTML stripping, standard tokenization, decompounding, and stemming filters
> applied, without term vectors. The Solr 3.3 installation runs on a 64-bit JVM
> with 12 GB of memory. The default cache size (512) is applied.
>
> First I did a query with default query parser and a single query field
> called 'maintext',
> http://xxx:hhhh/solr/document/select?q=maintext:most%20populous%20city&start=0&rows=25
> It took 727 milliseconds in QueryComponent which is fine
>
> http://xxx:hhhh/solr/document/select?q=maintext:most%20populous%20city&start=0&rows=25&sort=sumlevel1%20asc,%20sumlevel2%20asc,%20domdate%20desc,%20score%20desc&facet=true&facet.field=sumlevel1
> It took 157 milliseconds in QueryComponent
>
>
> And then I ran another query, this time with dismax, using the same keywords (I
> suppose most documents, sorting, and filtering are cached by then)
>
> http://xxx:hhhh/solr/document/select?q=most%20populous%20city&qt=dismax&start=0&rows=25&qf=superdocid^1000%20popular-name^1000%20author^100%20target-id^50%20title_simple^50%20title^25%20summary_simple^25%20summary^10%20maintext_simple^5%20annotation_DEF_simple^5%20maintext%20annotation_DEF&pf=popular-name^1000%20author^100%20title_simple^50%20title^25%20summary_simple^25%20summary^10%20maintext_simple^5%20annotation_DEF_simple^5%20maintext%20annotation_DEF&sort=sumlevel1%20asc,%20sumlevel2%20asc,%20domdate%20desc,%20score%20desc&facet=true&facet.field=sumlevel1&debugQuery=true
>
> It took 15-20 seconds or more before the browser showed the result, and it
> displayed 4781 milliseconds in QueryComponent

A big discrepancy like this is pretty much always due to disk IO
reading the stored fields.
To support efficient streaming of large result lists, Solr reads the
stored fields and returns them as it streams the results back to the
client.
If the operating system doesn't have enough free memory to cache the
index, each document will cause a disk seek + read.
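
One rough way to see how much of the wall-clock time falls outside query
processing is to compare the QTime in Solr's response header with the time
measured at the client. The sketch below only does the arithmetic on a JSON
response body (wt=json). The function name split_request_time and the sample
numbers are made up for illustration, and it assumes QTime does not include the
time spent reading stored fields while the response is written.

```python
import json


def split_request_time(response_body, wall_clock_ms):
    """Split a request's wall-clock time into Solr's QTime and the remainder.

    response_body: JSON text of a Solr response (requested with wt=json).
    wall_clock_ms: total time measured by the client, in milliseconds.
    """
    header = json.loads(response_body)["responseHeader"]
    qtime_ms = header["QTime"]
    return {
        "qtime_ms": qtime_ms,
        # Time not accounted for by QTime: response writing (including
        # stored-field reads, under the assumption above) plus network transfer.
        "outside_qtime_ms": wall_clock_ms - qtime_ms,
    }


# Hypothetical values, roughly the same order of magnitude as in this thread.
sample_body = '{"responseHeader": {"status": 0, "QTime": 5000}}'
timing = split_request_time(sample_body, wall_clock_ms=15000)
print(timing)
```

If most of the time lands in outside_qtime_ms, that points at the streaming
side (stored fields, network) rather than at the search itself.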

A lack of memory could be slowing down the query component too (your
times look too slow for only 200K docs, even with querying that many
fields).

Some things you could try:
- a 12GB heap is pretty big for 200K documents, unless you facet and
sort on a ton of fields.  Try reducing the heap size of the JVM to
give more memory back to the operating system to cache index files.
- reduce the index size if possible
- run on a box with more physical memory if possible
- keep the index on local disk if possible (you didn't mention if it's
on some sort of NFS mount or something)
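
To make the heap-size suggestion concrete, here is a back-of-the-envelope
sketch of how much of the index the OS could keep in its file cache for a given
heap size. The 20 GB index and 12 GB heap come from this thread. The 16 GB of
physical RAM, the 4 GB alternative heap, and the 1 GB reserved for everything
else are pure assumptions (the box's physical memory was never mentioned), and
page_cache_coverage is a hypothetical helper, not anything in Solr.

```python
def page_cache_coverage(physical_ram_gb, jvm_heap_gb, index_size_gb, other_gb=1.0):
    """Estimate the fraction of the index that could fit in the OS file cache.

    Treats memory not claimed by the JVM heap or other processes as available
    for caching index files. This is a crude upper bound, not a measurement.
    """
    if index_size_gb <= 0:
        raise ValueError("index_size_gb must be positive")
    available_gb = max(physical_ram_gb - jvm_heap_gb - other_gb, 0.0)
    return min(available_gb / index_size_gb, 1.0)


ASSUMED_RAM_GB = 16.0  # assumption: physical memory is not stated in the thread
INDEX_GB = 20.0        # index size from the thread

# 12 GB is the heap from the thread; 4 GB is an arbitrary smaller heap to compare.
for heap_gb in (12.0, 4.0):
    coverage = page_cache_coverage(ASSUMED_RAM_GB, heap_gb, INDEX_GB)
    print(f"heap {heap_gb:.0f} GB: roughly {coverage:.0%} of the index could be cached")
```

Whatever the real numbers are, the point is the same: every GB taken off the
heap is a GB the operating system can use to cache index files.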

> Then I cleared the browser cache and ran the same dismax URL again.
> It still took 2500 milliseconds in QueryComponent, and on the server
> machine I only observed a brief CPU spike to 84%, which returned to 2%
> immediately during the query.

Yep, sounds like disk IO.

-Yonik
http://www.lucidimagination.com
