On Wed, Jan 18, 2012 at 10:15 PM, gabriel shen <xshco...@gmail.com> wrote:
> Hi Yonik,
>
> The index I am querying against is 20GB, containing 200,000 documents; some
> of the documents are quite big, and the schema contains more than 50 fields.
> The main content field is defined as both stored and indexed, with
> HTML stripping, standard tokenization, decompounding and stemming filters
> applied, without term vectors. The Solr 3.3 installation runs on a 64-bit
> JVM with 12GB of memory. The default cache option (512) is applied.
>
> First I did a query with the default query parser and a single query field
> called 'maintext',
> http://xxx:hhhh/solr/document/select?q=maintext:most%20populous%20city&start=0&rows=25
> It took 727 milliseconds in QueryComponent, which is fine.
>
> http://xxx:hhhh/solr/document/select?q=maintext:most%20populous%20city&start=0&rows=25&sort=sumlevel1%20asc,%20sumlevel2%20asc,%20domdate%20desc,%20score%20desc&facet=true&facet.field=sumlevel1
> It took 157 milliseconds in QueryComponent.
>
>
> And then I did another dismax query with the same query keywords (I
> suppose most documents, sorting and filtering are being cached):
>
> http://xxx:hhhh/solr/document/select?q=most%20populous%20city&qt=dismax&start=0&rows=25&qf=superdocid^1000%20popular-name^1000%20author^100%20target-id^50%20title_simple^50%20title^25%20summary_simple^25%20summary^10%20maintext_simple^5%20annotation_DEF_simple^5%20maintext%20annotation_DEF&pf=popular-name^1000%20author^100%20title_simple^50%20title^25%20summary_simple^25%20summary^10%20maintext_simple^5%20annotation_DEF_simple^5%20maintext%20annotation_DEF&sort=sumlevel1%20asc,%20sumlevel2%20asc,%20domdate%20desc,%20score%20desc&facet=true&facet.field=sumlevel1&debugQuery=true
>
> It took more than 15-20 seconds before the browser showed the result, and it
> displays 4781 milliseconds in QueryComponent.
A big discrepancy like this is pretty much always due to disk IO reading the stored fields. To support efficient streaming of large result lists, Solr reads the stored fields and returns them as it streams the results back to the client. If the operating system doesn't have enough free memory to cache the index, each document will cause a disk seek + read.

A lack of memory could be slowing down the query component too (your times look too slow for only 200K docs, even with querying that many fields).

Some things you could try:
- a 12GB heap is pretty big for 200K documents, unless you facet and sort on a ton of fields. Try reducing the heap size of the JVM to give more memory back to the operating system to cache index files.
- reduce the index size if possible
- run on a box with more physical memory if possible
- keep the index on local disk if possible (you didn't mention whether it's on some sort of NFS mount or something)

> Then I cleared the browser cache and ran the same dismax URL again.
> It still took 2500 milliseconds in QueryComponent, and on the server
> machine I only observed a brief CPU spike to 84%, which returned to 2%
> immediately during the query.

Yep, sounds like disk IO.

-Yonik
http://www.lucidimagination.com
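The advice to shrink the heap is really a memory budget: whatever the JVM heap does not claim is what the operating system can use to cache index files. The sketch below (Python, purely illustrative) makes that arithmetic explicit. The 20GB index and 12GB heap come from the thread; the box's physical RAM is not stated, so `PHYSICAL_RAM_GB = 16` is a hypothetical value. The helper `page_cache_budget` and its flat 1GB `os_reserved_gb` allowance (for the OS, JVM non-heap memory and other processes) are assumptions for illustration, not Solr or JVM settings.

```python
def page_cache_budget(physical_ram_gb: float, heap_gb: float, index_gb: float,
                      os_reserved_gb: float = 1.0) -> tuple[float, float]:
    """Return (GB left for the OS page cache, fraction of the index it could hold).

    Hypothetical helper. Simplifying assumption: everything not claimed by the
    JVM heap or by a flat reservation for the OS, JVM non-heap memory and other
    processes is available to the page cache.
    """
    free_gb = max(physical_ram_gb - heap_gb - os_reserved_gb, 0.0)
    # Cap at 1.0: once the whole index fits, extra memory doesn't help caching.
    fraction = min(free_gb / index_gb, 1.0) if index_gb > 0 else 1.0
    return free_gb, fraction


if __name__ == "__main__":
    INDEX_GB = 20.0         # index size from the quoted message
    PHYSICAL_RAM_GB = 16.0  # hypothetical: the thread does not state the box's RAM
    for heap in (12.0, 4.0, 2.0):
        free, frac = page_cache_budget(PHYSICAL_RAM_GB, heap, INDEX_GB)
        print(f"heap={heap:>4.1f} GB -> page cache {free:4.1f} GB, ~{frac:.0%} of index")
```

With those assumed numbers, a 12GB heap leaves only about 3GB of page cache for a 20GB index, so most stored-field reads would go to disk; dropping the heap to 4GB raises that to about 11GB. Real page-cache behaviour depends on access patterns and the OS, so treat this as a rough budget only.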