Please confirm if this is caused by grouping. Turn grouping off; what's the query time like then?
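
A minimal sketch of one way to set up that comparison, assuming you replay the logged query string by hand: it removes `group` and every `group.*` parameter so the same query can be re-sent without grouping and the two query times compared. The helper name `without_grouping` and the shortened sample string are made up for illustration; only the parameter names come from the log line quoted below.

```python
from urllib.parse import parse_qsl, urlencode


def without_grouping(query_string: str) -> str:
    """Return the query string with `group` and all `group.*` parameters removed."""
    # parse_qsl keeps repeated keys (e.g. several fq=...) as separate pairs.
    params = parse_qsl(query_string, keep_blank_values=True)
    kept = [
        (key, value)
        for key, value in params
        if key != "group" and not key.startswith("group.")
    ]
    # urlencode re-escapes the values, so a literal '+' comes back as %2B.
    return urlencode(kept)


if __name__ == "__main__":
    # Shortened, hypothetical stand-in for the full logged query.
    logged = "q=%2BXXXX&group=true&group.field=host&group.limit=5&rows=20"
    print(without_grouping(logged))
```

Run both variants a few times each so a warm-up effect is not mistaken for the cost of grouping.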

On Aug 27, 2011, at 07:27 , Lord Khan Han wrote:

> On the other hand, we couldn't use the cache for the types of queries below. I think
> it's caused by grouping. Anyway, we need to be sub-second without the cache.
>
> On Sat, Aug 27, 2011 at 2:18 PM, Lord Khan Han <khanuniver...@gmail.com> wrote:
>
>> Hi,
>>
>> Thanks for the reply.
>>
>> Here is the Solr log capture:
>>
>> ******
>>
>> hl.fragsize=100&spellcheck=true&spellcheck.q=XXXXX&group.limit=5&hl.simple.pre=<b>&hl.fl=content&spellcheck.collate=true&wt=javabin&hl=true&rows=20&version=2&fl=score,approved,domain,host,id,lang,mimetype,title,tstamp,url,category&hl.snippets=3&start=0&q=%2BXXXX+-"XXXXX"+-"XXXXX"+-"XXXXXX"+-"XXXXXX"+-"XXXXXX"+-XXXX+-"XXXXXX"+-XXX+-"XXXXX"+-XXXX+-XXXX+-"XXXXX"+-"XXXXX"+-"XXXXX"+-XXXX+-"XXXX"+-"XXXXX"+-"XXXXXX"+-"XXXXX"+-"XXXXXX"+-"XXXXXX"+-XXXX+-"XXXXX"+-"XXXXXX"+-XXXX+-"XXXXX"+-"XXXXX"+-XXXXX+-"XXXXX"+-"XXXXX"+-"XXXXX"+-"XXXXX"+-XXXXX+-"XXXXXX"+-"XXXXXX"+-XXXXXX+-XXXXX+-"XXXXX"+"XXXXX"+"XXXXX"+"XXXXXX"++&group.field=host&hl.simple.post=</b>&group=true&qt=search&fq=mrank:[0+TO+100]&fq=word_count:[70+TO+*]
>> ******
>>
>> XXXX stands for the words. Every phrase "xxxxx" has two words inside.
>>
>> The timing from debugQuery:
>>
>> <lst name="timing">
>>   <double name="time">8654.0</double>
>>   <lst name="prepare">
>>     <double name="time">16.0</double>
>>     <lst name="org.apache.solr.handler.component.QueryComponent">
>>       <double name="time">16.0</double>
>>     </lst>
>>     <lst name="org.apache.solr.handler.component.FacetComponent">
>>       <double name="time">0.0</double>
>>     </lst>
>>     <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
>>       <double name="time">0.0</double>
>>     </lst>
>>     <lst name="org.apache.solr.handler.component.HighlightComponent">
>>       <double name="time">0.0</double>
>>     </lst>
>>     <lst name="org.apache.solr.handler.component.StatsComponent">
>>       <double name="time">0.0</double>
>>     </lst>
>>     <lst name="org.apache.solr.handler.component.SpellCheckComponent">
>>       <double name="time">0.0</double>
>>     </lst>
>>     <lst name="org.apache.solr.handler.component.DebugComponent">
>>       <double name="time">0.0</double>
>>     </lst>
>>   </lst>
>>   <lst name="process">
>>     <double name="time">8638.0</double>
>>     <lst name="org.apache.solr.handler.component.QueryComponent">
>>       <double name="time">4473.0</double>
>>     </lst>
>>     <lst name="org.apache.solr.handler.component.FacetComponent">
>>       <double name="time">0.0</double>
>>     </lst>
>>     <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
>>       <double name="time">0.0</double>
>>     </lst>
>>     <lst name="org.apache.solr.handler.component.HighlightComponent">
>>       <double name="time">42.0</double>
>>     </lst>
>>     <lst name="org.apache.solr.handler.component.StatsComponent">
>>       <double name="time">0.0</double>
>>     </lst>
>>     <lst name="org.apache.solr.handler.component.SpellCheckComponent">
>>       <double name="time">1.0</double>
>>     </lst>
>>     <lst name="org.apache.solr.handler.component.DebugComponent">
>>       <double name="time">4122.0</double>
>>     </lst>
>>
>> The funny thing is that if I remove the ShingleFilter from the "sh_text" field
>> below and index normally, the query time is half that of the current shingled one!
>> Shouldn't a shingled index be better for such heavy two-word phrase searches?
>> I am confused.
>>
>> On the other hand, an off-the-shelf search engine from one of the big companies
>> runs the same query on the same machine in 0.7 / 0.8 secs without cache. I am
>> confident we can do better in Solr but couldn't find the way at the moment.
>>
>> Thanks for helping.
>>
>> On Sat, Aug 27, 2011 at 2:46 AM, Erik Hatcher <erik.hatc...@gmail.com> wrote:
>>
>>> On Aug 26, 2011, at 17:49 , Lord Khan Han wrote:
>>>> We are indexing news documents from various sites. Currently we have
>>>> 200K docs indexed. Total index size is 36 gig. There are also attachments to
>>>> the news (PDFs, docs, etc.), so document size can be high (e.g. 10 MB).
>>>>
>>>> We are using some complex queries which include around 30 - 40 terms per
>>>> query. 70% of these terms are two-word phrases. We combine them
>>>> with + and - to pinpoint exact results.
>>>> There is also grouping, dismax and boosting, and term-vector highlighting.
>>>
>>> You're using a lot of componentry there, and have complex queries. We
>>> need more details.
>>>
>>> Turn on debugQuery=true... what do the timings say for each component?
>>>
>>>> Our problem is query times. Currently it's around 6-7 secs. I know our query
>>>> is a little bit heavy but we want to improve query performance. I believe we
>>>> can make it sub-second but no success at the moment.
>>>
>>> Please provide an example query or two (perhaps a full line logged from
>>> Solr itself), and then let's see what debugQuery says about your query being
>>> parsed.
>>>
>>>> We tried to use two-word shingle tokens; it decreases the query performance!!
>>>> We assumed it would help speed up phrase searches.
>>>
>>> Again, we'd need to see a parsed query to understand this deeper.
>>>
>>> Lots of synonym expansion? A parsed query will tell us.
>>>
>>>> (using solr latest trunk and HW is pretty good, 32 core with 32 gig ram)
>>>>
>>>> Here the field def:
>>>>
>>>> <fieldType name="sh_text" class="solr.TextField" positionIncrementGap="100"
>>>>            autoGeneratePhraseQueries="true">
>>>>   <analyzer type="index">
>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>             words="stopwords.txt" enablePositionIncrements="true" />
>>>>     <filter class="solr.WordDelimiterFilterFactory"
>>>>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>>>>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>>>>     <!--<filter class="solr.LowerCaseFilterFactory"/>-->
>>>>     <filter class="solr.KeywordMarkerFilterFactory"
>>>>             protected="protwords.txt"/>
>>>>     <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
>>>>             outputUnigrams="true"/>
>>>>   </analyzer>
>>>>   <analyzer type="query">
>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>>>             ignoreCase="true" expand="true"/>
>>>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>             words="stopwords.txt" enablePositionIncrements="true" />
>>>>     <filter class="solr.WordDelimiterFilterFactory"
>>>>             generateWordParts="1" generateNumberParts="1" catenateWords="0"
>>>>             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>>>>     <!--<filter class="solr.LowerCaseFilterFactory"/>-->
>>>>     <filter class="solr.KeywordMarkerFilterFactory"
>>>>             protected="protwords.txt"/>
>>>>     <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
>>>>             outputUnigrams="true"/>
>>>>   </analyzer>
>>>> </fieldType>
>>>>
>>>> and
>>>>
>>>> <field name="content" type="sh_text" stored="true" indexed="true"
>>>>        termVectors="true" termPositions="true" termOffsets="true"/>
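
For reading a debugQuery timing block like the one quoted earlier in this thread, here is a rough sketch that pulls the per-component times out of the XML. The function name `component_times` is hypothetical, and the sample is a shortened copy of the quoted "process" section with the closing tags added so that it parses. In those quoted numbers, QueryComponent (4473 ms) and DebugComponent (4122 ms) together make up almost all of the 8638 ms process time, so presumably a good share of the total is the cost of debugging itself.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the quoted timing block (values from the thread),
# with closing tags added so the fragment is well-formed.
SAMPLE = """
<lst name="timing">
  <double name="time">8654.0</double>
  <lst name="process">
    <double name="time">8638.0</double>
    <lst name="org.apache.solr.handler.component.QueryComponent">
      <double name="time">4473.0</double>
    </lst>
    <lst name="org.apache.solr.handler.component.HighlightComponent">
      <double name="time">42.0</double>
    </lst>
    <lst name="org.apache.solr.handler.component.DebugComponent">
      <double name="time">4122.0</double>
    </lst>
  </lst>
</lst>
"""


def component_times(timing_xml: str, phase: str = "process") -> dict:
    """Map short component name -> milliseconds for one phase of the timing block."""
    root = ET.fromstring(timing_xml.strip())
    phase_node = root.find(f"lst[@name='{phase}']")
    times = {}
    if phase_node is None:
        return times
    for component in phase_node.findall("lst"):
        value = component.find("double[@name='time']")
        if value is not None:
            # Keep only the class name, dropping the package prefix.
            short_name = component.get("name", "").rsplit(".", 1)[-1]
            times[short_name] = float(value.text)
    return times


if __name__ == "__main__":
    for name, ms in sorted(component_times(SAMPLE).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {ms:.0f} ms")
```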