On Aug 26, 2011, at 17:49 , Lord Khan Han wrote:
> We are indexing news document from the various sites. Currently we have
> 200K docs indexed. Total index size is 36 gig. There is also attachment to
> the news (pdf -docs etc) So document size could be high (ie 10mb).
>
> We are using some complex queries which includes around 30 - 40 terms per
> query. %70 of this terms is two word phrases. We are using
> with conjunction + and - to pinpoint exact result.
> There is also grouping, dismax and boosting , Termvector HL .

You're using a lot of componentry there, and have complex queries. We need more details. Turn on debugQuery=true... what do the timings say for each component?

> Our problem is query times. Currently its around 6-7 secs. I know our query
> is little bit heavy but we want to improve query performance. I believe we
> can make it sub second but no success at the moment.

Please provide an example query or two (perhaps a full line logged from Solr itself), and then let's see what debugQuery says about your query being parsed.

> We tried to use shingle 2 word token it decreases the query performance !! We
> assumed it will help the speed up phrases search..

Again, we'd need to see a parsed query to understand this deeper. Lots of synonym expansion? A parsed query will tell us.

> (using solr latest trunk and HW is pretty good, 32 core with 32 gig ram)
>
> Here the field def:
>
> <fieldType name="sh_text" class="solr.TextField" positionIncrementGap="100"
>            autoGeneratePhraseQueries="true">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" enablePositionIncrements="true" />
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>     <!--<filter class="solr.LowerCaseFilterFactory"/>-->
>     <filter class="solr.KeywordMarkerFilterFactory"
>             protected="protwords.txt"/>
>     <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
>             outputUnigrams="true"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" enablePositionIncrements="true" />
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="0"
>             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>     <!--<filter class="solr.LowerCaseFilterFactory"/>-->
>     <filter class="solr.KeywordMarkerFilterFactory"
>             protected="protwords.txt"/>
>     <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
>             outputUnigrams="true"/>
>   </analyzer>
> </fieldType>
>
> and
>
> <field name="content" type="sh_text" stored="true" indexed="true"
>        termVectors="true" termPositions="true" termOffsets="true"/>
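
If it helps with collecting the per-component timings mentioned above, here is a rough Python sketch. It assumes the response is requested as JSON (wt=json) and that the timings sit under debug -> timing -> prepare/process, each component carrying a "time" value in milliseconds. The host, handler path, query string, component names and numbers are all placeholders invented for illustration, and the real component keys depend on your Solr version.

```python
from urllib.parse import urlencode


def build_debug_url(base_url, params):
    """Return a request URL with debugQuery=true and JSON output added."""
    query = dict(params)
    query["debugQuery"] = "true"
    query["wt"] = "json"
    return base_url + "?" + urlencode(query)


def component_timings(response, phase="process"):
    """Return {component: milliseconds} for one phase of the timing section.

    Assumes the layout response["debug"]["timing"][phase][component]["time"];
    anything that does not match that shape is skipped.
    """
    section = response.get("debug", {}).get("timing", {}).get(phase, {})
    return {
        name: entry["time"]
        for name, entry in section.items()
        if isinstance(entry, dict) and "time" in entry
    }


def slowest_first(timings):
    """Sort (component, milliseconds) pairs with the most expensive first."""
    return sorted(timings.items(), key=lambda item: item[1], reverse=True)


# Hypothetical response fragment: names and numbers are made up and only
# illustrate the assumed shape, they are not measurements.
SAMPLE_RESPONSE = {
    "debug": {
        "timing": {
            "time": 6.0,
            "prepare": {"time": 0.0},
            "process": {
                "time": 6.0,
                "componentA": {"time": 1.0},
                "componentB": {"time": 3.0},
                "componentC": {"time": 2.0},
            },
        }
    }
}

if __name__ == "__main__":
    # Placeholder host, path and query; substitute your own.
    print(build_debug_url("http://localhost:8983/solr/select",
                          {"q": "content:example"}))
    for name, millis in slowest_first(component_timings(SAMPLE_RESPONSE)):
        print(name, millis)
```

Run against a real response, the sorted list shows at a glance which component dominates the 6-7 seconds, which is the first thing to establish before changing the analysis chain.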