Hi,

We are indexing news documents from various sites. We currently have 200K docs indexed, and the total index size is 36 GB. News items can also carry attachments (PDFs, docs, etc.), so an individual document can be large (e.g. 10 MB).
We are using some complex queries with around 30-40 terms per query. About 70% of these terms are two-word phrases. We combine them with the + and - operators to pinpoint exact results. We also use grouping, dismax, boosting, and term vector highlighting.

Our problem is query time, which is currently around 6-7 seconds. I know our queries are a bit heavy, but we want to improve query performance. I believe we can get it under a second, but we have had no success so far. We tried indexing 2-word shingles, but that decreased query performance! We had assumed it would speed up phrase searches.

What would you suggest? What are we missing? (We are using the latest Solr trunk, and the hardware is pretty good: 32 cores with 32 GB RAM.)

Here is the field type definition:

<fieldType name="sh_text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <!--<filter class="solr.LowerCaseFilterFactory"/>-->
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <!--<filter class="solr.LowerCaseFilterFactory"/>-->
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
  </analyzer>
</fieldType>

and the field itself:

<field name="content" type="sh_text" stored="true" indexed="true" termVectors="true" termPositions="true" termOffsets="true"/>
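
To make clear what we expect the shingle filter to produce with maxShingleSize="2" and outputUnigrams="true", here is a rough Python sketch of our understanding. It is only a simplified model, not Lucene's actual implementation: the function name `shingle` and its parameters are hypothetical, and it ignores position increments and the filler tokens that appear where stop words were removed.

```python
def shingle(tokens, max_shingle_size=2, output_unigrams=True, sep=" "):
    """Simplified model of a shingle filter (an assumption, not Lucene code).

    For each token, emit the token itself (if output_unigrams is set),
    followed by every word n-gram of size 2..max_shingle_size that
    starts at that token.
    """
    out = []
    for i, tok in enumerate(tokens):
        if output_unigrams:
            out.append(tok)
        for n in range(2, max_shingle_size + 1):
            # Only emit a shingle if enough tokens remain to fill it.
            if i + n <= len(tokens):
                out.append(sep.join(tokens[i:i + n]))
    return out


if __name__ == "__main__":
    # Three input tokens become five indexed terms.
    print(shingle(["solr", "query", "performance"]))
```

If this model is right, a field with N tokens ends up with roughly 2N - 1 terms, so the index gets noticeably larger, and the same expansion is applied to every query term as well.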