Hi,

We are indexing news documents from various sites. Currently we have
200K docs indexed, and the total index size is 36 GB. The news items also
have attachments (PDF, DOC, etc.), so a single document can be large
(up to around 10 MB).

We are using fairly complex queries with around 30-40 terms per query.
About 70% of these terms are two-word phrases. We combine them with the
+ and - operators to pinpoint exact results. We also use grouping, dismax,
boosting and term-vector highlighting.
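To make the query setup concrete, here is a simplified sketch of how such a request could be expressed as solrconfig.xml defaults. It is a hypothetical illustration, not our real configuration: the handler name `/newssearch`, the `source` group field, the `bq` boost query and the example phrases are all placeholders.

```xml
<!-- Hypothetical sketch: handler name, group field, boost query and
     example phrases are placeholders, not a real configuration. -->
<requestHandler name="/newssearch" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- dismax accepts +required, -prohibited and "quoted phrase" clauses in q,
         e.g. q=+"oil price" +"middle east" -"stock market" ...
         with 30-40 such clauses per query -->
    <str name="defType">dismax</str>
    <str name="qf">content</str>
    <!-- boosting: placeholder boost query on a hypothetical field -->
    <str name="bq">category:politics^2</str>
    <!-- result grouping on a hypothetical "source" field -->
    <str name="group">true</str>
    <str name="group.field">source</str>
    <!-- term-vector based highlighting; it relies on termVectors,
         termPositions and termOffsets being enabled on the field -->
    <str name="hl">true</str>
    <str name="hl.fl">content</str>
    <str name="hl.useFastVectorHighlighter">true</str>
  </lst>
</requestHandler>
```

The FastVectorHighlighter is used here because the `content` field already stores term vectors with positions and offsets.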

Our problem is query time: currently it is around 6-7 seconds per query.
I know our queries are a little heavy, but we want to improve performance.
I believe we can get it under a second, but we have had no success so far.

We tried indexing 2-word shingles, but that actually decreased query
performance! We had assumed it would speed up phrase searches. What would
you suggest? What are we missing?

(We are using the latest Solr trunk, and the hardware is pretty good:
32 cores with 32 GB of RAM.)

Here is the field type definition:

<fieldType name="sh_text" class="solr.TextField" positionIncrementGap="100"
           autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <!--<filter class="solr.LowerCaseFilterFactory"/>-->
    <filter class="solr.KeywordMarkerFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
            outputUnigrams="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <!--<filter class="solr.LowerCaseFilterFactory"/>-->
    <filter class="solr.KeywordMarkerFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
            outputUnigrams="true"/>
  </analyzer>
</fieldType>

and the field that uses it:

<field name="content" type="sh_text" stored="true" indexed="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
