Re: Shingle and Query Performance

Erik Hatcher Fri, 26 Aug 2011 16:47:29 -0700

On Aug 26, 2011, at 17:49 , Lord Khan Han wrote:
> We are indexing news documents from various sites. Currently we have
> 200K docs indexed. Total index size is 36 GB. The news items also have
> attachments (PDFs, docs, etc.), so a document can be large (e.g. 10 MB).
> 
> We are using some complex queries which include around 30-40 terms per
> query. 70% of these terms are two-word phrases. We combine them
> with + and - to pinpoint exact results.
> There is also grouping, dismax, boosting, and term vector highlighting.


You're using a lot of componentry there, and have complex queries.  We need 
more details.

Turn on debugQuery=true... what do the timings say for each component?  
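
As a rough illustration, here is a minimal Python sketch of pulling the
per-component timings out of a debugQuery response. It assumes the JSON
response writer (wt=json) and the usual debug/timing layout with "prepare"
and "process" phases. The helper names and the sample response are made up
for illustration, and the component keys can differ between Solr versions.

```python
from urllib.parse import urlencode


def debug_url(select_url, query, **extra):
    """Build a select URL that asks for debug output in JSON (hypothetical helper)."""
    params = {"q": query, "debugQuery": "true", "wt": "json"}
    params.update(extra)
    return select_url + "?" + urlencode(params)


def component_timings(response):
    """Return {phase: {component: millis}} from a parsed debugQuery response."""
    timing = response.get("debug", {}).get("timing", {})
    result = {}
    for phase in ("prepare", "process"):
        entries = timing.get(phase, {})
        # Each phase stores its own total under "time"; the remaining keys
        # are the individual search components.
        result[phase] = {
            name: value["time"]
            for name, value in entries.items()
            if isinstance(value, dict) and "time" in value
        }
    return result


# Made-up response fragment that only shows the assumed shape;
# these numbers are not real measurements.
sample = {
    "debug": {
        "timing": {
            "time": 10.0,
            "prepare": {"time": 1.0, "query": {"time": 1.0}},
            "process": {
                "time": 9.0,
                "query": {"time": 4.0},
                "highlight": {"time": 5.0},
            },
        }
    }
}

for phase, components in component_timings(sample).items():
    for name, millis in components.items():
        print(phase, name, millis)
```

Comparing the "process" numbers per component should show whether the time
goes to the query itself or to something like highlighting.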

> Our problem is query times. Currently they are around 6-7 secs. I know our
> query is a little bit heavy, but we want to improve query performance. I
> believe we can make it sub-second, but no success at the moment.

Please provide an example query or two (perhaps a full line logged from Solr 
itself), and then let's see what debugQuery says about how your query is parsed.
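
A small Python sketch of picking the parsed-query fields out of a debugQuery
response, assuming wt=json. The helper name and the sample values are
hypothetical; a real response for the queries described here would be much
longer.

```python
def parsed_query_info(response):
    """Collect the query-parsing entries from a parsed debugQuery response."""
    debug = response.get("debug", {})
    keys = (
        "rawquerystring",        # the query exactly as it was sent
        "querystring",           # the query string handed to the parser
        "parsedquery",           # the resulting query object
        "parsedquery_toString",  # the same query in readable form
        "QParser",               # which query parser handled it
    )
    return {key: debug[key] for key in keys if key in debug}


# Made-up fragment, only to show which keys are read.
sample_response = {
    "debug": {
        "rawquerystring": "content:foo",
        "querystring": "content:foo",
        "parsedquery": "content:foo",
        "parsedquery_toString": "content:foo",
        "QParser": "LuceneQParser",
    }
}

for key, value in parsed_query_info(sample_response).items():
    print(key, "=>", value)
```

The parsedquery entries are where extra clauses from shingles or synonym
expansion would become visible.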

> We tried using 2-word shingle tokens, but it decreased query performance! We
> assumed it would help speed up phrase searches.

Again, we'd need to see a parsed query to understand this deeper.  

Lots of synonym expansion?  A parsed query will tell us.



> (using the latest Solr trunk, and the HW is pretty good: 32 cores with 32 GB RAM)
> 
> Here the field def:
> 
> <fieldType name="sh_text" class="solr.TextField" positionIncrementGap="100"
> autoGeneratePhraseQueries="true">
>      <analyzer type="index">
>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>        <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" enablePositionIncrements="true" />
>        <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>        <!--<filter class="solr.LowerCaseFilterFactory"/>-->
>        <filter class="solr.KeywordMarkerFilterFactory"
> protected="protwords.txt"/>
>        <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
> outputUnigrams="true"/>
>      </analyzer>
>      <analyzer type="query">
>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
>        <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" enablePositionIncrements="true" />
>        <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>        <!--<filter class="solr.LowerCaseFilterFactory"/>-->
>        <filter class="solr.KeywordMarkerFilterFactory"
> protected="protwords.txt"/>
>        <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
> outputUnigrams="true"/>
>      </analyzer>
>    </fieldType>
> 
> and
> 
> <field name="content" type="sh_text" stored="true" indexed="true"
> termVectors="true" termPositions="true" termOffsets="true"/>
