Hi Erick,

Thank you for the reply.
Yes, I'm using the FastVectorHighlighter (Solr 4.3). Each request should
return only 10 results.

Here is my schema configuration for both fields:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
        <analyzer type="index">
                <tokenizer class="solr.StandardTokenizerFactory" />
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.WordDelimiterFilterFactory"
                        catenateWords="1" catenateNumbers="1" catenateAll="1"
                        preserveOriginal="1" />
                <filter class="solr.ASCIIFoldingFilterFactory" />
        </analyzer>
        <analyzer type="query">
                <tokenizer class="solr.StandardTokenizerFactory" />
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.ASCIIFoldingFilterFactory" />
        </analyzer>
        <analyzer type="multiterm">
                <tokenizer class="solr.WhitespaceTokenizerFactory" />
                <filter class="solr.ASCIIFoldingFilterFactory" />
        </analyzer>
</fieldType>
<fieldType name="textSpell" class="solr.TextField"
           positionIncrementGap="100" omitNorms="true">
        <analyzer type="index">
                <tokenizer class="solr.StandardTokenizerFactory" />
                <filter class="solr.SnowballPorterFilterFactory"
                        language="German2" />
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.StopFilterFactory" />
                <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
                <filter class="solr.ShingleFilterFactory" />
        </analyzer>
        <analyzer type="query">
                <tokenizer class="solr.StandardTokenizerFactory" />
                <filter class="solr.SnowballPorterFilterFactory"
                        language="German2" />
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.StandardFilterFactory" />
                <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
        </analyzer>
        <analyzer type="multiterm">
                <tokenizer class="solr.WhitespaceTokenizerFactory" />
                <filter class="solr.ASCIIFoldingFilterFactory" />
        </analyzer>
</fieldType> 
<field name="spell" type="textSpell" indexed="true" multiValued="true" />
<field name="content" type="text" stored="true" indexed="true"
       multiValued="true" termVectors="true" termPositions="true"
       termOffsets="true" />

The content field contains on average around 5000 to 6000 words (a rough
estimate only).

Best regards
Erwin




-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Tuesday, February 25, 2014 3:27 PM
To: solr-user@lucene.apache.org
Subject: Re: Performance problem on Solr query on stemmed values

Right, highlighting may have to re-analyze the input in order to return the
highlighted data. This will be significantly slower than the search,
especially if you have a large number of rows you're returning.

You can get better performance in highlighting by using
FastVectorHighlighter. See:

https://cwiki.apache.org/confluence/display/solr/FastVector+Highlighter
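
For reference, a minimal sketch of how the FastVectorHighlighter is usually
switched on in Solr 4.x through handler defaults in solrconfig.xml. The handler
name "/select" and the hl.snippets value are assumptions for illustration and
are not taken from this thread. The highlighted field has to be stored and
indexed with termVectors, termPositions and termOffsets, which the content
field in the schema quoted in this thread already is.

```xml
<!-- Sketch only: handler name and snippet count are illustrative assumptions -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- return 10 documents per request -->
    <int name="rows">10</int>
    <!-- enable highlighting on the content field -->
    <str name="hl">true</str>
    <str name="hl.fl">content</str>
    <!-- use term vectors instead of re-analyzing the stored text -->
    <str name="hl.useFastVectorHighlighter">true</str>
    <!-- one snippet per field (assumed value) -->
    <int name="hl.snippets">1</int>
  </lst>
</requestHandler>
```

The same parameters can also be sent per request, for example
hl=true&hl.fl=content&hl.useFastVectorHighlighter=true.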

1000x is unusual, though, unless your fields are very large or you're
returning a lot of documents.

Best,
Erick


On Tue, Feb 25, 2014 at 5:23 AM, Erwin Gunadi <festiva.s...@gmail.com> wrote:

> Hi,
>
>
>
> I would like to know whether anyone has experienced this kind of
> phenomenon.
>
>
>
> We are having a performance problem with queries on stemmed values.
>
> I've documented the symptoms I'm currently facing:
>
> Search on       Search on     Highlighting         Processing
> field content   field spell   (on content field)   speed
> -------------   -----------   ------------------   ----------
> active          active        active               Slow
> active          not active    active               Fast
> active          active        not active           Fast
> not active      active        active               Slow
> not active      active        not active           Fast
>
> *Fast means 1000x faster than "slow".
>
>
>
> The content field is our index field, which holds the original text, and
> spell is the field with the stemmed values.
>
> According to my measurements, searching on both fields (stemmed and not
> stemmed) is really fast.
>
> But when I add highlighting to the query, it takes too long to process.
>
>
>
> Best Regards
>
> Erwin
>
>
