I'm not 100% sure about this, but I imagine this is what happens:

(using -> to mean "tokenized to")

Suppose that you index:

"I am running home" -> "I am run running home"

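If you then query "running home" -> "run running home", all three query
terms (run, running, home) match the indexed terms, so the document gets
a higher score than if you query "runs home" -> "run runs home", where
only two terms (run, home) match.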
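
To make that concrete, here is a rough Python sketch of the idea. It is
not Solr code: toy_stem is a made-up stand-in for the Porter stemmer that
only handles the words in this thread, and counting shared terms is only
a crude proxy for the real Lucene score, which also depends on term
frequency, IDF and norms.

```python
# Toy model of: whitespace tokenizer -> keyword repeat -> stemmer
# -> remove duplicates. All names here are hypothetical helpers.

def toy_stem(token):
    # Hypothetical rules, NOT the real Porter algorithm.
    if token.endswith("ning"):
        return token[:-4]          # "running" -> "run"
    if token.endswith("s") and len(token) > 3:
        return token[:-1]          # "runs" -> "run"
    return token


def analyze(text):
    """Return the set of terms the toy chain emits for `text`."""
    terms = set()
    for token in text.split():     # whitespace tokenizer
        terms.add(token)           # keyword copy, left unstemmed
        terms.add(toy_stem(token)) # second copy, stemmed
    # Using a set stands in for the duplicate-removal step.
    return terms


def overlap(query, doc):
    """Count the terms shared by the analyzed query and document."""
    return len(analyze(query) & analyze(doc))


doc = "I am running home"
print(overlap("running home", doc))  # 3: run, running, home
print(overlap("runs home", doc))     # 2: run, home
```

So the exact form matches on both the original and the stemmed term,
while a different inflection matches on the stem only. No extra boost on
the keyword token is needed for the original to count for more.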


----- Original Message -----
> The Solr wiki says   "A repeated question is "how can I have the
> original term contribute
> more to the score than the stemmed version"? In Solr 4.3, the
> KeywordRepeatFilterFactory has been added to assist this
> functionality. "
> 
> https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming
> 
> (Full section reproduced below.)
> I can see how in the example from the wiki reproduced below that both
> the stemmed and original term get indexed, but I don't see how the
> original term gets more weight than the stemmed term.  Wouldn't this
> require a filter that gives terms with the keyword attribute more
> weight?
> 
> What am I missing?
> 
> Tom
> 
> 
> 
> ---------------------------------------------
> "A repeated question is "how can I have the original term contribute
> more to the score than the stemmed version"? In Solr 4.3, the
> KeywordRepeatFilterFactory has been added to assist this
> functionality. This filter emits two tokens for each input token, one
> of them is marked with the Keyword attribute. Stemmers that respect
> keyword attributes will pass through the token so marked without
> change. So the effect of this filter would be to index both the
> original word and the stemmed version. The 4 stemmers listed above all
> respect the keyword attribute.
> 
> For terms that are not changed by stemming, this will result in
> duplicate, identical tokens in the document. This can be alleviated by
> adding the RemoveDuplicatesTokenFilterFactory.
> 
> <fieldType name="text_keyword" class="solr.TextField"
> positionIncrementGap="100">
>  <analyzer>
>    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>    <filter class="solr.KeywordRepeatFilterFactory"/>
>    <filter class="solr.PorterStemFilterFactory"/>
>    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>  </analyzer>
> </fieldType>"
> 

-- 
Diego Fernandez - 爱国
Software Engineer
GSS - Diagnostics
