Several months ago Tom Burton-West asked:

   The Solr wiki says "A repeated question is "how can I have the
   original term contribute more to the score than the stemmed
   version"? In Solr 4.3, the KeywordRepeatFilterFactory has been
   added to assist this functionality."

   https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming

   (Full section reproduced below.)
   I can see how in the example from the wiki reproduced below that both
   the stemmed and original term get indexed, but I don't see how the
   original term gets more weight than the stemmed term.  Wouldn't this
   require a filter that gives terms with the keyword attribute more
   weight?

   What am I missing?

   Tom


I've read the follow-ups to that message, and have used the KeywordRepeatFilterFactory in the analyzer chain for both index and query as follows:

<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.ICUFoldingFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.KeywordRepeatFilterFactory"/>
<filter class="solr.SnowballPorterFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>

Although this may give some boost to the unstemmed form, our users are still asking for the ability to turn stemming off altogether. I know this can be done by copying every field to an unstemmed version of that field, but with the KeywordRepeatFilter already in play, it seems there should be _something_ that can be done to disable stemming dynamically at query time, without copying all the fields and re-indexing everything.
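
For reference, the copy-field approach mentioned above would look roughly like the following in schema.xml. This is only a sketch: the field type name text_unstemmed and the field names title and title_unstemmed are made up for illustration, and the chain is simply the one above with the KeywordRepeat, stemming, and duplicate-removal filters dropped.

```xml
<!-- Hypothetical unstemmed field type: the chain above without
     KeywordRepeatFilterFactory, SnowballPorterFilterFactory and
     RemoveDuplicatesTokenFilterFactory -->
<fieldType name="text_unstemmed" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field names; one such pair would be needed per searchable field -->
<field name="title_unstemmed" type="text_unstemmed" indexed="true" stored="false"/>
<copyField source="title" dest="title_unstemmed"/>
```

Every stemmed field needs its own unstemmed twin and a full re-index, which is exactly the cost I am hoping to avoid.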

So the goal ("X") is to disable stemming at query time without re-indexing. The possible solutions ("Y") I've thought of are:

1) Allow "Dummy" Snowball filter at query time

 * Create org.tartarus.snowball.ext.DummyStemmer which does no stemming
   at all.
 * Add a checkbox to the interface to allow the user to select
   "unstemmed" searching
 * Devise a way for a parameter specified with the query to be passed
   through to the <filter class="solr.SnowballPorterFilterFactory" />
   as the language to use
 * Use either "English" or "Dummy" to perform either stemmed searching
   or unstemmed searching.

2) Consult the keyword attribute, perhaps in a function query

Any thoughts on either of these ideas, or on different approaches to solving the problem?

Thanks in advance,

Robert Haschart
