Passing query params down into the analysis chain has been discussed before, but I think it is a bit controversial and complex. How about a higher-level approach that lets you change the query analyzer per request, e.g. [f.<field>.]q.analyzer=<analyzer|fieldType>? Query parsers would then use the specified analyzer for a field instead of the schema-defined one.
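
To make the idea concrete, here is a minimal sketch of how such a parameter could be resolved. Note that q.analyzer is only the parameter proposed above, not something Solr ships today, and the function name and the field type names (text_en, text_en_unstemmed, text_general) are made up for illustration. The precedence simply mirrors how other f.<field>.* request parameters override their global counterparts.

```python
def resolve_query_analyzer(params, field, schema_analyzers):
    """Return the analyzer name to use for `field` at query time.

    Precedence (an assumption, modelled on other per-field params):
    f.<field>.q.analyzer, then q.analyzer, then the schema default.
    """
    per_field = params.get(f"f.{field}.q.analyzer")
    if per_field is not None:
        return per_field
    global_override = params.get("q.analyzer")
    if global_override is not None:
        return global_override
    return schema_analyzers[field]


# Hypothetical schema defaults and request parameters.
schema_analyzers = {"title": "text_en", "author": "text_general"}
params = {"q": "book a ticket", "f.title.q.analyzer": "text_en_unstemmed"}

print(resolve_query_analyzer(params, "title", schema_analyzers))
print(resolve_query_analyzer(params, "author", schema_analyzers))
```

Here only the title field is switched to the unstemmed analyzer, while author keeps its schema-defined one.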

About your Dummy language: it would avoid stemming at query time, but it would not avoid false matches against stemmed words that accidentally match the query word. Example: "books" gets indexed as "books,book". You search for q=book a ticket&lang=dummy and still get a match on the "books" document. Or is there a way to affect whether a token matches or not based on its payload? A common workaround is to use a customized stemmer that prefixes all stemmed terms with a special Unicode character, so you can avoid them entirely if you need to. We discussed the option of deboosting certain token types (stems, synonyms etc.) in https://issues.apache.org/jira/browse/LUCENE-3130, but that issue never resulted in anything.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 27 Feb 2015, at 22:13, Markus Jelsma <markus.jel...@openindex.io> wrote:
>
> Hello Robert. Unstemmed terms have slightly higher IDF so they gain more
> weight, but stemmed tokens usually have slightly higher TF, so differences
> are marginal at best, especially when using standard TFIDFSimilarity.
> However, by setting a payload for stemmed terms, you can recognize them at
> search time and give them a lower score. You need a custom similarity when
> dealing with payloads so it is possible to tune the weight without reindexing.
>
> Markus
>
>
>
> -----Original message-----
>> From: Robert Haschart <rh...@virginia.edu>
>> Sent: Friday 27th February 2015 22:01
>> To: solr-user@lucene.apache.org
>> Subject: Unstemmed searching
>>
>> Several months ago Tom-Burton West asked:
>>
>> The Solr wiki says "A repeated question is "how can I have the
>> original term contribute
>> more to the score than the stemmed version"? In Solr 4.3, the
>> KeywordRepeatFilterFactory has been added to assist this
>> functionality. "
>>
>> https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming
>>
>> (Full section reproduced below.)
>> I can see how in the example from the wiki reproduced below that both
>> the stemmed and original term get indexed, but I don't see how the
>> original term gets more weight than the stemmed term. Wouldn't this
>> require a filter that gives terms with the keyword attribute more
>> weight?
>>
>> What am I missing?
>>
>> Tom
>>
>>
>> I've read the follow-ups to that message, and have used the
>> KeywordRepeatFilterFactory in the analyzer chain for both index and
>> query as follows:
>>
>> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> <filter class="solr.ICUFoldingFilterFactory" />
>> <filter class="solr.StopFilterFactory" ignoreCase="true"
>>         words="stopwords.txt" enablePositionIncrements="true" />
>> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>>         generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>>         catenateAll="0"/>
>> <filter class="solr.LowerCaseFilterFactory"/>
>> <filter class="solr.KeywordRepeatFilterFactory"/>
>> <filter class="solr.SnowballPorterFilterFactory"/>
>> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>
>> And although this may be giving some amount of boost to the unstemmed
>> form, our users are still asking for the ability to specify that
>> stemming is turned off altogether.
>> I know that this can be done by copying every field to an unstemmed
>> version of that field, but it seems that with the KeywordRepeatFilter
>> already in play, that there should be _something_ that can be done to
>> disable stemming dynamically at query time without needing to copy all
>> the fields and re-index everything.
>>
>> So that is "X" and possible "Y"'s that might accomplish this that I've
>> thought of are:
>>
>> 1) Allow "Dummy" Snowball filter at query time
>>
>> * Create org.tartarus.snowball.ext.DummyStemmer which does no stemming
>>   at all.
>> * Add a checkbox to the interface to allow the user to select
>>   "unstemmed" searching
>> * Devise a way for a parameter specified with the query to be passed
>>   through to the <filter class="solr.SnowballPorterFilterFactory" />
>>   as the language to use
>> * Use either "English" or "Dummy" to perform either stemmed searching
>>   or unstemmed searching.
>>
>> 2) Consult the keyword attribute perhaps in a function query
>>
>> Any thoughts on either of these ideas, or different approaches to solve
>> the problem.
>>
>> thanks in advance
>>
>> Robert Haschart
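
As a rough illustration of the false-match problem and the prefix workaround described in the reply at the top of this thread, the toy model below indexes both the original token and its stem, roughly like KeywordRepeatFilter followed by a stemmer and RemoveDuplicates. It is not Lucene code: the stemmer is a stand-in that only strips a trailing "s", the marker character is an arbitrary choice, and all function names are hypothetical.

```python
STEM_MARK = "\u2021"  # arbitrary marker (assumption); must never occur in real terms


def toy_stem(token):
    """Stand-in stemmer for illustration only: strips a trailing 's'."""
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


def index_terms(text, mark_stems):
    """Return the set of indexed terms: each original token plus its stem."""
    terms = set()
    for token in text.lower().split():
        terms.add(token)  # unstemmed (keyword) form
        stem = toy_stem(token)
        # With marking, the stem lives in its own "namespace" and can
        # never collide with an unstemmed query term.
        terms.add(STEM_MARK + stem if mark_stems else stem)
    return terms


def matches(doc_terms, query, stemmed, mark_stems):
    """OR-match: True if any query token hits an indexed term."""
    for token in query.lower().split():
        if stemmed:
            stem = toy_stem(token)
            term = STEM_MARK + stem if mark_stems else stem
        else:
            term = token  # "Dummy" language: query token is left as typed
        if term in doc_terms:
            return True
    return False


plain_doc = index_terms("cheap books", mark_stems=False)
marked_doc = index_terms("cheap books", mark_stems=True)

false_match = matches(plain_doc, "book a ticket", stemmed=False, mark_stems=False)
exact_only = matches(marked_doc, "book a ticket", stemmed=False, mark_stems=True)
stemmed_hit = matches(marked_doc, "book a ticket", stemmed=True, mark_stems=True)
print(false_match, exact_only, stemmed_hit)
```

Without the marker, the unstemmed query still hits the "books" document through the indexed stem "book". With the marker, the unstemmed query no longer matches, while a stemmed query still does. The cost is that the marked scheme needs a reindex, which is exactly what the original question hoped to avoid.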