Dear Alexander,

> A few questions on stemming support in Solr 3.6.1:
>
> - Can you do non-English stemming?

With Solr, many languages are supported, see http://wiki.apache.org/solr/LanguageAnalysis
> - We're using solr.PorterStemFilterFactory on the "text_en" field type. We
> will index a ton of PDF, DOCX, etc. docs in multiple languages. Is this the
> best filter factory to use for stemming?

I think it's hard to answer that question, so maybe someone else will have a better answer than mine! My answer would be: the best thing to do is to test the available alternatives and then make a decision. There are different implementations depending on the language. For English, there is the EnglishMinimalStemFilterFactory which does, as the name says, minimal stemming. I think that's essentially about plural/singular and some other things.

> - For words like "run", "runners", "running", "ran", we need all to be
> returned. Is there a factory that will return all those? When searching on
> "run", Porter returned "run", "running", "runners" but not "ran". Not sure
> if anything could pick that up.

If you read the page linked above, down to http://wiki.apache.org/solr/LanguageAnalysis#Customizing_Stemming, you'll see that you can add custom mapping rules for unsupported cases you need to cover.

> - Is it possible to turn off the stemming filter via code, so it could be
> a checkbox on a web page? We will be writing this in C#.

Yes it is. In practice you will not be turning stemming on or off; instead you'll have the same content indexed in distinct fields, say text_unstemmed and text_en. text_unstemmed will not have the stemmer in its analysis pipeline and text_en will have it. Checking the checkbox on the web page would then simply change the query made to Solr so that the stemmed field is queried or not ;-)

Practically, you could use dismax queries: checking "[x] activate stemming" would set the "qf" parameter to "text_unstemmed^2 text_en", and unchecking "[ ] activate stemming" would set "qf" to "text_unstemmed" only.
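To make the checkbox idea concrete, here is a minimal sketch in Python (not C#, it is only meant to show the HTTP parameters) of how the two "qf" values could be built. The host, port and /solr/select path are assumptions about a default local install, build_query_url is a hypothetical helper name, and text_unstemmed / text_en are just the example field names used above; defType, q, qf and wt are regular Solr request parameters.

```python
from urllib.parse import urlencode

# Assumption: a default local Solr install; adjust host, port and path to your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_query_url(user_query, stemming_enabled):
    """Build a dismax query URL; the checkbox state only changes the "qf" parameter."""
    if stemming_enabled:
        # Query both fields, boosting exact (unstemmed) matches over stemmed ones.
        qf = "text_unstemmed^2 text_en"
    else:
        # Query only the field whose analysis chain has no stemmer.
        qf = "text_unstemmed"
    params = [
        ("defType", "dismax"),
        ("q", user_query),
        ("qf", qf),
        ("wt", "json"),
    ]
    # urlencode percent-encodes "^" and turns the space between fields into "+".
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_query_url("run", stemming_enabled=True))
    print(build_query_url("run", stemming_enabled=False))
```

Pasting either printed URL into a browser should be enough to compare the results with and without the stemmed field, assuming both fields exist in your schema.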
You can test all these using a web browser and Solr's HTTP API before digging into the C# client to make sure you get what you expected ;-)

> Thank you for your help :)

I hope this helps :-)

> Sincerely,
> Alex

Best regards,
Tanguy