Dear Alexander,

> A few questions on stemming support in Solr 3.6.1:
>  - Can you do non-English stemming?
>
Solr supports many languages; see
http://wiki.apache.org/solr/LanguageAnalysis

>  - We're using solr.PorterStemFilterFactory on the "text_en" field type. We
> will index a ton of PDF, DOCX, etc. docs in multiple languages. Is this the
> best filter factory to use for stemming?
>
That is hard to answer in general, so maybe someone else will have a
better answer than mine!
My advice is to test the available alternatives on your own documents and
then decide.
Stemmers are language-specific (the Porter stemmer targets English), so
there are different implementations depending on the language. For English,
there is also EnglishMinimalStemFilterFactory, which does, as the name
says, minimal stemming. I think it essentially handles plural/singular
forms and a few other things.

>  - For words like "run", "runners", "running", "ran", we need all to be
> returned. Is there a factory that will return all those? When searching on
> "run", Porter returned "run", "running", "runners" but not "ran". Not sure
> if anything could pick that up.
>
If you read the page linked above, down to
http://wiki.apache.org/solr/LanguageAnalysis#Customizing_Stemming, you'll
see that you can add custom mapping rules for the cases the stemmer does
not cover, such as the irregular form "ran".

>  - Is it possible to turn off the stemming filter via code, so it could be
> a checkbox on a web page? We will be writing this in C#.

Yes, it is. In practice you will not turn stemming on or off. Instead,
you'll index the same content into two distinct fields, say
text_unstemmed and text_en, where text_unstemmed does not have the
stemmer in its analysis pipeline and text_en does.

Checking the checkbox on the web page would then simply change the query
sent to Solr so that the stemmed field is queried or not ;-)

Practically, you could use dismax queries: checking "[x] activate
stemming" would set the "qf" parameter to "text_unstemmed^2 text_en", and
unchecking "[ ] activate stemming" would set "qf" to "text_unstemmed"
only.
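
To make the checkbox idea concrete, here is a minimal sketch in Python (the
same logic carries over to C#) that builds the dismax request URL for both
states. The base URL and the function name build_query_url are assumptions
for illustration; the field names are the ones suggested above.

```python
from urllib.parse import urlencode

# Assumed location of the Solr select handler; adjust to your installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_query_url(user_query, stemming_enabled):
    """Return a dismax select URL; qf includes text_en only if stemming is on."""
    if stemming_enabled:
        # Boost exact (unstemmed) matches over stemmed ones.
        qf = "text_unstemmed^2 text_en"
    else:
        qf = "text_unstemmed"
    params = {"defType": "dismax", "q": user_query, "qf": qf}
    # urlencode percent-encodes "^" and turns spaces into "+".
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_query_url("run", True))
    print(build_query_url("run", False))
```

Pasting either printed URL into a browser should show whether the stemmed
field changes the result set for your data.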

You can test all of this with a web browser and Solr's HTTP API before
digging into the C# client, to make sure you get what you expect ;-)

> Thank you for your help :)

I hope this helps :-)

> Sincerely,
> Alex
>

Best regards,
Tanguy
