Re: Stemmer German2

André Widhani Wed, 07 Nov 2012 01:48:00 -0800

Do you use the LowerCaseFilterFactory filter in your analysis chain? You will 
probably want to add it, and if you already have, make sure it is _before_ the 
stemming filter so you get consistent results regardless of lowercase or 
uppercase spelling.


You can protect words from stemming by adding a KeywordMarkerFilterFactory 
filter before the stemmer; the protected words are listed in a text file. 
Place this filter after the lowercase filter so you can use lowercase terms 
in the file.
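
As a rough sketch of that ordering, a field type in schema.xml might look like 
the fragment below. The field type name "text_de", the tokenizer choice, and 
the file name "protwords.txt" are assumptions for illustration and are not 
taken from your setup; the file would hold one lowercase term per line, for 
example elbe.

```xml
<!-- Hypothetical field type; the name, tokenizer, and file name are assumptions -->
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- 1. Lowercase first, so "Elbe" and "elbe" become the same token -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- 2. Mark terms listed in protwords.txt as keywords so the stemmer skips them -->
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <!-- 3. Stem everything that was not marked as a keyword -->
    <filter class="solr.SnowballPorterFilterFactory" language="German2"/>
  </analyzer>
</fieldType>
```

With a chain like this, both spellings should be indexed and queried as elbe, 
provided the same analyzer is used at index and query time.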

Some stemmer classes like SnowballPorterFilterFactory also allow you to pass a 
"protected" attribute (again pointing to a file).
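
For the attribute variant, a minimal sketch could look like this, again 
assuming a hypothetical file named "protwords.txt" in the core's conf 
directory:

```xml
<!-- Lowercase first, then stem; terms in protwords.txt are left unstemmed -->
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.SnowballPorterFilterFactory" language="German2"
        protected="protwords.txt"/>
```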

All of this is on the Solr wiki (AnalyzersTokenizersTokenFilters, 
LanguageAnalysis) if you need more details.

Regards,
André

________________________________________
From: Andreas Niekler [aniek...@informatik.uni-leipzig.de]
Sent: Wednesday, 7 November 2012 10:02
To: solr-user@lucene.apache.org
Subject: Stemmer German2

Dear List,

I am seeing unwanted behavior with the German2 stemmer. Take the river Elbe, 
for example:

If I input elbe, the word gets reduced to elb.
If I input Elbe, everything is OK and elbe is stored in the index.

If I now query for elbe or Elbe, I of course get different results, so users 
cannot use Elbe and elbe interchangeably to get the same results.

Can I add an exception list to the stemmer? Otherwise we will have a very 
hard time explaining to some users why this is happening for some words.

Thank you

Andreas

--
Andreas Niekler, Dipl. Ing. (FH)
NLP Group | Department of Computer Science
University of Leipzig
Johannisgasse 26 | 04103 Leipzig

mail: aniek...@informatik.uni-leipzig.de
