Hi,
The SnowballPorterFilterFactory is a complete stemmer that reduces
words to their stem (laufen -> lauf, läufer -> lauf). One part of
that process is replacing language-specific special characters,
so SnowballPorterFilterFactory does what you wanted (among other
things). I mentioned it because it is a very good starting point when using Solr,
especially with documents in languages other than English.
Tom
Matthias Eireiner wrote:
Dear list,
it has been some time, but here is what I did.
I had a look at Thomas Traeger's tip to use the
SnowballPorterFilterFactory, but it does not actually do the job.
Its purpose is to convert regular ASCII into special characters,
and I want the opposite: all special characters should be
converted to regular ASCII.
J.J. Larrea's tip to use the PatternReplaceFilterFactory solved
the problem.
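The exact filter configuration J.J. Larrea suggested is not quoted in the thread. In Solr, each PatternReplaceFilterFactory takes one `pattern`, one `replacement` and a `replace` mode (`all` or `first`), so an umlaut expansion would presumably need one filter per character. The Python sketch below only approximates what such a chain does to each token; it assumes a lowercase filter runs first, and the mapping table and `expand_umlauts` name are hypothetical, not Solr API.

```python
import re

# Hypothetical filter chain: one (pattern, replacement) pair per Solr
# PatternReplaceFilterFactory, applied in order with replace="all".
# Assumes tokens were already lowercased by an earlier filter.
UMLAUT_FILTERS = [
    (re.compile("ä"), "ae"),
    (re.compile("ö"), "oe"),
    (re.compile("ü"), "ue"),
    (re.compile("ß"), "ss"),
]


def expand_umlauts(token: str) -> str:
    """Run the token through every replace-all filter, in chain order."""
    for pattern, replacement in UMLAUT_FILTERS:
        token = pattern.sub(replacement, token)
    return token


if __name__ == "__main__":
    for word in ["müller", "mueller", "straße", "läufer"]:
        print(word, "->", expand_umlauts(word))
```

Because "müller" and "mueller" both end up as "mueller", a query written either way matches, provided the same chain is applied at both index and query time.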
And as Chris Hostetter noted, stored fields always return the original
value, which made the second part of my question obsolete.
Thanks a lot for your help!
best
Matthias
-----Original Message-----
From: Thomas Traeger [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 26, 2007 23:44
To: solr-user@lucene.apache.org
Betreff: Re: Converting German special characters / umlaute
Try the SnowballPorterFilterFactory described here:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
You should use the German2 variant, which converts both ä and ae to a,
both ö and oe to o, and so on. More details:
http://snowball.tartarus.org/algorithms/german2/stemmer.html
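The German2 page linked above describes a prelude that replaces ß with ss, protects u and y between vowels, and rewrites ae, oe and ue (but not ue after q) as umlauts, plus a postlude that strips the umlaut accents. That is why ä and ae both end up as a. The Python sketch below reproduces only that character folding, in simplified form. It performs no suffix stripping, so it is not the Snowball stemmer itself, and `fold_german2` is a hypothetical name.

```python
import re

# Vowels as used by the German Snowball stemmers.
VOWELS = set("aeiouyäöü")


def fold_german2(word: str) -> str:
    """Simplified sketch of German2's umlaut handling (no stemming).

    Expects a lowercased word.
    """
    chars = list(word.replace("ß", "ss"))
    # Prelude, part 1: protect 'u'/'y' between vowels by upper-casing them,
    # so the 'ue' in e.g. 'mauer' is not mistaken for a spelled-out umlaut.
    for i in range(1, len(chars) - 1):
        if chars[i] in "uy" and chars[i - 1] in VOWELS and chars[i + 1] in VOWELS:
            chars[i] = chars[i].upper()
    text = "".join(chars)
    # Prelude, part 2: two-letter spellings become umlauts ('ue' not after 'q').
    text = text.replace("ae", "ä").replace("oe", "ö")
    text = re.sub(r"(?<!q)ue", "ü", text)
    # Postlude: undo the protection and remove the umlaut accents.
    return text.translate(str.maketrans("UYäöü", "uyaou"))


if __name__ == "__main__":
    for word in ["müller", "mueller", "böse", "boese", "mauer", "quelle"]:
        print(word, "->", fold_german2(word))
```

Unlike the ü -> ue expansion Matthias asked for, this folds both spellings down to the bare vowel ("müller" and "mueller" both become "muller"). The practical effect is the same: either spelling matches.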
Every document in Solr can have any number of fields, which may share
the same source but have different field types and are therefore handled
differently (stored as-is, analyzed in different ways, ...). Use copyField
in your schema.xml to feed your data into multiple fields. When
searching, you decide which fields to search (usually the
analyzed ones) and which fields to retrieve when getting the document back.
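The thread does not show a concrete schema. In schema.xml the idea would be a line such as `<copyField source="title" dest="title_de"/>`, where both field names are hypothetical. The Python sketch below is only a toy model of the principle, not of how Lucene stores or indexes data. The original value is kept verbatim as the "stored" copy, and only analyzed terms go into the index, so a query in either spelling matches while results come back unmodified. The `analyze` function and `TinyIndex` class are illustrative assumptions.

```python
from collections import defaultdict


def analyze(text: str) -> list[str]:
    """Hypothetical analyzer: lowercase, split on whitespace, expand umlauts."""
    table = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})
    return [tok.translate(table) for tok in text.lower().split()]


class TinyIndex:
    """Toy model of a stored field plus an analyzed copy of the same value."""

    def __init__(self) -> None:
        self.stored = {}                  # doc id -> original, unmodified value
        self.postings = defaultdict(set)  # analyzed term -> doc ids

    def add(self, doc_id: int, title: str) -> None:
        self.stored[doc_id] = title       # kept verbatim for retrieval
        for term in analyze(title):       # the "copied" field is analyzed
            self.postings[term].add(doc_id)

    def search(self, query: str) -> list[str]:
        # The query goes through the same analyzer; all terms must match (AND).
        terms = analyze(query)
        if not terms:
            return []
        hits = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            hits &= self.postings.get(term, set())
        # Hits are returned as the stored originals, not the analyzed terms.
        return [self.stored[d] for d in sorted(hits)]


if __name__ == "__main__":
    index = TinyIndex()
    index.add(1, "Müller und Söhne")
    index.add(2, "Straßenbau Mueller")
    print(index.search("mueller"))
    print(index.search("Söhne"))
```

Searching for "mueller" or "Müller" finds both documents, and each result is returned in its original spelling.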
Tom
Matthias Eireiner wrote:
Dear list,
I have two questions regarding German special characters, or umlauts.
First, is there an analyzer that automatically converts all German special
characters to their expanded form, such as ü to ue and ä to
ae?
Second, I would like searches to always run against the
expanded data, but the results should return the original,
unmodified data.
Does Lucene's GermanAnalyzer do this? I ran across it, but I could not
tell from the documentation whether it does or not.
Thanks a lot in advance.
Matthias