Hi,

The SnowballPorterFilterFactory is a complete stemmer that reduces words to their base form (laufen -> lauf, läufer -> lauf). One part of that process is replacing language-specific special characters.

So SnowballPorterFilterFactory does what you wanted (among other things). I mentioned it because it is a very good starting point when using Solr, especially when dealing with documents in languages other than English.

Tom

Matthias Eireiner wrote:
Dear list,

it has been some time, but here is what I did.
I had a look at Thomas Traeger's tip to use the
SnowballPorterFilterFactory, which does not actually do the job.
Its purpose is to convert regular ASCII into special characters,
and I want it the other way around, such that all special characters are
converted to regular ASCII.
J.J. Larrea's tip to use the PatternReplaceFilterFactory solved
the problem. And as Chris Hostetter noted, stored fields always return the
original value, which made the second part of my question obsolete.
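For readers looking for a starting point, a PatternReplaceFilterFactory chain along these lines could perform the conversion. This is only a sketch, not the configuration actually used in this thread: the field type name `text_de_ascii`, the choice of tokenizer, and the exact set of replacements are assumptions.

```xml
<!-- Hypothetical field type for schema.xml; the name and tokenizer are assumptions. -->
<fieldType name="text_de_ascii" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Lowercase first so that Ä, Ö and Ü are covered by the patterns below. -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Rewrite each special character to its two-letter ASCII form. -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="ä" replacement="ae" replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="ö" replacement="oe" replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="ü" replacement="ue" replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="ß" replacement="ss" replace="all"/>
  </analyzer>
</fieldType>
```

Because only a single analyzer is declared, the same replacements apply at index time and at query time, so a search for "läufer" and one for "laeufer" should end up as the same term.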

Thanks a lot for your help!

best Matthias



-----Original Message-----
From: Thomas Traeger [mailto:[EMAIL PROTECTED]
Sent: Wednesday, 26 September 2007 23:44
To: solr-user@lucene.apache.org
Subject: Re: Converting German special characters / umlaute


Try the SnowballPorterFilterFactory described here:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

You should use the German2 variant that converts ä and ae to a, ö and oe to o and so on. More details:
http://snowball.tartarus.org/algorithms/german2/stemmer.html
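As a rough illustration, selecting the German2 variant in schema.xml might look like the following. The field type name `text_de_stem` and the surrounding tokenizer and lowercase filter are assumptions made for the sake of a complete example; only the `language="German2"` setting comes from the advice above.

```xml
<!-- Hypothetical field type for schema.xml; the name and tokenizer are assumptions. -->
<fieldType name="text_de_stem" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- German2 treats ae/oe/ue like ä/ö/ü, then stems and strips the umlauts. -->
    <filter class="solr.SnowballPorterFilterFactory" language="German2"/>
  </analyzer>
</fieldType>
```

Note that this is a full stemmer, so the indexed terms are word stems rather than plain transliterations of the input.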

Every document in Solr can have any number of fields, which might have the same source but different field types and are therefore handled differently (stored as is, analyzed in different ways...). Use copyField in your schema.xml to feed your data into multiple fields. When searching, you decide which fields to search on (usually the analyzed ones) and which to retrieve when getting the document back.
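A minimal sketch of that setup, assuming a hypothetical source field `title`, a hypothetical analyzed copy `title_search`, and an analyzed field type called `text_de` defined elsewhere in the schema (all three names are illustrative, and `string` is assumed to be an unanalyzed type as in the example schema):

```xml
<!-- Stored as is: returned with the document, not searched. -->
<field name="title" type="string" indexed="false" stored="true"/>

<!-- Analyzed copy: searched, but not stored. "text_de" is an assumed field type. -->
<field name="title_search" type="text_de" indexed="true" stored="false"/>

<!-- Feed the incoming value of "title" into the analyzed field as well. -->
<copyField source="title" dest="title_search"/>
```

A query could then search the analyzed field and request only the stored one, for example with `q=title_search:laeufer&fl=title`.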

Tom

Matthias Eireiner wrote:
Dear list,

I have two questions regarding German special characters or umlaute.

Is there an analyzer which automatically converts all German special characters to their specific dissected form, such as ü to ue and ä to ae, etc.?

I would also like the search to always run against the dissected data. But when the results are returned, the initial, unmodified data should be returned.

Does Lucene's GermanAnalyzer do this job? I ran across it, but I could not figure out from the documentation whether it does the job or not.

thanks a lot in advance.

Matthias


