Chris Hostetter wrote:
> : is there an analyzer which automatically converts all german special
> : characters to their specific dissected form, such as ü to ue and ä to
> : ae, etc.?!
> 
> See also the ISOLatin1TokenFilter which does this regardless of language.

Actually, ISOLatin1TokenFilter does NOT convert /ü/ to /ue/, /ä/ to
/ae/, etc.

Instead, it converts /ü/ to /u/, /ä/ to /a/, etc.  It *does* convert /ß/
to /ss/, though I've seen some people write that the correct
substitution for /ß/ in German is /sz/ - I don't speak or read German,
so I don't know.

Maybe there should be an option on ISOLatin1TokenFilter to use German
substitutions, in addition to the current behavior of simply stripping
diacritics?
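To make the difference concrete, here is a minimal sketch in plain Java (no Lucene API involved) that compares the two behaviors on a single token. The class name, the fold() helper and both mapping tables are hypothetical illustrations. This is not how ISOLatin1TokenFilter is actually implemented. STRIP follows the current stripping behavior described above (ü -> u, ß -> ss), and GERMAN shows what a German-substitution option might do (ü -> ue, ß -> ss).

```java
import java.util.Map;

// Hypothetical illustration only; not part of Lucene.
class GermanFoldingSketch {

    // Stripping behavior as described above: umlauts lose the diacritic,
    // sharp s becomes "ss". Unicode escapes avoid source-encoding issues.
    static final Map<Character, String> STRIP = Map.of(
        '\u00e4', "a", '\u00f6', "o", '\u00fc', "u",   // lowercase a/o/u umlaut
        '\u00c4', "A", '\u00d6', "O", '\u00dc', "U",   // uppercase A/O/U umlaut
        '\u00df', "ss");                               // sharp s

    // Hypothetical German option: umlaut becomes base vowel + "e".
    static final Map<Character, String> GERMAN = Map.of(
        '\u00e4', "ae", '\u00f6', "oe", '\u00fc', "ue",
        '\u00c4', "Ae", '\u00d6', "Oe", '\u00dc', "Ue",
        '\u00df', "ss");

    // Replace each mapped character in the token; copy all others unchanged.
    static String fold(String token, Map<Character, String> table) {
        StringBuilder sb = new StringBuilder(token.length() + 4);
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            String replacement = table.get(c);
            sb.append(replacement != null ? replacement : String.valueOf(c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String word = "M\u00fcller";                  // "Müller"
        System.out.println(fold(word, STRIP));   // Muller
        System.out.println(fold(word, GERMAN));  // Mueller
    }
}
```

Note that a simple per-character table like this one gives mixed case for all-caps input (e.g. "MÜLLER" -> "MUeLLER"). A real option would probably need to look at the case of the neighboring characters.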

Does anyone know if there are other (Latin-1-utilizing) languages
besides German with standardized diacritic substitutions that involve
something other than just stripping the diacritics?

Steve
