(2 in 1 reply)

On Wed, 13 Aug 2008 09:59:21 -0700 Walter Underwood <[EMAIL PROTECTED]> wrote:
> Stripping accents doesn't quite work. The correct translation
> is language-dependent. In German, o-dieresis should turn into
> "oe", but in English, it should be "o" (as in "coöperate" or
> "Mötley Crüe"). In Swedish, it should not be converted at all.

Hi Walter,
understood. This goes back to the question of language-specific field definitions / parsers... more on this below.

> There are other character-to-string conversions: ae-ligature
> to "ae", "ß" to "ss", and so on. Luckily, those are independent
> of language.
>
> wunder
>
> On 8/13/08 9:16 AM, "Steven A Rowe" <[EMAIL PROTECTED]> wrote:
>
> > Hi Norberto,
> >
> > https://issues.apache.org/jira/browse/LUCENE-1343

Hi Steve,
thanks for the pointer. This is a Lucene entry... I thought the Latin filter was a SOLR feature? I, for one, definitely meant a SOLR filter.

Given what Walter rightly pointed out about the differences between languages, I suspect it would be a SOLR-level thing: fieldType name="textDE" language="DE" would apply the filter of Unicode chars to {ASCII?} with the appropriate mapping for German, etc. Or is this something that Lucene would / should take care of?

B
_________________________
{Beto|Norberto|Numard} Meijome

"I've dirtied my hands writing poetry, for the sake of seduction; that is, for the sake of a useful cause."
   Dostoevsky

I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
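
To make the language-dependent mapping discussed in this thread concrete, here is a minimal sketch in Python. It is not a SOLR or Lucene filter and does not use their APIs. The names fold, COMMON and PER_LANGUAGE, the language codes, and the partial German and Swedish tables are all assumptions made up for illustration. The idea: check a per-language table first, then the language-independent expansions (ae-ligature, "ß"), and only then fall back to plain accent stripping.

```python
import unicodedata

# Language-independent expansions mentioned in the thread.
COMMON = {"æ": "ae", "Æ": "AE", "ß": "ss"}

# Hypothetical per-language overrides (illustrative and incomplete).
PER_LANGUAGE = {
    "de": {"ö": "oe", "Ö": "Oe", "ä": "ae", "Ä": "Ae", "ü": "ue", "Ü": "Ue"},
    # Swedish keeps these letters, so they map to themselves.
    "sv": {"ö": "ö", "Ö": "Ö", "ä": "ä", "Ä": "Ä", "å": "å", "Å": "Å"},
    # English has no overrides and falls through to accent stripping.
    "en": {},
}


def strip_accents(ch):
    """Decompose a character and drop its combining marks."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def fold(text, lang):
    """Fold text using per-language rules, then common rules, then stripping."""
    overrides = PER_LANGUAGE.get(lang, {})
    out = []
    # Compose first so that "o" + combining dieresis is seen as a single "ö".
    for ch in unicodedata.normalize("NFC", text):
        if ch in overrides:
            out.append(overrides[ch])
        elif ch in COMMON:
            out.append(COMMON[ch])
        else:
            out.append(strip_accents(ch))
    return "".join(out)
```

With these tables, fold("Mötley Crüe", "en") gives "Motley Crue", fold("schön", "de") gives "schoen", and fold("Göteborg", "sv") leaves the word unchanged. A per-language fieldType along the lines of the textDE idea above would, under this assumption, simply select which table to use.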