(2 in 1 reply)
On Wed, 13 Aug 2008 09:59:21 -0700
Walter Underwood <[EMAIL PROTECTED]> wrote:

> Stripping accents doesn't quite work. The correct translation
> is language-dependent. In German, o-dieresis should turn into
> "oe", but in English, it should be "o" (as in "coöperate" or
> "Mötley Crüe"). In Swedish, it should not be converted at all.

Hi Walter,
understood. This goes back to the question of language-specific field
definitions / parsers... more on this below.

> 
> There are other character-to-string conversions: ae-ligature
> to "ae", "ß" to "ss", and so on. Luckily, those are independent
> of language.
> 
> wunder
> 
> On 8/13/08 9:16 AM, "Steven A Rowe" <[EMAIL PROTECTED]> wrote:
> 
> > Hi Norberto,
> > 
> > https://issues.apache.org/jira/browse/LUCENE-1343

Hi Steve,
thanks for the pointer. This is a Lucene entry... I thought the Latin filter
was a SOLR feature? I, for one, definitely meant a SOLR filter.

Given what Walter rightly pointed out about differences between languages, I
suspect it would be a SOLR-level thing: something like a fieldType with
name="textDE" language="DE" would apply the filter mapping unicode chars to
{ascii?} with the appropriate mapping for German, and so on for other languages.

Or is this something that Lucene would / should take care of?
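
To make the idea concrete, here is a rough sketch in Python (not an actual
SOLR or Lucene filter) of folding that depends on the field's language. The
names COMMON, PER_LANGUAGE and fold are made up for illustration. Only the
o-dieresis, ae-ligature and sharp-s rules come from this thread; the other
German entries are my assumption.

```python
import unicodedata

# Language-independent conversions mentioned in the thread.
COMMON = {"æ": "ae", "Æ": "AE", "ß": "ss"}

# Hypothetical per-language tables. Only the o-dieresis rule is from the
# thread; the remaining German umlaut entries are assumed by analogy.
PER_LANGUAGE = {
    "de": {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"},
    "en": {},    # no special cases: fall through to plain accent stripping
    "sv": None,  # None means: leave accented letters untouched
}


def strip_accents(ch):
    """Decompose a character and drop its combining marks."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def fold(text, language):
    """Fold text using the common rules plus the rules for one language."""
    table = PER_LANGUAGE[language]
    out = []
    # Normalize to NFC first so each accented letter is a single character.
    for ch in unicodedata.normalize("NFC", text):
        if ch in COMMON:
            out.append(COMMON[ch])
        elif table is None:
            out.append(ch)
        elif ch in table:
            out.append(table[ch])
        else:
            out.append(strip_accents(ch))
    return "".join(out)
```

With this, fold("schön", "de") would give "schoen", fold("Mötley Crüe", "en")
would give "Motley Crue", and fold("Göteborg", "sv") would come back
unchanged. A language-specific field type would then amount to picking which
table the filter uses.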

B
_________________________
{Beto|Norberto|Numard} Meijome

"I've dirtied my hands writing poetry, for the sake of seduction; that is,  for
the sake of a useful cause." Dostoevsky

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.
