Re: Unicode processing - Issue with CharStreamAwareWhitespaceTokenizerFactory

2010-07-06 Thread Jan Høydahl / Cominvent
The Char-filters MUST come before the Tokenizer, due to their nature of processing the character-stream and not the tokens. If you need to apply the accent normalization later in the analysis chain, either use ISOLatin1AccentFilterFactory or help with the implementation of SOLR-1978. -- Jan Hø
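
To illustrate the ordering point above, here is a minimal sketch in plain Python. It is not Solr or Lucene code: the function names and the tiny accent table are hypothetical stand-ins, chosen only to show that a char filter rewrites the raw character stream before tokenization, while a token filter normalizes tokens after they have been produced.

```python
# Illustrative sketch only; none of these names are Solr/Lucene APIs.
# ACCENT_MAP is a hypothetical, deliberately tiny mapping table.
ACCENT_MAP = {"é": "e", "è": "e", "à": "a", "ï": "i", "ø": "o"}


def mapping_char_filter(text, mapping=ACCENT_MAP):
    """Char-filter stage: rewrite the raw character stream before tokenizing."""
    return "".join(mapping.get(ch, ch) for ch in text)


def whitespace_tokenizer(text):
    """Tokenizer stage: split the character stream on whitespace."""
    return text.split()


def accent_token_filter(tokens, mapping=ACCENT_MAP):
    """Token-filter stage: normalize each token after tokenization."""
    return [mapping_char_filter(token, mapping) for token in tokens]


def analyze_with_char_filter(text):
    # Char filter first, then tokenizer (the char filter never sees tokens).
    return whitespace_tokenizer(mapping_char_filter(text))


def analyze_with_token_filter(text):
    # Tokenizer first, then a token filter later in the chain.
    return accent_token_filter(whitespace_tokenizer(text))


if __name__ == "__main__":
    print(analyze_with_char_filter("Saïd café"))
    print(analyze_with_token_filter("Saïd café"))
```

With a whitespace tokenizer both orderings give the same tokens in this sketch. The position in the chain starts to matter when the tokenizer itself treats accented and unaccented characters differently.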

Re: Unicode processing - Issue with CharStreamAwareWhitespaceTokenizerFactory

2010-07-05 Thread Koji Sekiguchi
No, all tokenizers can be used with mappingcharfilter. Koji Sekiguchi from mobile On 2010/07/06, at 0:32, Saïd Radhouani wrote: > Thanks Koji for the reply and for updating wiki. As it's written now in wiki, > it sounds (at least to me) like MappingCharFilterFactory works only with > Whitespac

Re: Unicode processing - Issue with CharStreamAwareWhitespaceTokenizerFactory

2010-07-05 Thread Saïd Radhouani
Thanks Koji for the reply and for updating wiki. As it's written now in wiki, it sounds (at least to me) like MappingCharFilterFactory works only with WhitespaceTokenizerFactory. Did you really mean that? Because this filter also works with other tokenizers. For instance, in my text type, I'm u

Re: Unicode processing - Issue with CharStreamAwareWhitespaceTokenizerFactory

2010-07-05 Thread Koji Sekiguchi
In the same wiki, they say that CharStreamAwareWhitespaceTokenizerFactory must be used with MappingCharFilterFactory. But, when I use this tokenizer and filter together, I get a severe error saying that the field type containing this filter and tokenizer is unknown. However, when I use this