+1

That's exactly what we need, too.

On Mon, Nov 29, 2010 at 5:28 PM, Shawn Heisey <elyog...@elyograg.org> wrote:

> On 11/29/2010 3:15 PM, Jacob Elder wrote:
>
>> I am looking for a clear example of using more than one tokenizer for a
>> single source field. My application has a single "body" field which until
>> recently was all Latin characters, but we're now encountering both English
>> and Japanese words in a single message. Obviously, we need to be using CJK
>> in addition to WhitespaceTokenizerFactory.
>>
>
> What I'd like to see is a CJK filter that runs after tokenization
> (whitespace in my case) and doesn't do anything but handle the CJK
> characters.  If there are no CJK characters in the token, it should do
> nothing at all.  The CJK tokenizer does a whole host of other things that I
> want to handle myself.
>
> Shawn
>
>
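
A minimal sketch of the behavior Shawn describes, written in plain Python
rather than as a Lucene TokenFilter. Everything in it is an assumption made
for illustration: the function names are hypothetical, "CJK" is taken to mean
the Hiragana, Katakana, Han, and Hangul syllable ranges, and overlapping
bigrams are one possible way to "handle" a CJK run. Tokens without CJK
characters pass through untouched.

```python
import re

# Assumption: these Unicode ranges (Hiragana/Katakana, CJK Unified
# Ideographs, Hangul syllables) are what counts as "CJK" here.
_CJK_RUN = re.compile(r"([\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7af]+)")


def cjk_bigram_filter(tokens):
    """Hypothetical post-tokenization filter.

    Non-CJK tokens are yielded unchanged. Inside a token that contains
    CJK characters, each CJK run is split into overlapping bigrams and
    any non-CJK remainder is yielded as-is.
    """
    for token in tokens:
        if not _CJK_RUN.search(token):
            # No CJK characters: do nothing at all to this token.
            yield token
            continue
        # The capturing group makes re.split keep the CJK runs.
        for part in _CJK_RUN.split(token):
            if not part:
                continue
            if _CJK_RUN.fullmatch(part):
                if len(part) == 1:
                    yield part
                else:
                    for i in range(len(part) - 1):
                        yield part[i:i + 2]
            else:
                yield part


def analyze(text):
    """Whitespace tokenization followed by the CJK-only filter."""
    return list(cjk_bigram_filter(text.split()))
```

For example, analyze("hello 日本語 world") leaves "hello" and "world" alone
and turns the middle token into the bigrams "日本" and "本語". Lowercasing,
stemming, and similar steps stay under the caller's control, since the filter
itself changes nothing except CJK runs.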


-- 
Jacob Elder
@jelder
(646) 535-3379
