I am looking for a clear example of using more than one tokenizer for a
single source field. My application has a single "body" field which until
recently contained only Latin characters, but we're now encountering both
English and Japanese words in a single message. Obviously, we need to be
using a CJK tokenizer in addition to WhitespaceTokenizerFactory.

I've found some references to using copyFields or NGrams but I can't quite
grasp what the whole solution would look like.

-- 
Jacob Elder
@jelder
(646) 535-3379
