I am looking for a clear example of using more than one tokenizer on a single source field. My application has a single "body" field that until recently contained only Latin characters, but we're now encountering both English and Japanese words in a single message. Obviously, we need to be using a CJK tokenizer in addition to WhitespaceTokenizerFactory.
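To illustrate why whitespace tokenization alone isn't enough, here is a toy Python sketch. It is not Solr or Lucene code. `cjk_bigram_tokenize` only roughly imitates the overlapping-bigram idea behind Lucene's CJK analysis; the real tokenizer's character ranges, single-character handling, and treatment of Latin text differ. The function names and the sample text are hypothetical.

```python
import re

# Hiragana/Katakana, CJK Extension A, and the main CJK Unified Ideographs block.
# This is a simplified range set, not the one Lucene actually uses.
CJK_RUN = re.compile(r"[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff]+")

def whitespace_tokenize(text):
    # Split only on whitespace, roughly what WhitespaceTokenizerFactory does.
    return text.split()

def cjk_bigram_tokenize(text):
    # Keep non-CJK pieces whole; turn each run of CJK characters
    # into overlapping two-character tokens.
    tokens = []
    for word in text.split():
        pos = 0
        for m in CJK_RUN.finditer(word):
            if m.start() > pos:
                tokens.append(word[pos:m.start()])  # Latin prefix kept whole
            run = m.group()
            if len(run) == 1:
                tokens.append(run)  # a lone CJK character becomes a unigram
            else:
                tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
            pos = m.end()
        if pos < len(word):
            tokens.append(word[pos:])  # Latin suffix kept whole
    return tokens

body = "Meeting notes 東京で会議"
print(whitespace_tokenize(body))  # ['Meeting', 'notes', '東京で会議']
print(cjk_bigram_tokenize(body))  # ['Meeting', 'notes', '東京', '京で', 'で会', '会議']
print("会議" in whitespace_tokenize(body))  # False
print("会議" in cjk_bigram_tokenize(body))  # True
```

Japanese is written without spaces between words, so a whitespace tokenizer indexes "東京で会議" as one opaque token, and a search for "会議" ("meeting") cannot match it. The bigram stream does contain "会議", and it still keeps the English words intact.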
I've found some references to using copyFields or n-grams, but I can't quite grasp what the whole solution would look like.

--
Jacob Elder
@jelder
(646) 535-3379