In a word, no. The CJK languages generally don't delimit words with whitespace, so a tokenizer that splits on whitespace by default simply won't produce useful tokens for Chinese text.
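For the Chinese side, a dedicated field type is the usual approach. A minimal sketch of a Solr schema fieldType using HMMChineseTokenizerFactory (the field name `text_zh` is just an example; the factory ships in Solr's analysis-extras contrib, so the relevant jars must be on the classpath):

```xml
<!-- Example field type for Chinese text; pair it with a separate
     English field (e.g. text_en) and index bilingual documents
     into both, or route content by detected language. -->
<fieldType name="text_zh" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- HMM-based word segmentation for Simplified Chinese -->
    <tokenizer class="solr.HMMChineseTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```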
Have you tried it? It seems a simple test would get you an answer faster.

Best,
Erick

On Wed, Sep 23, 2015 at 7:41 AM, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
> Hi,
>
> I would like to check: will StandardTokenizerFactory work well for indexing
> both English and Chinese (bilingual) documents, or do we need tokenizers
> that are customised for Chinese (e.g. HMMChineseTokenizerFactory)?
>
> Regards,
> Edwin