In a word, no. The CJK languages generally don't delimit words
with whitespace, so a tokenizer that treats whitespace as its
default word boundary simply won't work well for Chinese.
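
If you do want Chinese-aware analysis, something along these
lines in schema.xml is the usual starting point (just a sketch,
and the field type name is made up; HMMChineseTokenizerFactory
lives in the analysis-extras contrib, so the smartcn jars need
to be on your classpath):

  <!-- Assumed fieldType for Chinese text; adjust filters to taste. -->
  <fieldType name="text_zh" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <!-- Segments Han text into words using a hidden Markov model -->
      <tokenizer class="solr.HMMChineseTokenizerFactory"/>
      <!-- Lowercases any embedded Latin-script tokens -->
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

For the bilingual case, many people index the same content into
two fields (one analyzed for English, one for Chinese) and search
across both, rather than hoping a single tokenizer handles both
languages well.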

Have you tried it? It seems a simple test would get you
an answer faster.

Best,
Erick

On Wed, Sep 23, 2015 at 7:41 AM, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
wrote:

> Hi,
>
> Would like to check: will StandardTokenizerFactory work well for indexing
> both English and Chinese (bilingual) documents, or do we need a tokenizer
> that is customised for Chinese (e.g. HMMChineseTokenizerFactory)?
>
>
> Regards,
> Edwin
>
