RE: Indexing Japanese & English

2008-02-07 Thread Paul Clegg
lto:[EMAIL PROTECTED]
Sent: Thursday, February 07, 2008 11:05 AM
To: solr-user@lucene.apache.org
Subject: RE: Indexing Japanese & English

Here are the comments for CJKTokenizer. First, is this what you want? Remember, there are three Japanese writing systems.

/**
 * CJKTokenizer was m

RE: Indexing Japanese & English

2008-02-07 Thread Lance Norskog
Here are the comments for CJKTokenizer. First, is this what you want? Remember, there are three Japanese writing systems.

/**
 * CJKTokenizer was modified from StopTokenizer which does a decent job for
 * most European languages. It performs other token methods for double-byte
 * Characters: the
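
On the point about three writing systems (kanji, hiragana, katakana): before choosing a tokenizer it can help to see which scripts a field actually contains. The following is only a rough sketch using the JDK's Character.UnicodeScript, not anything from Lucene or Solr. The class name ScriptCount and the sample string are made up for illustration, and kanji are reported under the Unicode script name HAN.

```java
import java.lang.Character.UnicodeScript;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper (not part of Lucene/Solr): counts code points per Unicode script.
final class ScriptCount {

    static Map<UnicodeScript, Integer> count(String text) {
        Map<UnicodeScript, Integer> counts = new EnumMap<>(UnicodeScript.class);
        // Iterate by code point rather than char so supplementary characters count once.
        text.codePoints().forEach(cp -> counts.merge(UnicodeScript.of(cp), 1, Integer::sum));
        return counts;
    }

    public static void main(String[] args) {
        // Sample (assumed): three kanji, one hiragana, four katakana, then " and English".
        // Unicode escapes keep the source file ASCII-only.
        String sample = "\u65E5\u672C\u8A9E\u3068\u30AB\u30BF\u30AB\u30CA and English";
        System.out.println(count(sample));
    }
}
```

Spaces and punctuation show up under COMMON rather than under any one language's script, so a mixed Japanese/English field will typically report HAN, HIRAGANA, KATAKANA, LATIN and COMMON together.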