RE: CJK question

2013-05-13 Thread Markus Jelsma
Hi,

It uses the StandardAnalyzer which does split on IDEOGRAPHIC SPACE.

Cheers,
Markus

-Original message-
> From: Bernd Fehling
> Sent: Mon 13-May-2013 13:36
> To: solr-user@lucene.apache.org
> Subject: CJK question
>
> A question about CJK, how will U+3000 be ha

CJK question

2013-05-13 Thread Bernd Fehling
A question about CJK: how will U+3000 be handled? U+3000 belongs to "CJK Symbols and Punctuation" and is named "IDEOGRAPHIC SPACE". Is it wrong if I just map it to U+0020 (SPACE)? What does the CJK Analyzer do with U+3000? If "two CJK words" have U+3000 inside, does it mean these "two CJK words"
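
To make the mapping idea concrete, here is a minimal sketch in plain Python. It is not Solr or Lucene code and says nothing about what the CJK Analyzer does internally. The helper name `map_ideographic_space` and the sample string are made up for illustration. It only shows what replacing U+3000 with U+0020 does to a string, and that both characters share the same Unicode general category (space separator).

```python
import unicodedata

IDEOGRAPHIC_SPACE = "\u3000"  # "IDEOGRAPHIC SPACE", block "CJK Symbols and Punctuation"


def map_ideographic_space(text: str) -> str:
    """Replace every U+3000 with U+0020; all other characters are left alone.

    Hypothetical helper for illustration only, not part of any Solr API.
    """
    return text.replace(IDEOGRAPHIC_SPACE, " ")


# Made-up sample: two CJK words separated by U+3000.
sample = "東京\u3000大阪"

# Both characters are Unicode space separators.
print(unicodedata.name(IDEOGRAPHIC_SPACE))      # IDEOGRAPHIC SPACE
print(unicodedata.category(IDEOGRAPHIC_SPACE))  # Zs
print(unicodedata.category(" "))                # Zs

# After mapping, an ordinary split on U+0020 separates the two words.
mapped = map_ideographic_space(sample)
print(mapped.split(" "))
```

In this sketch the two words come out as separate items after the mapping, which is the behaviour the question is asking about. Whether the same mapping is needed in a given Solr field type depends on the tokenizer in use.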