Hi,
The CJK Analyzer uses the StandardTokenizer, which does split on IDEOGRAPHIC SPACE (U+3000).
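
As a rough illustration only (this does not run Lucene or Solr), a small Python sketch can show why mapping U+3000 to U+0020 should be harmless for a tokenizer that already breaks on it: Unicode classifies U+3000 as a space separator (category Zs), the same category as U+0020. The helper name map_ideographic_space and the sample string are hypothetical, and Python's whitespace splitting only stands in for the real tokenizer here.

```python
import unicodedata

IDEOGRAPHIC_SPACE = "\u3000"


def map_ideographic_space(text: str) -> str:
    """Hypothetical helper: replace every U+3000 with U+0020 (SPACE)."""
    return text.replace(IDEOGRAPHIC_SPACE, " ")


# Sample input: two CJK words separated by an IDEOGRAPHIC SPACE.
sample = "東京\u3000大阪"

# Unicode metadata for U+3000: its name and general category.
print(unicodedata.name(IDEOGRAPHIC_SPACE))
print(unicodedata.category(IDEOGRAPHIC_SPACE))

# Whitespace splitting is only a stand-in for the tokenizer: it breaks at
# U+3000 the same way it breaks at U+0020, so the mapping changes nothing.
print(sample.split())
print(map_ideographic_space(sample).split())
```

Under these assumptions both splits give the same two words, so the mapping would only matter for a tokenizer that does not treat U+3000 as a break.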
Cheers,
Markus
-Original message-
> From: Bernd Fehling
> Sent: Mon 13-May-2013 13:36
> To: solr-user@lucene.apache.org
> Subject: CJK question
>
A question about CJK: how will U+3000 be handled?
U+3000 belongs to "CJK Symbols and Punctuation" and is named "IDEOGRAPHIC
SPACE".
Is it wrong if I just map it to U+0020 (SPACE)?
What does the CJK Analyzer do with U+3000?
If "two CJK words" have U+3000 inside, does it mean these "two CJK words"