Leonardo Santagada wrote:
On 28/02/2008, at 00:23, Christian Wittern wrote:
The documents I am trying to index with Solr contain characters from
CJK Extension B, which was added to Unicode in version 3.1 (March 2001).
Just to give more information, does Java support this? I believe it
doesn't support characters that need more than 2 bytes, so maybe that is
the problem here...
It has been supported in Java since version 5 (1.5.0):
http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Character.UnicodeBlock.html#CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B
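For what it is worth, here is a minimal sketch of how this looks on the Java side, assuming Java 5 or later. A Java char is a 16-bit UTF-16 code unit, so a character from Extension B is stored as a surrogate pair and has to be handled through the code point methods. The class name SupplementaryDemo and the helper isExtensionB are made up for illustration; they say nothing about how Solr or Lucene treat such characters internally.

```java
public class SupplementaryDemo {

    // Hypothetical helper: true if the first code point of s lies in
    // the CJK Unified Ideographs Extension B block.
    static boolean isExtensionB(String s) {
        int codePoint = s.codePointAt(0);
        return Character.UnicodeBlock.of(codePoint)
                == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B;
    }

    public static void main(String[] args) {
        // U+20000 is the first code point of CJK Extension B.
        String s = new String(Character.toChars(0x20000));

        // Number of UTF-16 code units (chars) used to store the string.
        System.out.println("chars:       " + s.length());
        // Number of Unicode code points in the string.
        System.out.println("code points: " + s.codePointCount(0, s.length()));
        // Whether the first char is the high half of a surrogate pair.
        System.out.println("surrogate:   " + Character.isHighSurrogate(s.charAt(0)));
        System.out.println("Extension B: " + isExtensionB(s));
    }
}
```

Code that walks a string char by char will see the two surrogate halves separately, so anything that tokenizes or filters text needs to iterate by code point to keep such characters intact.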
Christian
--
Christian Wittern
Institute for Research in Humanities, Kyoto University
47 Higashiogura-cho, Kitashirakawa, Sakyo-ku, Kyoto 606-8265, JAPAN