Leonardo Santagada wrote:

On 28/02/2008, at 00:23, Christian Wittern wrote:

The documents I am trying to index with Solr contain characters from CJK Extension B, which was added to Unicode in version 3.1 (March 2001).


Just to give more information, does Java support this? I believe it doesn't support characters with more than 2 bytes, so maybe this is the case...

It has been supported in Java since version 5 (or 1.5.0):

http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Character.UnicodeBlock.html#CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B
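
For illustration, here is a minimal sketch of how an Extension B character looks from the Java side, assuming Java 5 or later. The class and method names (ExtensionBCheck, fromCodePoint, isExtensionB) are made up for this example; only the java.lang calls are real. It says nothing about how Solr or Lucene handle such characters during analysis.

```java
// Hypothetical helper class, for illustration only.
class ExtensionBCheck {

    // Build a String from a single Unicode code point.
    // Code points above U+FFFF become a surrogate pair (two chars).
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // True if the code point lies in the CJK Extension B block.
    static boolean isExtensionB(int codePoint) {
        return Character.UnicodeBlock.of(codePoint)
                == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B;
    }

    public static void main(String[] args) {
        int codePoint = 0x20000; // first code point of CJK Extension B
        String s = fromCodePoint(codePoint);

        System.out.println(s.length());                       // 2 (UTF-16 code units)
        System.out.println(s.codePointCount(0, s.length()));  // 1 (actual character)
        System.out.println(Character.isSupplementaryCodePoint(codePoint)); // true
        System.out.println(isExtensionB(codePoint));          // true
    }
}
```

The point is that Java strings can hold these characters, but each one takes two char values, so code that walks a string char by char (instead of by code point) may split them.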

Christian

--
Christian Wittern Institute for Research in Humanities, Kyoto University
47 Higashiogura-cho, Kitashirakawa, Sakyo-ku, Kyoto 606-8265, JAPAN
