Re: Invalid UTF-8 character 0xfffe during shard update

Chris Hostetter Mon, 05 Aug 2013 12:05:50 -0700

: > 0xfffe is not a special character -- it is explicitly *not* a character in
: > Unicode at all, it is set aside as "not a character" specifically so
: > that the character 0xfeff can be used as a BOM, and if the BOM is read
: > incorrectly, it will cause an error.
: 
: XML doesn't allow control characters like this; it defines a character as:


But is that even relevant?  I thought FFFE was *not* a control character? 
I thought it was completely invalid in Unicode.
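
A minimal sketch for checking the two points above, using Python's standard
library purely for illustration (nothing here is Solr, JDBC, or JVM code, and
the helper names are hypothetical). It looks up the Unicode general category
of U+FFFE, and shows that U+FFFE is what a UTF-16 BOM turns into when it is
decoded with the wrong byte order:

```python
import unicodedata

BOM = "\ufeff"          # U+FEFF, the code point used as the byte order mark
SWAPPED_BOM = "\ufffe"  # U+FFFE, the code point under discussion


def general_category(ch: str) -> str:
    """Return the Unicode general category of a single character.

    "Cc" means control character; "Cn" means not assigned, which is
    the category that noncharacters fall under.
    """
    return unicodedata.category(ch)


def read_with_wrong_byte_order(text: str) -> str:
    """Encode as UTF-16 little-endian, then wrongly decode as big-endian."""
    return text.encode("utf-16-le").decode("utf-16-be")


if __name__ == "__main__":
    print("U+FFFE category:", general_category(SWAPPED_BOM))
    print("U+0001 category:", general_category("\u0001"))
    print("byte-swapped BOM is U+FFFE:",
          read_with_wrong_byte_order(BOM) == SWAPPED_BOM)
```

Note that this only shows how one runtime classifies the code point; it does
not settle whether a database or driver ought to reject it.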

I get that the specific error here is from the XML parser -- but my
question is whether U+FFFE is actually valid (in which case perhaps there
is something Solr can/should be doing here when serializing/deserializing
to "escape", or maybe just strip, the character), or is this just completely
100% not valid in Unicode at all? (That was my understanding, in which
case I don't get why the DB or JDBC driver or JVM didn't complain before
Solr ever got it as a String.)

: Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
: [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate
: blocks, FFFE, and FFFF. */
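
The quoted Char production translates fairly directly into a predicate. The
following is a sketch in Python, for illustration only: the function names
are hypothetical and this is not how Solr or any particular XML parser
implements the check. Dropping the offending characters is just one of the
options raised in this thread, not a recommendation.

```python
def is_xml_char(ch: str) -> bool:
    """Return True if ch matches the XML 1.0 Char production quoted above."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)             # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF           # up to just before the surrogates
        or 0xE000 <= cp <= 0xFFFD         # excludes U+FFFE and U+FFFF
        or 0x10000 <= cp <= 0x10FFFF      # supplementary planes
    )


def strip_non_xml_chars(text: str) -> str:
    """Drop every character that the Char production does not allow."""
    return "".join(ch for ch in text if is_xml_char(ch))


if __name__ == "__main__":
    sample = "ab\ufffecd"
    print(is_xml_char("\ufffe"))          # U+FFFE is outside every range
    print(strip_non_xml_chars(sample))
```

Under this production U+FEFF is allowed while U+FFFE is not, so a check like
this would separate a legitimate BOM from its byte-swapped counterpart.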


-Hoss
