: > 0xfffe is not a special character -- it is explicitly *not* a character in
: > Unicode at all, it is set aside as "not a character" specifically so
: > that the character 0xfeff can be used as a BOM, and if the BOM is read
: > incorrectly, it will cause an error.
:
: XML doesn't allow control characters like this; it defines a character as:
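You can check the byte-order argument in the quote with a little arithmetic. If U+FEFF is written big-endian (bytes FE FF) and the reader assumes little-endian, the same two bytes come back as 0xFFFE. The sketch below does that byte math by hand instead of going through a JDK charset decoder. It then asks `java.lang.Character` how it classifies U+FFFE. The class and method names are made up for illustration. Only `Character.isISOControl`, `Character.isDefined` and `Character.getType` are real JDK calls.

```java
final class BomByteOrderDemo {

    // Combine two bytes into one UTF-16 code unit, first byte is the high byte.
    static int readBigEndian(byte first, byte second) {
        return ((first & 0xFF) << 8) | (second & 0xFF);
    }

    // Combine two bytes into one UTF-16 code unit, first byte is the low byte.
    static int readLittleEndian(byte first, byte second) {
        return ((second & 0xFF) << 8) | (first & 0xFF);
    }

    public static void main(String[] args) {
        int bom = 0xFEFF;

        // Serialize the BOM big-endian: bytes FE FF.
        byte b0 = (byte) (bom >>> 8);
        byte b1 = (byte) bom;

        int correct = readBigEndian(b0, b1);     // 0xFEFF
        int swapped = readLittleEndian(b0, b1);  // 0xFFFE

        System.out.printf("read big-endian:    U+%04X%n", correct);
        System.out.printf("read little-endian: U+%04X%n", swapped);

        // U+FFFE is an unassigned noncharacter, not a C0/C1 control character.
        System.out.println("isISOControl(U+FFFE): " + Character.isISOControl(swapped));
        System.out.println("isDefined(U+FFFE):    " + Character.isDefined(swapped));
        System.out.println("getType(U+FFFE) == UNASSIGNED: "
                + (Character.getType(swapped) == Character.UNASSIGNED));
    }
}
```

All three checks return false, false and true. So Java agrees that U+FFFE is not a control character. It is simply a code point with no character assigned to it.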
But is that even relevant? I thought FFFE was *not* a control character? I
thought it was completely invalid in Unicode. I get that the specific error
here is from the XML parser -- but my question is whether U+FFFE is actually
valid (in which case perhaps there is something Solr can/should be doing here
when serializing/deserializing to "escape" (or maybe just strip) the
character), or is this just completely 100% not valid in Unicode at all?
(That was my understanding, in which case I don't get why the DB or JDBC
driver or JVM didn't complain before Solr ever got it as a String.)

: Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
: [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate
: blocks, FFFE, and FFFF. */

-Hoss
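Here is a sketch of the "strip" option. It assumes the goal is only to drop whatever the XML 1.0 `Char` production quoted above rejects. `isXmlChar` and `stripNonXmlChars` are hypothetical helpers and are not part of Solr's API. The `main` method also shows that a Java `String` holds U+FFFE without complaint. A `String` is just a sequence of UTF-16 code units and does not validate them.

```java
final class XmlCharFilter {

    // XML 1.0 "Char" production:
    //   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Hypothetical helper: drop every code point the Char production does
    // not allow (e.g. U+FFFE, U+FFFF, lone surrogates, most C0 controls).
    static String stripNonXmlChars(String s) {
        StringBuilder out = new StringBuilder(s.length());
        s.codePoints()
         .filter(XmlCharFilter::isXmlChar)
         .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // A Java String accepts U+FFFE without complaint.
        String raw = new StringBuilder()
                .append("abc")
                .appendCodePoint(0xFFFE)
                .append("def")
                .toString();
        System.out.println("length with U+FFFE: " + raw.length());      // 7
        System.out.println("isXmlChar(U+FFFE): " + isXmlChar(0xFFFE)); // false
        System.out.println("after strip: " + stripNonXmlChars(raw));    // abcdef
    }
}
```

Stripping silently loses data. Escaping is not an alternative in XML 1.0, because the "Legal Character" well-formedness constraint requires that even a character reference such as `&#xFFFE;` match the `Char` production.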