On Sat, Nov 13, 2010 at 1:50 PM, Steven A Rowe <sar...@syr.edu> wrote:
> Looks to me like the returned value is in a Solr-internal form of XML
> character escaping: \u0000 is represented as "#0;" and \u0008 is represented
> as "#8;". (The escaping code is in
> solr/src/java/org/apache/common/util/XML.java.)

Yep, there is no legal way to represent some Unicode code points in XML.

> You can get the value back in its original binary form by unescaping the
> /#[0-9]+;/ format. Here is a test illustrating this fix that I added to
> SolrExampleTests, then ran from SolrExampleEmbeddedTest:

The problem here is that one might then unescape what was meant to be a
literal "#8;".

One could come up with a full escaping mechanism over XML I suppose... but
I'm not sure it would be worth it.

-Yonik
http://www.lucidimagination.com
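
To make the ambiguity concrete, here is a minimal Java sketch. It is not the code from XML.java: the escape() and unescape() helpers are hypothetical stand-ins that assume control characters below 0x20 (other than tab, newline, and carriage return) are written as '#' + decimal code + ';'. The pattern is bounded to two digits, a narrower variant of /#[0-9]+;/, only so that parsing cannot overflow.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeAmbiguity {
    // Bounded variant of /#[0-9]+;/ (assumption: only codes 0-31 are escaped).
    private static final Pattern ESCAPED = Pattern.compile("#([0-9]{1,2});");

    // Hypothetical stand-in for the escaping described in the thread:
    // control characters that XML cannot carry become "#<decimal>;".
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x20 && c != '\t' && c != '\n' && c != '\r') {
                sb.append('#').append((int) c).append(';');
            } else {
                sb.append(c); // note: a literal '#' is passed through unchanged
            }
        }
        return sb.toString();
    }

    // Naive unescaping: every "#<digits>;" is turned back into a character,
    // whether or not it was produced by escape().
    public static String unescape(String s) {
        Matcher m = ESCAPED.matcher(s);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1));
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String binary = "a\0b\bc";               // contains NUL and backspace
        String literal = "see item #8; below";   // "#8;" is meant literally

        System.out.println(unescape(escape(binary)).equals(binary));   // true
        System.out.println(unescape(escape(literal)).equals(literal)); // false
    }
}
```

The second round trip fails because escape() leaves '#' alone, so the client cannot tell an escaped backspace from text that happened to contain "#8;". A full mechanism would also have to escape '#' itself, which is the extra layer over XML mentioned above.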