On 11/13/2010 at 2:04 PM, Yonik Seeley wrote:
> On Sat, Nov 13, 2010 at 1:50 PM, Steven A Rowe <sar...@syr.edu> wrote:
> > Looks to me like the returned value is in a Solr-internal form of XML
> > character escaping: \u0000 is represented as "#0;" and \u0008 is
> > represented as "#8;". (The escaping code is in
> > solr/src/java/org/apache/common/util/XML.java.)
>
> Yep, there is no legal way to represent some unicode code points in XML.

Right - the real fix here (as you pointed out on #lucene) is to not use
XML transports.

> > You can get the value back in its original binary form by unescaping the
> > /#[0-9]+;/ format. Here is a test illustrating this fix that I added to
> > SolrExampleTests, then ran from SolrExampleEmbeddedTest:
>
> The problem here is that one might then unescape what was meant to be
> a literal "#8;"
> One could come up with a full escaping mechanism over XML I suppose...
> but I'm not sure it would be worth it.

s/illustrating this fix/exposing this dirty hack/ :)

Steve
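
For anyone following along, here is a rough sketch of the ambiguity discussed above. It is not the code from Solr's XML.java: the class HashEscapeDemo and its escape/unescape methods are hypothetical, and the escaping rule (control characters below 0x20, other than tab, LF and CR, become "#<decimal>;") is an assumption based only on the "#0;" and "#8;" examples in this thread. The unescape side is also narrower than the /#[0-9]+;/ pattern quoted above, since it only decodes values the assumed escaper could have produced.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Solr or SolrJ.
final class HashEscapeDemo {
    // One or two decimal digits between '#' and ';' (narrower than /#[0-9]+;/).
    private static final Pattern ESCAPED = Pattern.compile("#([0-9]{1,2});");

    // Assumed rule: control chars that XML 1.0 cannot carry become "#<decimal>;".
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean illegal = c < 0x20 && c != '\t' && c != '\n' && c != '\r';
            if (illegal) {
                out.append('#').append((int) c).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // The "dirty hack": turn every "#N;" with N < 32 back into a raw char.
    static String unescape(String s) {
        Matcher m = ESCAPED.matcher(s);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            int value = Integer.parseInt(m.group(1));
            out.append(s, last, m.start());
            if (value < 0x20) {
                out.append((char) value);   // decode as a control character
            } else {
                out.append(m.group());      // leave anything else untouched
            }
            last = m.end();
        }
        out.append(s, last, s.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String binary = "a\0b\bc";                 // contains NUL and backspace
        String literal = "press #8; to continue";  // user really typed "#8;"

        // Binary data survives the round trip...
        System.out.println(unescape(escape(binary)).equals(binary));   // true
        // ...but a literal "#8;" is silently turned into a backspace.
        System.out.println(unescape(escape(literal)).equals(literal)); // false
    }
}
```

The second check is the problem Yonik describes: because escape() leaves a literal '#' alone, the receiver cannot tell an escaped backspace from text that happened to contain "#8;". A full mechanism would also have to escape the '#' itself, which is the extra work questioned above.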