On 11/13/2010 at 2:04 PM, Yonik Seeley wrote:
> On Sat, Nov 13, 2010 at 1:50 PM, Steven A Rowe <sar...@syr.edu> wrote:
> > Looks to me like the returned value is in a Solr-internal form of XML
> > character escaping: \u0000 is represented as "#0;" and \u0008 is
> > represented as "#8;".  (The escaping code is in
> > solr/src/java/org/apache/common/util/XML.java.)
> 
> Yep, there is no legal way to represent some unicode code points in XML.

Right - the real fix here (as you pointed out on #lucene) is to not use XML 
transports.

> > You can get the value back in its original binary form by unescaping the
> > /#[0-9]+;/ format.  Here is a test illustrating this fix that I added to
> > SolrExampleTests, then ran from SolrExampleEmbeddedTest:
> 
> The problem here is that one might then unescape what was meant to be
> a literal "#8;"
> One could come up with a full escaping mechanism over XML I suppose...
> but I'm not sure it would be worth it.

s/illustrating this fix/exposing this dirty hack/ :)
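
For reference, here is a minimal sketch of what unescaping the /#[0-9]+;/
format could look like, and why it is a hack. The class and method names
(HashEscapeSketch, unescape) are hypothetical and are not part of Solr; the
sketch assumes the escaped form is always "#" followed by a decimal code
point and ";", as in the "#0;" and "#8;" examples above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not Solr code: reverses the "#N;" form on the client.
final class HashEscapeSketch {
    // Assumed escaped form: '#', one or more decimal digits, ';'.
    private static final Pattern ESCAPED = Pattern.compile("#([0-9]+);");

    /** Replaces each "#N;" with the character whose code point is N. */
    static String unescape(String s) {
        Matcher m = ESCAPED.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = m.group(); // default: leave the match as-is
            try {
                int codePoint = Integer.parseInt(m.group(1));
                if (Character.isValidCodePoint(codePoint)) {
                    replacement = new String(Character.toChars(codePoint));
                }
            } catch (NumberFormatException e) {
                // Digit run too long for an int: not a code point, keep it.
            }
            // quoteReplacement guards against '$' and '\' in the replacement.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Calling HashEscapeSketch.unescape("a#0;b#8;c") gives back the five-character
string containing the original \u0000 and \u0008. But calling it on a value
that really did contain the literal text "#8;", e.g. "issue #8; fixed", also
turns that into \u0008, which is exactly the ambiguity Yonik describes: the
client cannot tell an escaped control character from the literal characters.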

Steve
