Hi Markus,
The result of my investigation is that Lucene currently can only handle
UTF-8 characters within the BMP [Basic Multilingual Plane] (plane 0, i.e. code points up to U+FFFF).
Any code point above the BMP may lead to unpredictable results, which is bad.
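To find the characters this affects in a document, a small script can list every code point above U+FFFF. This is a sketch: `find_non_bmp` is a hypothetical helper, not part of Lucene or Solr, and it assumes the field text is already available as a Python string.

```python
def find_non_bmp(text):
    """Return (index, code point) pairs for characters outside the BMP.

    Such characters need 4 bytes in UTF-8 and a surrogate pair in
    Java's UTF-16 strings.
    """
    return [(i, ord(ch)) for i, ch in enumerate(text) if ord(ch) > 0xFFFF]
```

For example, `find_non_bmp("a\U0001F600b")` returns a single entry: index 1 with code point 0x1F600.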
If you get invalid UTF-8 from the index and use wt=xml, it gives the error page [...]
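A quick way to tell whether bytes taken from the index (or from a raw wt=xml response) are valid UTF-8 is a strict decode. The sketch below is hypothetical (`first_invalid_utf8` is an illustrative name). It assumes one plausible kind of corruption, where a character above the BMP is written as two separately encoded UTF-16 surrogates (CESU-8 style): strict UTF-8 rejects that, while the correct 4-byte sequence passes.

```python
def first_invalid_utf8(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")  # strict mode: raises on any malformed sequence
    except UnicodeDecodeError as err:
        return err.start
    return None


good = "\U0001F600".encode("utf-8")  # b'\xf0\x9f\x98\x80', proper 4-byte form
bad = b"\xed\xa0\xbd\xed\xb8\x80"    # U+1F600 as two encoded surrogates
```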
No, I haven't located the issue. It might be Solr, but it could also be Xerces
having trouble with it. You could possibly work around the problem by using the
JSONResponseWriter.
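Switching writers is a request parameter: wt=json selects the JSONResponseWriter instead of the default XML writer. The sketch below only builds such a query URL. The base URL `http://localhost:8983/solr` is an assumption (the stock example setup), and `select_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def select_url(base, query, writer="json"):
    """Build a Solr /select URL; wt picks the response writer (json or xml)."""
    params = {"q": query, "wt": writer}
    return f"{base.rstrip('/')}/select?{urlencode(params)}"
```

For example, `select_url("http://localhost:8983/solr", "id:42")` yields a URL ending in `select?q=id%3A42&wt=json`.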
On Friday 11 February 2011 15:45:23 Bernd Fehling wrote:
> Hi Markus,
>
> yes it looks like the same issue. There is als[...]
Hi Markus,
yes, it looks like the same issue. There is also a \u UTF-8 code in your dump.
So far I have followed it into XMLResponseWriter.
A few steps before that, the result in the buffer looks good and the UTF-8 code is
correct.
This freaky problem is really hard to debug.
Have you looked deeper into this [...]
It looks like you hit the same issue as I did a while ago:
http://www.mail-archive.com/solr-user@lucene.apache.org/msg46510.html
On Friday 11 February 2011 08:59:27 Bernd Fehling wrote:
> Dear list,
>
> after loading some documents via DIH which also include urls
> I get this yellow XML error page [...]
Results so far.
I was able to locate and isolate the document causing the trouble.
I've checked the document with xmllint again. It is valid, well-formed UTF-8.
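For a scripted equivalent of the well-formedness part of that xmllint check, Python's bundled expat parser reports whether a document is well-formed and correctly encoded. A minimal sketch (`is_well_formed` is a hypothetical name). It checks well-formedness only, not validity against a DTD or schema. XML 1.0 permits characters above the BMP, so a document containing them can legitimately pass.

```python
import xml.etree.ElementTree as ET


def is_well_formed(data: bytes) -> bool:
    """True if the bytes parse as well-formed XML (encoding errors included)."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True
```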
I've loaded that single document and get the XML error when displaying the search
result.
This happens both through the Solr admin search and through the JSON interface, [...]