Results so far:
I was able to locate and isolate the document causing the trouble.
I checked the document with xmllint again. It is valid, well-formed UTF-8.
When I load just this single document, I get the XML error as soon as the
search result is displayed.
This happens through the Solr admin search and also through the JSON
interface, and probably through other interfaces as well.
The next step is to use a debugger and see what goes wrong.

One thing I can already say is that the UTF-8 byte sequence "F0 9D 94 90"
(U+1D510, Mathematical Fraktur Capital M) is what triggers the problem.
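
For reference, here is a minimal Python sketch (not taken from Solr) that
decodes the byte sequence above and shows why it is special: the code point
lies above U+FFFF, so it needs four bytes in UTF-8 and two 16-bit units
(a surrogate pair) in UTF-16. Whether that is what goes wrong inside Solr is
only a guess at this point. The helper name find_non_bmp is made up for
illustration; it lists every character above U+FFFF in a string, which may
help to find other affected documents.

```python
import unicodedata

# The byte sequence reported above.
raw = bytes.fromhex("F09D9490")

# Decode it as UTF-8 and look up the resulting code point and its name.
char = raw.decode("utf-8")
code_point = ord(char)
name = unicodedata.name(char)

# In UTF-16 a code point above U+FFFF is stored as two 16-bit units.
utf16 = char.encode("utf-16-be")
high = int.from_bytes(utf16[0:2], "big")
low = int.from_bytes(utf16[2:4], "big")


def find_non_bmp(text):
    """Hypothetical helper: (offset, code point) of each character above U+FFFF."""
    return [(i, ord(c)) for i, c in enumerate(text) if ord(c) > 0xFFFF]


print(f"U+{code_point:04X} {name}")
print(f"UTF-16 units: {high:04X} {low:04X}")
print(find_non_bmp("S002461070602314" + char))
```

Running find_non_bmp over the text of each field before indexing would show
which documents contain such characters.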

Are there any known issues with this?

Regards,
Bernd


On 11.02.2011 08:59, Bernd Fehling wrote:
> Dear list,
> 
> after loading some documents via DIH, some of which include URLs,
> I get the yellow XML error page as the search result in the Solr admin GUI
> after a search.
> It reports the XML processing error "not well-formed".
> The code it complains about is:
> 
> <arr name="dcurls">
> <str>http://eprints.soton.ac.uk/43350/</str>
> <str>http://dx.doi.org/doi:10.1112/S0024610706023143</str>
> <str>Martinez-Perez, Conchita and Nucinkis, Brita E.A. (2006) Cohomological 
> dimension of Mackey functors for infinite groups. Journal of the
> London Mathematical Society, 74, (2), 379-396. (doi:10.1112/S0024610706023143 
> &lt;http://dx.doi.org/10.1112/S002461070602314\uffff&gt;)</str></arr>
> 
> See the \uffff character in the last line.
> 
> 1. The loaded data is valid and well-formed; checked with xmllint, no errors.
> 2. There is no \uffff character in the source data.
> 3. The data is loaded via DIH without any errors.
> 4. When opening the source view of the result page in Firefox, there is also
> no \uffff character.
> 
> My only idea is that the cause is Solr itself or the result page generation.
> 
> How should I proceed, and what else should I check?
> 
> Regards,
> Bernd
