It looks to me as if your document is not valid UTF-8: the last multi-byte character is missing one byte at the end.
The parser then reads the '<' of '</str>' as the final byte of that character, so the '<' disappears from the markup.
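
To illustrate the effect, here is a minimal Python sketch. The splitter below is hypothetical and deliberately naive: it trusts the length announced by each lead byte and does not check the continuation bytes. It is not how Solr or any particular XML parser is implemented, and the sample character is made up. It only shows how a truncated sequence can swallow the byte that follows it.

```python
def split_trusting_lead_byte(data: bytes) -> list:
    """Split bytes into characters using only the length the lead byte announces.

    Hypothetical and deliberately naive: a strict decoder would reject the
    input instead of reading past a truncated sequence.
    """
    chunks = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead >= 0xF0:      # 11110xxx announces a 4-byte sequence
            length = 4
        elif lead >= 0xE0:    # 1110xxxx announces a 3-byte sequence
            length = 3
        elif lead >= 0xC0:    # 110xxxxx announces a 2-byte sequence
            length = 2
        else:                 # ASCII, or a stray continuation byte
            length = 1
        chunks.append(data[i:i + length])
        i += length
    return chunks


# A 3-byte character with its last byte cut off, followed by the closing tag.
truncated = "本".encode("utf-8")[:2]
chunks = split_trusting_lead_byte(truncated + b"</str>")
print(chunks)
```

The first chunk contains the two leftover bytes plus the '<', and what remains of the tag is just '/str>'.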

Did you create the text snippet yourself? If so, check whether the string functions you are using are multi-byte aware; a function that cuts at a byte offset can split a character in half.
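
As a rough sketch of what "multi-byte aware" means here, the Python below compares a byte-offset cut with one that backs up to the last complete character. The function names and the sample string are my own assumptions for illustration; I don't know how the snippet was actually produced.

```python
def truncate_bytes_naive(text: str, max_bytes: int) -> bytes:
    """Cut the UTF-8 form at a byte offset, ignoring character boundaries."""
    return text.encode("utf-8")[:max_bytes]


def truncate_bytes_safe(text: str, max_bytes: int) -> bytes:
    """Cut at a byte offset, then drop any partial character left at the end."""
    data = text.encode("utf-8")[:max_bytes]
    # The input was valid UTF-8, so the only possible error is a dangling
    # partial sequence at the end; errors="ignore" discards it.
    return data.decode("utf-8", errors="ignore").encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly with a strict UTF-8 decoder."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical content: 5 ASCII bytes followed by two 3-byte characters.
sample = "PE'Z 日本"
broken = truncate_bytes_naive(sample, 10)   # cuts the last character in half
fixed = truncate_bytes_safe(sample, 10)     # keeps only complete characters

print(is_valid_utf8(broken), is_valid_utf8(fixed))
```

Running a strict check like `is_valid_utf8` on the content before indexing it would catch this kind of document before it ends up in a response.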

Greetings, Marc


On 26-jul-2007, at 16:55, Brian Whitman wrote:
I ended up with this doc in solr:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int><lst name="params"><str name="start">7</ str><str name="fl">content</str><str name="q">"Pez"~10000</str><str name="rows">1</str></lst></lst><result name="response" numFound="5381" start="7"><doc><str name="content">Akatsuki - PE'Z ҳ | ̳ | պ | ŷ | &gt;&gt;&gt; Akatsuki - PE'Z ר | и &amp;nbsp| Ů &amp;nbsp| ֶ &amp;nbsp| պ &amp;nbsp| ¸ &amp;nbsp| tӺ &amp;nbsp| Ϸ &amp;nbsp| Ӱ &amp;nbsp| ϼ &amp;nbsp| ŷ&gt; &amp;nbsp| ϸ &amp;nbsp| ѵ ŷ&gt; &gt; Various Artists[2005] &gt; Now Jazz 3 - That's What I Call Jazz &gt; Akatsuki - PE'Z Akatsuki - PE'Z ר Now Jazz 3 - That's What I Call Jazz ݳ֣ Various Artists[2005] Akatsuki - PE'Z ȱ ǻᾡ첹ȱĸʣ ҵ˸ø Ӹø&gt;&gt;&gt; һ񈐼/str></doc></result>
</response>


Note the missing '<' in the final </str>.

Solrj throws this (on a larger query that includes this doc):
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[3,20624] Message: The element type "str" must be terminated by the matching end-tag "</str>".

Firefox can't render it either; it throws an error too.

So any query that returns this doc will cause an error.

Obviously there's some weird stuff in this doc, but is it a Solr issue that the '<' got destroyed?


