It looks to me as if your document is not valid UTF-8: the content
field is missing one byte at the end, so its last character is incomplete.
The '<' of '</str>' then gets swallowed as part of that incomplete character.
Did you create the text snippet yourself? Maybe check if the string
functions you are using are multi-byte aware.
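
To illustrate what "multi-byte aware" means here, a minimal sketch in Python (not Solr code; the function names and the sample string are made up for illustration). Cutting a UTF-8 byte string at a fixed byte count can split a character in two, while backing up to a character boundary first keeps the result valid:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def truncate_utf8(data: bytes, max_bytes: int) -> bytes:
    """Cut valid UTF-8 data to at most max_bytes without splitting a character."""
    if len(data) <= max_bytes:
        return data
    cut = max_bytes
    # Continuation bytes have the bit pattern 10xxxxxx. If the first byte we
    # would drop is one of them, the cut lands inside a character, so back up
    # until the cut sits on a character boundary.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]


# Hypothetical sample: "Pez " followed by a two-byte character (U+00E9).
data = "Pez \u00e9".encode("utf-8")   # 6 bytes
naive = data[:5]                      # byte-level cut, splits the last character
safe = truncate_utf8(data, 5)         # boundary-aware cut

print(is_valid_utf8(naive))  # False
print(is_valid_utf8(safe))   # True
```

Running a check like is_valid_utf8 over the text before posting it to Solr should catch a truncated document before it ends up in the index.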
Greetings, Marc
On 26-jul-2007, at 16:55, Brian Whitman wrote:
I ended up with this doc in solr:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int
name="QTime">1</int><lst name="params"><str name="start">7</
str><str name="fl">content</str><str name="q">"Pez"~10000</str><str
name="rows">1</str></lst></lst><result name="response"
numFound="5381" start="7"><doc><str name="content">Akatsuki -
PE'Z ҳ | ̳ | պ | ŷ | >>> Akatsuki - PE'Z ר | и
&nbsp| Ů &nbsp| ֶ &nbsp| պ &nbsp| ¸ &nbsp|
tӺ &nbsp| Ϸ &nbsp| Ӱ &nbsp| ϼ &nbsp| ŷ>
&nbsp| ϸ &nbsp| ѵ ŷ> > Various Artists[2005] >
Now Jazz 3 - That's What I Call Jazz > Akatsuki - PE'Z Akatsuki
- PE'Z ר Now Jazz 3 - That's What I Call Jazz ݳ֣ Various
Artists[2005] Akatsuki - PE'Z ȱ ǻᾡ첹ȱĸʣ ҵ˸ø
Ӹø>>> һ/str></doc></result>
</response>
Note the missing < in </str>
Solrj throws this (on a larger query that includes this doc):
Caused by: javax.xml.stream.XMLStreamException: ParseError at
[row,col]:[3,20624]
Message: The element type "str" must be terminated by the matching
end-tag "</str>".
Firefox can't render it either; it throws an error too.
So any query that returns this doc will cause an error.
Obviously there's some weird stuff in this doc, but is it a solr
issue that the < got destroyed?