Re: Solr 3.1 indexing error Invalid UTF-8 character 0xffff

Mike Sokolov Mon, 27 Jun 2011 06:26:15 -0700

OK - re-reading your message it seems maybe that is what you were trying to say too, Robert. FWIW I agree with you that XML is rigid, sometimes for purely arbitrary reasons. But nobody has really helped Markus here - unfortunately, there is no easy way out of this mess. What I do to handle issues like this is to wrap the stream I'm handing to the parser in some kind of cleanup stream that handles a few yucky issues. You could, e.g., just strip out invalid XML characters. Maybe Nutch should be doing this, or at least handling the error better?
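
A minimal sketch of the cleanup-stream idea, in Python for brevity (in Nutch or Solr the same idea would presumably be a java.io.FilterReader). It assumes the input has already been decoded to text and that "invalid" means anything outside the XML 1.0 Char production. The names strip_invalid_xml_chars and XmlCleanupReader are made up for illustration and are not part of any library mentioned in this thread.

```python
import io
import re

# Anything outside the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# U+FFFF falls outside these ranges, so it is matched and removed.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text):
    """Return text with every character XML 1.0 disallows removed."""
    return _INVALID_XML_CHARS.sub("", text)


class XmlCleanupReader(io.TextIOBase):
    """Hypothetical wrapper: cleans text read from another text stream."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def readable(self):
        return True

    def read(self, size=-1):
        # Keep reading until some clean text survives or the wrapped
        # stream is exhausted, so that a chunk made up only of invalid
        # characters is not mistaken for end of input.
        while True:
            chunk = self._wrapped.read(size)
            cleaned = strip_invalid_xml_chars(chunk)
            if cleaned or not chunk:
                return cleaned


if __name__ == "__main__":
    dirty = io.StringIO("<doc>bad\uffffchar</doc>")
    print(XmlCleanupReader(dirty).read())
```

Silently dropping characters hides the fact that the source data was bad, so logging each removal (or substituting U+FFFD instead) may be the better choice depending on the pipeline.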


-Mike


On 06/27/2011 09:19 AM, Mike Sokolov wrote:

Actually - you are both wrong!
It is true that 0xffff is a valid UTF8 character, and not a valid UTF8 byte sequence.
But the parser is reporting (or trying to) that 0xffff is an invalid XML character.
And Robert - if the wording offends you, you might want to send a note to Tatu (http://jira.codehaus.org/) suggesting that he alter the wording of the error message :)
-Mike

On 06/27/2011 09:01 AM, Bernd Fehling wrote:
Am 27.06.2011 14:48, schrieb Robert Muir:
On Mon, Jun 27, 2011 at 8:47 AM, Bernd Fehling
<bernd.fehl...@uni-bielefeld.de>  wrote:
correct!!!
but what i said, is totally different than what you said.

you are still wrong.
http://www.unicode.org/faq//utf_bom.html

see Q: What is a UTF?
