Hi, I specify the following encoding when POSTING the data to Solr:
text/xml; charset=utf-8

The XML itself is also encoded as UTF-8. The update handler fails even when the character is NOT directly next to an XML closing tag: if the character appears anywhere inside any of the XML tags, the update handler fails to parse the XML.

Thanks,
Av

----- Original Message ----
From: Yonik Seeley <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Wednesday, May 9, 2007 10:45:43 AM
Subject: Re: Solr Update Handler Failes with Some Doc Characters

On 5/9/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> I run the example using Jetty on a Windows 2003 machine. When I submit some
> documents containing upper ASCII characters, the Solr update handler fails with
> an XML parsing error saying that it encountered an EOF before the closing
> tags.

Normally, if there is a charset mixup, you will just get weird-looking results. I suppose that if a char with a value greater than 127 is used, and Solr is treating the input as UTF-8, then the following char would be treated as part of a single multibyte character. Hence, if the char is directly followed by XML markup, part of that XML markup will be lost (hence the parse exception).

In short, this is probably a char encoding issue. What character encoding are you using when posting to Solr, and is it declared in the HTTP header?

-Yonik
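To make the multibyte explanation in the quoted reply concrete, here is a minimal Python sketch. It is an illustration only, not how Solr or its XML parser actually decodes input: the helper names `utf8_sequence_length` and `naive_decode_units` and the sample `<field>` document are assumptions invented for this example. It groups bytes the way a non-validating UTF-8 decoder might, and shows that a Latin-1 encoded "é" (byte 0xE9) looks like the lead byte of a three-byte sequence, so the two markup bytes after it would be absorbed.

```python
def utf8_sequence_length(lead):
    """Number of bytes a UTF-8 decoder expects for a sequence starting with `lead`."""
    if lead < 0x80:
        return 1              # plain ASCII
    if lead >> 5 == 0b110:
        return 2              # 110xxxxx starts a 2-byte sequence
    if lead >> 4 == 0b1110:
        return 3              # 1110xxxx starts a 3-byte sequence
    if lead >> 3 == 0b11110:
        return 4              # 11110xxx starts a 4-byte sequence
    return 1                  # stray continuation or invalid byte


def naive_decode_units(data):
    """Group bytes as a hypothetical non-validating UTF-8 decoder would."""
    units, i = [], 0
    while i < len(data):
        n = utf8_sequence_length(data[i])
        units.append(data[i:i + n])
        i += n
    return units


# Hypothetical sample document; the field name is an assumption.
doc = '<field name="name">café</field>'
latin1_bytes = doc.encode("latin-1")   # é becomes the single byte 0xE9
utf8_bytes = doc.encode("utf-8")       # é becomes the two bytes 0xC3 0xA9

# Multi-byte groups found when Latin-1 bytes are read as UTF-8.
swallowed = [u for u in naive_decode_units(latin1_bytes) if len(u) > 1]

# Python's own decoder is strict and rejects the mismatched bytes outright.
try:
    latin1_bytes.decode("utf-8")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False

print(swallowed)
print(strict_decode_ok)
print(utf8_bytes.decode("utf-8") == doc)
```

In this sketch the group starting at 0xE9 also contains the `<` and `/` of the closing tag, which matches the "lost markup" scenario described above. When the bytes really are UTF-8, as reported in this thread, no markup is absorbed, so the failure would need another explanation, such as the bytes being re-encoded somewhere between the client and the update handler.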