invalid XML character

Brian Whitman Sat, 01 Mar 2008 13:23:11 -0800

Once in a while we get this:

javax.xml.stream.XMLStreamException: ParseError at [row,col]:[4,790470]

[14:32:21.877] Message: An invalid XML character (Unicode: 0x6) was found in the element content of the document.
[14:32:21.877]     at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:588)
[14:32:21.877]     at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:318)
[14:32:21.877]     at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:195)

...

Our data comes from all sorts of places, and although we've tried to be UTF-8 wherever we can, there are still cracks.

I would much rather a document get added with a replacement character than have this error prevent the addition of 8K documents. That is what happened here: this one character was in an 8K <add><doc>..<doc... run, and only the documents before this character were added.

Is there something I can do on the Solr side to ignore or replace invalid characters?
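
To make the desired behavior concrete, here is a minimal sketch of the replacement described above, done on the client before the XML is posted. This is an assumed workaround, not something Solr provides; `XmlSanitizer`, `sanitize`, and `isValidXmlChar` are hypothetical names. It replaces every code point outside the XML 1.0 `Char` range (such as 0x6) with U+FFFD.

```java
// Hypothetical helper, not part of Solr: replaces characters that are
// illegal in XML 1.0 with the Unicode replacement character U+FFFD.
final class XmlSanitizer {
    private static final int REPLACEMENT = 0xFFFD;

    // XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    static String sanitize(String in) {
        StringBuilder out = new StringBuilder(in.length());
        // Walk by code point so surrogate pairs are kept together;
        // an unpaired surrogate is returned as-is and gets replaced.
        for (int i = 0; i < in.length(); ) {
            int cp = in.codePointAt(i);
            out.appendCodePoint(isValidXmlChar(cp) ? cp : REPLACEMENT);
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

Something like this would have to run on each field value before it is escaped and written into the <add> request, so it does not answer the question of whether Solr itself can be told to tolerate such characters.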
