Once in a while we get this:

javax.xml.stream.XMLStreamException: ParseError at [row,col]:[4,790470]
[14:32:21.877] Message: An invalid XML character (Unicode: 0x6) was found in the element content of the document.
[14:32:21.877] at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:588)
[14:32:21.877] at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:318)
[14:32:21.877] at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:195)
...

Our data comes from all sorts of places, and although we've tried to use UTF-8 wherever we can, there are still cracks.

I would much rather have a document added with a replacement character than have this error prevent the addition of 8K documents. That is what happened here: this one character was in an 8K <add><doc>..<doc... run, and only the documents before it were added.
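
To make the substitution I have in mind concrete, here is a rough sketch of it done on the client side, before the XML is posted. This is not anything Solr provides as far as I know; the class and method names (XmlCharSanitizer, sanitize, isValidXmlChar) are made up for illustration, and I am assuming the field values are available as Java strings and that XML 1.0 character rules apply.

```java
// Hypothetical helper, not part of Solr: replaces characters that are not
// legal in XML 1.0 content with U+FFFD (the Unicode replacement character).
final class XmlCharSanitizer {
    private static final int REPLACEMENT = 0xFFFD;

    // Legal XML 1.0 characters: tab, LF, CR, and the ranges
    // 0x20-0xD7FF, 0xE000-0xFFFD, 0x10000-0x10FFFF.
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    static String sanitize(String input) {
        StringBuilder out = new StringBuilder(input.length());
        // Walk by code point so surrogate pairs are kept together;
        // an unpaired surrogate is reported as itself and gets replaced.
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            out.appendCodePoint(isValidXmlChar(cp) ? cp : REPLACEMENT);
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

Each field value would have to be passed through something like sanitize(...) before being written into the <doc>, which is exactly the sort of per-source plumbing I would prefer not to maintain everywhere.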

Is there something I can do on the Solr side to ignore or replace invalid characters?




