2013/8/6 Raymond Wiker <rwi...@gmail.com>

> Ok, let me rephrase that slightly: does your database extraction include
> BLOBs or CLOBs that are actually complete documents, that might be UTF-8
> encoded text?

It definitely does: each entry I have in PostgreSQL has a field of type "text" that contains UTF-8 encoded text. It gets imported into the Solr "content" field.

> From the stack trace in your second post, it seems that the error occurs
> while parsing an XML file uploaded via the UpdateRequestHandler. I'm
> guessing (please note) that Solr is using an XML representation of your
> documents (records) for communication between replicas; it could be that
> the code that constructs the XML request does not check for the BOM
> "character".

Yonik Seeley confirmed that the issue is related to the XML parser: https://issues.apache.org/jira/browse/SOLR-5101 (I filed the issue in Jira before starting this thread).

Thanks,
Federico

> On Mon, Aug 5, 2013 at 11:10 PM, Federico Chiacchiaretta <
> federico.c...@gmail.com> wrote:
>
> > No, the content has no XML tags included (hope I understood what you were
> > asking here).
> >
> > Federico
> >
> >
> > 2013/8/5 Raymond Wiker <rwi...@gmail.com>
> >
> > > On Aug 5, 2013, at 20:12 , Federico Chiacchiaretta <
> > > federico.c...@gmail.com> wrote:
> > > > Hi Raymond,
> > > > I agree with you, 0xfffe is a special character, that is why I was asking
> > > > how it's handled in solr.
> > > > In my document, 0xfffe does not appear at the beginning, it's in the
> > > > content.
> > > >
> > > > Just an update about testing I'm doing: in a SolrCloud two shards
> > > > environment, if I launch dataimport on one node of the shard that will be
> > > > target for that doc, all the docs got written properly; if I launch
> > > > dataimport on one node of the other shard and then it forwards to the
> > > > target, I get the error.
> > >
> > > Does your content include entire XML documents? It could be that the
> > > process of packaging the content could create a structure that includes an
> > > entire document (with BOM) somewhere inside a compound document (just
> > > guessing here.)
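
For anyone hitting the same error: XML 1.0 does not allow the code point U+FFFE anywhere in a document (the byte order mark itself is U+FEFF, which is allowed), so an XML parser will reject it wherever it appears in the content. Below is a minimal sketch, not taken from this thread, of stripping such code points from a text field before it is sent to Solr. The function name strip_invalid_xml_chars is hypothetical, and dropping the character (rather than replacing it) is an assumption that may not suit every data set.

```python
import re

# Matches every code point outside the XML 1.0 "Char" production, i.e.
# anything other than #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD
# and #x10000-#x10FFFF. U+FFFE and U+FFFF fall outside these ranges.
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Return text with code points that XML 1.0 forbids removed.

    Hypothetical helper for illustration; it is not part of Solr
    or of the DataImportHandler.
    """
    return _INVALID_XML_CHARS.sub("", text)


if __name__ == "__main__":
    sample = "before\ufffeafter"
    print(repr(strip_invalid_xml_chars(sample)))
```

The same cleanup could be done on the database side or in an import-time transformer; the sketch only shows which code points are affected.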