2013/8/6 Raymond Wiker <rwi...@gmail.com>

> Ok, let me rephrase that slightly: does your database extraction include
> BLOBs or CLOBs that are actually complete documents, that might be UTF-8
> encoded text?
>
It definitely does: each entry I have in PostgreSQL has a field of type
"text" that includes UTF-8 encoded text. It gets imported into the Solr
"content" field.


> From the stack trace in your second post, it seems that the error occurs
> while parsing an XML file uploaded via the UpdateRequestHandler. I'm
> guessing (please note) that Solr is using an XML representation of your
> documents (records) for communication between replicas; it could be that
> the code that constructs the XML request does not check for the BOM
> "character".
>

Yonik Seeley confirmed that the issue is related to the XML parser; see
https://issues.apache.org/jira/browse/SOLR-5101 (I filed the issue on Jira
before starting this thread).
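
For anyone hitting the same error, here is a minimal sketch of a possible
client-side workaround. It assumes the offending value is the code point
U+FFFE, which falls outside the XML 1.0 "Char" production, so a strict XML
parser rejects it. The function name strip_invalid_xml_chars is hypothetical
and this is not what Solr does internally; it only filters text before it is
sent for indexing.

```python
import re

# Matches anything outside the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# U+FFFE and U+FFFF are excluded by that production, so they match here.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Return text with characters that XML 1.0 forbids removed.

    Hypothetical helper: apply it to field values (for example the
    "content" field) before they are serialized into an XML update.
    """
    return _INVALID_XML_CHARS.sub("", text)


if __name__ == "__main__":
    sample = "before\ufffeafter"
    print(repr(strip_invalid_xml_chars(sample)))
```

Note that U+FEFF (the actual BOM) is a legal XML character and is left
untouched by this filter; only U+FFFE and other forbidden code points are
dropped.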

Thanks,
Federico



>
>
> On Mon, Aug 5, 2013 at 11:10 PM, Federico Chiacchiaretta <
> federico.c...@gmail.com> wrote:
>
> > No, the content has no XML tags included (hope I understood what you were
> > asking here).
> >
> > Federico
> >
> >
> > 2013/8/5 Raymond Wiker <rwi...@gmail.com>
> >
> > > On Aug 5, 2013, at 20:12 , Federico Chiacchiaretta <
> > > federico.c...@gmail.com> wrote:
> > > > Hi Raymond,
> > > > > I agree with you, 0xfffe is a special character; that is why I was
> > > > > asking how it's handled in Solr.
> > > > In my document, 0xfffe does not appear at the beginning, it's in the
> > > > content.
> > > >
> > > > > Just an update about the testing I'm doing: in a two-shard SolrCloud
> > > > > environment, if I launch dataimport on a node of the shard that is
> > > > > the target for that doc, all the docs get written properly; if I
> > > > > launch dataimport on a node of the other shard, which then forwards
> > > > > to the target, I get the error.
> > >
> > > > Does your content include entire XML documents? It could be that the
> > > > process of packaging the content could create a structure that includes
> > > > an entire document (with BOM) somewhere inside a compound document (just
> > > > guessing here.)
> >
>
