: I agree with you, 0xfffe is a special character, that is why I was asking
: how it's handled in solr.
: In my document, 0xfffe does not appear at the beginning, it's in the
: content.

Unless I'm misunderstanding something (and it's very likely that I am)...

0xfffe is not a special character -- it is explicitly *not* a character in
Unicode at all. It is set aside as a "noncharacter" specifically so that the
character 0xfeff can be used as a BOM: if the BOM is read with the wrong
byte order, it shows up as 0xfffe and causes an error.

If you genuinely have 0xfffe "in the content" of your database, then your
database content (by definition) cannot be valid UTF-8, because 0xfffe is
not a character in Unicode.

If you are able to index that content in a single node
Solr+DIH+JDBC+Postgres setup, then you are getting (un)lucky -- Postgres
isn't complaining that you have invalid Unicode, the JDBC driver is doing
something to finagle that character into the Java String, and at that point
DIH & Solr don't care about it -- they are just happy to deal with it as a
String.

But at the point you start using SolrCloud, Solr has to send that String to
another node, and at that point the serializing/deserializing code seems to
be catching the fact that your content is not valid UTF-8.

We could conceivably make Solr notice & complain about this earlier -- but
we can't do anything to make it valid Unicode.

But like I said: I might be misunderstanding something.

:
: Just an update about testing I'm doing: in a SolrCloud two shards
: environment, if I launch dataimport on one node of the shard that will be
: target for that doc, all the docs got written properly; if I launch
: dataimport on one node of the other shard and then it forwards to the
: target, I get the error.
:
: Thanks
: Federico
:
:
: 2013/8/5 Raymond Wiker <rwi...@gmail.com>
:
: > I think #xfffe is special; it is used as a "byte order mark" to identify
: > the encoding used. In that case, it should only appear at the beginning of
: > the document.
: >
: > Sent from my iPhone
: >
: > On 5 Aug 2013, at 17:19, Federico Chiacchiaretta <federico.c...@gmail.com>
: > wrote:
: >
: > > Hi Shawn,
: > > thanks for your answer.
: > > From the docs you linked i found:
: > > "This property is only relevent for server versions less than or equal to
: > > 7.2".
: > >
: > > I'm using version 9.1, I gave it a try but unfortunately I had no luck.
: > > Besides, I checked encoding settings on DB and it's UTF-8.
: > >
: > > Please note that import of data works with a single instance of Solr, but
: > > it doesn't on a SolrCloud when the update gets forwarded to another node.
: > > Thinking about jetty bug (or misconfiguration), I also tried a test
: > > environment based on tomcat, but I have the same result.
: > >
: > > How utf character 0xfffe is supposed to be handled? It seems that solr can
: > > handle it well, while sending it over HTTP to another node breaks things.
: > > Can it be a HttpSolrServer bug?
: > >
: > > Thanks,
: > > Federico
: > >
: > >
: > > 2013/8/5 Shawn Heisey <s...@elyograg.org>
: > >
: > >> On 8/1/2013 7:20 AM, Federico Chiacchiaretta wrote:
: > >>> on data import from a PostgreSQL db, I get the following error in
: > >>> solr.log:
: > >>>
: > >>> ERROR - 2013-08-01 09:51:00.217; org.apache.solr.common.SolrException;
: > >>> shard update error RetryNode:
: > >>> http://172.16.201.173:8983/solr/archive/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
: > >>> Invalid UTF-8 character 0xfffe at char #416, byte #127)
: > >>
: > >> It sounds like your database is not using the UTF-8 character set, but
: > >> the JDBC driver (or the driver-server combination) is not aware that the
: > >> character set is different. Solr expects UTF-8.
: > >>
: > >> Generally what you want to do is tell the JDBC driver to use the UTF-8
: > >> character set, which will hopefully cause either the driver or the DB
: > >> server to translate for you.
: > >>
: > >> There is a charSet parameter for the postgresql jdbc driver:
: > >>
: > >> http://jdbc.postgresql.org/documentation/80/connect.html
: > >>
: > >> These are added to the jdbc URL after a ? character, just like
: > >> parameters on an http URL.
: > >>
: > >> Thanks,
: > >> Shawn
: > >>
: > >>
: >
:

-Hoss
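
PS: for what it's worth, here is a minimal Python sketch of the BOM point in this thread. It is not Solr, DIH or JDBC code, and the function names are made up for illustration. It shows that a big-endian BOM (0xfeff) decoded with the wrong byte order comes out as 0xfffe, and how noncharacters could be detected and dropped from a string before it gets sent anywhere. Whether dropping them is the right thing to do for your data is an assumption you would have to check yourself.

```python
def is_noncharacter(cp: int) -> bool:
    """True for the 66 code points Unicode reserves as noncharacters:
    U+FDD0..U+FDEF plus the last two code points of each of the 17 planes."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def strip_noncharacters(text: str) -> str:
    """Return text with every noncharacter removed (hypothetical helper)."""
    return "".join(ch for ch in text if not is_noncharacter(ord(ch)))


# A BOM written as big-endian UTF-16 is the byte pair FE FF.
bom_be = "\ufeff".encode("utf-16-be")

# Decoding those same bytes as little-endian swaps them into U+FFFE,
# which is exactly the "BOM read incorrectly" case.
misread = bom_be.decode("utf-16-le")

if __name__ == "__main__":
    print(hex(ord(misread)))
    print(is_noncharacter(ord(misread)))
    print(repr(strip_noncharacters("ab\ufffecd")))
```

Note that Python itself will happily carry U+FFFE around in a str, much like the Java String in the single node case; it is only a stricter reader further down the line that complains.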