Hi, I also have this issue with Solr 3.2.0. It is probably this: https://issues.apache.org/jira/browse/SOLR-2381
Tom

On 06/15/2011 02:09 PM, Mark Cunningham wrote:
Hi,

If you submit information to Solr using XML, does the server assume you're using Unicode encoded in UTF-8? And does it accept the whole range of possible Unicode characters (for example, characters that require multiple bytes when encoded in UTF-8)?

I'm getting quite a few "Invalid UTF-8 middle byte 0x20 (at char #408, byte #-1)" errors (with different bytes/characters) that seem to come from characters such as the trademark symbol, the registered symbol, or characters that look like ordinary ones (such as a dash). The dash comes out as the UTF-8 code units E2 80 93 according to this very handy website: http://rishida.net/tools/conversion/

I tried inserting <?xml version="1.0" encoding="utf-8"?> at the start of the XML, but this didn't seem to make much difference.

Has anyone else had these issues, or does anyone know what might be causing them?

Mark
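The byte values mentioned above can be checked on the client side before anything is posted. The following is only a minimal Python sketch: the helper names `utf8_hex` and `is_valid_utf8` and the sample payloads are hypothetical and not part of Solr. It shows that an en dash really does encode to E2 80 93, and that a multi-byte sequence interrupted by a space (0x20) is rejected by a strict UTF-8 decoder, which is the same kind of condition the "Invalid UTF-8 middle byte 0x20" message describes. It does not diagnose what the server itself is doing.

```python
def utf8_hex(text: str) -> str:
    """Return the UTF-8 bytes of `text` as space-separated uppercase hex."""
    return " ".join(f"{b:02X}" for b in text.encode("utf-8"))


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    # En dash (U+2013), trademark (U+2122) and registered (U+00AE) signs.
    for ch in ("\u2013", "\u2122", "\u00ae"):
        print(repr(ch), utf8_hex(ch))

    # Hypothetical update payload, correctly encoded as UTF-8.
    good = '<add><doc><field name="title">A \u2013 B\u2122</field></doc></add>'
    print(is_valid_utf8(good.encode("utf-8")))

    # A lead byte (E2) and one continuation byte (80) followed by a space
    # (20) instead of the expected third byte: not valid UTF-8.
    print(is_valid_utf8(b"<field>\xe2\x80 </field>"))

    # The same dash encoded as Windows-1252 is the single byte 0x96,
    # which is also not valid UTF-8 on its own.
    print(is_valid_utf8("\u2013".encode("cp1252")))
```

If the payload passes this check on the client but the server still reports the error, the bytes are presumably being altered or misread somewhere after they leave the client.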
--
Author of the book "Plone 3 Multimedia" - http://amzn.to/dtrp0C

Tom Gross
email.............@toms-projekte.de
skype.....................tom_gross
web.........http://toms-projekte.de
blog...http://blog.toms-projekte.de