Hi,

I also have this issue with Solr 3.2.0. It is probably this:
https://issues.apache.org/jira/browse/SOLR-2381

Tom

On 06/15/2011 02:09 PM, Mark Cunningham wrote:
Hi,

If you submit documents to Solr as XML, does the server assume you're using
Unicode encoded as UTF-8? And does it accept the whole range of Unicode
characters (for example, characters that require multiple bytes when encoded
in UTF-8)?

I'm getting quite a few "Invalid UTF-8 middle byte 0x20 (at char #408, byte
#-1)" errors (with different bytes/characters). They seem to be caused by
characters such as the trademark or registered-trademark symbols, or by
characters that look like ordinary ones (such as a dash). According to this
very handy website, http://rishida.net/tools/conversion/, that dash comes out
as the UTF-8 code units E2 80 93.
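For anyone following along, here is a minimal Python sketch (not from the
original thread) that reproduces the code units above and shows one way a
"middle byte 0x20" error can arise. The assumption is that some text reaches
the parser in a single-byte encoding such as ISO-8859-1: there "â" is the
single byte 0xE2, which a UTF-8 decoder reads as the start of a 3-byte
sequence, and if the next byte is a space (0x20) the sequence is invalid. The
helper names utf8_hex and is_valid_utf8 are hypothetical.

```python
def utf8_hex(text: str) -> str:
    """Return the UTF-8 code units of `text` as space-separated hex."""
    return " ".join(f"{b:02X}" for b in text.encode("utf-8"))


def is_valid_utf8(data: bytes) -> bool:
    """True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Characters mentioned in the question and their UTF-8 code units.
for name, ch in [("en dash", "\u2013"),
                 ("trade mark sign", "\u2122"),
                 ("registered sign", "\u00ae")]:
    print(f"{name}: U+{ord(ch):04X} -> {utf8_hex(ch)}")

# Assumed failure mode: Latin-1 text sent as if it were UTF-8.
# "â" becomes the lone byte 0xE2, followed by a space (0x20).
latin1_bytes = "\u00e2 100".encode("latin-1")
print(latin1_bytes, is_valid_utf8(latin1_bytes))
```

The en dash and the trademark sign each need three bytes in UTF-8, while the
registered sign needs two (C2 AE), so multi-byte characters themselves are
valid; the error points to bytes that were never UTF-8 in the first place.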

I tried inserting <?xml version="1.0" encoding="utf-8"?> at the start of the
XML, but this didn't seem to make much difference.
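A note on why the declaration alone may not help: it only tells the parser how
to interpret the bytes, it does not convert them. The sketch below (Python,
not from the thread) builds an update message whose declaration and actual
byte encoding agree, and prepares a POST with a matching charset in the
Content-Type header. The URL is the Solr example server's default and the
field names "id" and "title" are hypothetical; adjust both to your setup. The
request is only constructed here, not sent.

```python
from urllib.request import Request
from xml.sax.saxutils import escape


def build_add_xml(doc_id: str, title: str) -> bytes:
    """Build a Solr <add> message and encode it as real UTF-8 bytes.

    The declaration says utf-8 and the .encode("utf-8") call makes it
    true; escape() handles &, < and > in the field values.
    """
    xml = (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        "<add><doc>"
        f'<field name="id">{escape(doc_id)}</field>'      # hypothetical field
        f'<field name="title">{escape(title)}</field>'    # hypothetical field
        "</doc></add>"
    )
    return xml.encode("utf-8")


body = build_add_xml("42", "Acme\u2122 widget \u2013 2011 edition")

req = Request(
    "http://localhost:8983/solr/update",  # assumed example-server URL
    data=body,
    headers={"Content-Type": "text/xml; charset=utf-8"},
    method="POST",
)
# urllib.request.urlopen(req) would send it; not executed in this sketch.
```

If the same title were encoded as cp1252 instead, the en dash would become
the single byte 0x96, and the document would be invalid UTF-8 despite its
declaration.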

Does anyone else have these issues, or know what might be causing them?

Mark



--
Author of the book "Plone 3 Multimedia" - http://amzn.to/dtrp0C

Tom Gross
email.............@toms-projekte.de
skype.....................tom_gross
web.........http://toms-projekte.de
blog...http://blog.toms-projekte.de
