On Dec 4, 2007 5:02 AM, Maciej Szczytowski
<[EMAIL PROTECTED]> wrote:
> Hi, I use Solr 1.1 for indexing Russian documents. Sometimes the search
> results contain documents with invalid characters.
>
> For example I've indexed "иго" but search returned "и��о". It's strange
> because something has changed 2 bytes into 6 bytes.
>
> иго - D0 B8 D0 B3 D0 BE
>
> и��о - D0 B8 EF BF BD EF BF BD D0 BE
>
> This field is indexed verbatim as a string.
>
> <fieldtype name="string" class="solr.StrField" sortMissingLast="true"
> omitNorms="true"/>
>
> After reindexing, the documents with invalid characters are fixed.
>
> Does anybody have an idea where the problem is?

Probably an issue with the charset not being set correctly (or the
character encoding not matching the charset declaration) when it was
first indexed.

-Yonik
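
The byte counts in the report can be reproduced with a small sketch. It is not
taken from Solr. It only assumes that, somewhere between the client and the
index, the two UTF-8 bytes of "г" (D0 B3) were each handled by a decoder that
could not interpret them, so each became U+FFFD (REPLACEMENT CHARACTER), which
is EF BF BD in UTF-8. ASCII is used here purely as a stand-in for "the wrong
charset", and hex_bytes is a helper made up for this example. The sketch does
not try to explain why only the middle character was affected.

```python
def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex, as in the report."""
    return " ".join(f"{b:02X}" for b in data)


# The value as it should be stored: three Cyrillic letters, two bytes each.
good = "иго".encode("utf-8")
print(hex_bytes(good))

# Assumption for illustration: the two bytes of the middle letter are decoded
# with a charset that cannot represent them (ASCII as a stand-in). With
# errors="replace", each undecodable byte becomes one U+FFFD.
middle = good[2:4]
replaced = middle.decode("ascii", errors="replace")

# Re-encoding the damaged text as UTF-8 turns each U+FFFD into three bytes,
# so the original 2 bytes become 6.
corrupted = "и" + replaced + "о"
print(hex_bytes(corrupted.encode("utf-8")))
```

Once a byte has been replaced by U+FFFD, the original value cannot be
recovered from the index, which is consistent with reindexing from the source
documents being what fixed the results.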
