Hi, I use Solr 1.1 for indexing Russian documents. Sometimes the search results contain documents with invalid characters.

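For example, I indexed "иго" but the search returned "и��о". It's strange, because something has changed the 2 bytes of "г" (D0 B3) into 6 bytes: EF BF BD twice, which is the UTF-8 encoding of U+FFFD, the Unicode replacement character.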

иго - D0 B8 D0 B3 D0 BE

и��о - D0 B8 EF BF BD EF BF BD D0 BE
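To make the byte dumps above easier to check, here is a minimal Python sketch (outside Solr, so it only illustrates the encoding, it is not a diagnosis). It confirms that U+FFFD encodes to EF BF BD, and it shows one hypothetical way to get exactly this pattern: if the UTF-8 bytes are split in the middle of "г" and each half is decoded separately with replacement of invalid input, each orphaned byte becomes one U+FFFD. The split position and the helper name hex_bytes are assumptions made for illustration; I do not know whether Solr or the servlet container does anything like this.

```python
def hex_bytes(text):
    """Return the UTF-8 bytes of text as space-separated uppercase hex."""
    return " ".join(f"{b:02X}" for b in text.encode("utf-8"))

original = "иго"
returned = "и\ufffd\ufffdо"  # what the search gave back

original_hex = hex_bytes(original)
returned_hex = hex_bytes(returned)
replacement_hex = hex_bytes("\ufffd")  # the Unicode replacement character

# Hypothetical scenario (an assumption, not a confirmed cause): the byte
# stream is cut between D0 and B3, and each chunk is decoded on its own.
raw = original.encode("utf-8")
chunk_a, chunk_b = raw[:3], raw[3:]  # D0 B8 D0 | B3 D0 BE
decoded = (chunk_a.decode("utf-8", errors="replace")
           + chunk_b.decode("utf-8", errors="replace"))

print(original_hex)
print(returned_hex)
print(replacement_hex)
print(decoded == returned)
```

If the last line prints True, the split-and-decode scenario reproduces the same corrupted string, which would be consistent with the problem happening before the value is stored (and disappearing after reindexing).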

The field is indexed verbatim, using the string field type:

<fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

After reindexing, the documents with invalid characters are fixed.

Does anybody have an idea where the problem is?

Maciek
