Hi, I use Solr 1.1 for indexing Russian documents. Sometimes the search
results contain documents with invalid characters.
For example, I indexed "иго", but the search returned "и��о". It's strange,
because the 2 bytes of the middle letter "г" (D0 B3) have turned into 6 bytes,
i.e. two U+FFFD replacement characters (EF BF BD each):
иго - D0 B8 D0 B3 D0 BE
и��о - D0 B8 EF BF BD EF BF BD D0 BE
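One mechanism that reproduces exactly these bytes is an assumption on my part, not a confirmed diagnosis of Solr 1.1. If some reader decodes the UTF-8 byte stream in fixed-size chunks and a chunk boundary falls inside "г", each half (D0, then B3) is decoded on its own and replaced by U+FFFD. The Python sketch uses hypothetical helper names and compares naive per-chunk decoding with an incremental decoder that carries the partial sequence over to the next chunk.

```python
import codecs

# UTF-8 bytes of "иго" from the post: D0 B8 | D0 B3 | D0 BE
ORIGINAL = "иго".encode("utf-8")


def decode_in_chunks_naive(data: bytes, chunk_size: int) -> str:
    """Hypothetical buggy reader: decodes every chunk independently.

    A multi-byte character split across a chunk boundary is lost, and each
    half is replaced with U+FFFD.
    """
    parts = []
    for i in range(0, len(data), chunk_size):
        parts.append(data[i:i + chunk_size].decode("utf-8", errors="replace"))
    return "".join(parts)


def decode_in_chunks_incremental(data: bytes, chunk_size: int) -> str:
    """Decodes the same chunks, but keeps incomplete sequences between calls."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    parts = []
    for i in range(0, len(data), chunk_size):
        parts.append(decoder.decode(data[i:i + chunk_size]))
    parts.append(decoder.decode(b"", final=True))  # flush any leftover bytes
    return "".join(parts)


# A chunk size of 3 puts the boundary between D0 and B3 of "г".
broken = decode_in_chunks_naive(ORIGINAL, 3)
fixed = decode_in_chunks_incremental(ORIGINAL, 3)
print(broken.encode("utf-8").hex(" ").upper())  # D0 B8 EF BF BD EF BF BD D0 BE
print(fixed.encode("utf-8").hex(" ").upper())   # D0 B8 D0 B3 D0 BE
```

If something like this happens somewhere in the indexing path, it would also be consistent with reindexing fixing the documents: the damage only occurs when a character happens to straddle a buffer boundary.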
This field is indexed verbatim as a string:
<fieldtype name="string" class="solr.StrField" sortMissingLast="true"
omitNorms="true"/>
After reindexing, the documents with invalid characters are fixed.
Does anybody have an idea where the problem is?
Maciek