On Thu, Oct 8, 2009 at 12:48 PM, Claudio Martella
<claudio.marte...@tis.bz.it> wrote:
> I'm trying to index documents with Latin accents (Italian documents). I
> extract the text from .doc documents with Tika directly into .xml files.
> If I open the XML document with Dashcode (I run Mac OS X) I can
> see the characters correctly. My XML document starts with the
> <?xml version="1.0" encoding="UTF-8"?>
> <add><doc>
> ...
> headers.

Maybe those documents aren't actually in UTF8.
Why don't you try Solr's example/exampledocs/utf8-example.xml?
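
One way to check whether a file really is UTF-8 is to decode its bytes strictly and see whether that fails. The following is only a minimal sketch in Python, not anything Solr or Tika provides; the function names and the file name in the usage note are made up for illustration.

```python
from pathlib import Path
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        # Strict decoding raises UnicodeDecodeError on any malformed sequence.
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


def check_file(path: str) -> Optional[int]:
    """Read a file as raw bytes and report where UTF-8 decoding first fails."""
    return first_invalid_utf8_offset(Path(path).read_bytes())
```

Calling check_file("mydoc.xml") (a hypothetical file name) returns None if the whole file decodes as UTF-8, and otherwise the offset of the first bad byte. A file that was actually saved as Latin-1 but declares encoding="UTF-8" in its header will usually fail at the first accented character.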

> When I search and retrieve documents in Solr, the accented characters are
> replaced by a '?'. What is the problem?
> I guess the problem could be in (1) the schema or (2) the encoding of the
> XML document file itself (I don't see the characters correctly if I open
> it with vim in a terminal).

In vim/gvim, try:
:set encoding=utf8
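
As an aside, a '?' in place of an accented character is what a charset mismatch typically looks like, and it can be reproduced outside Solr. The sketch below is plain Python with made-up function names; it only illustrates two generic ways the symptom can arise and is not a diagnosis of where the conversion goes wrong in this particular setup.

```python
def encode_to_ascii_with_replacement(text: str) -> bytes:
    """Encode to a charset that cannot represent accents; each becomes b'?'."""
    return text.encode("ascii", errors="replace")


def misread_latin1_as_utf8(text: str) -> str:
    """Write text as Latin-1, then read it back as if it were UTF-8.

    Bytes that are not valid UTF-8 are replaced with U+FFFD, which many
    terminals and browsers display as '?' or a boxed question mark.
    """
    return text.encode("latin-1").decode("utf-8", errors="replace")


if __name__ == "__main__":
    word = "citt\u00e0"  # the Italian word "città"
    print(encode_to_ascii_with_replacement(word))
    print(misread_latin1_as_utf8(word))
```

If either pattern matches what you see, the bytes are probably being converted with the wrong charset somewhere between the .doc extraction and the search results page.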

-Yonik
http://www.lucidimagination.com
