On Thu, Oct 8, 2009 at 12:48 PM, Claudio Martella <claudio.marte...@tis.bz.it> wrote:
> I'm trying to index documents with latin accents (italian documents). I
> extract the text from .doc documents with Tika directly into .xml files.
> If i open up the XML document with my Dashcode (i run mac os x) i can
> see the characters correctly. my xml document is an xml document with the
> <?xml version="1.0" encoding="UTF-8"?>
> <add><doc>
> ...
> headers.

Maybe those documents aren't actually in UTF-8. Why don't you try Solr's
example/exampledocs/utf8-example.xml?

> When i search and retrieve documents in solr the accented characters are
> replaced by an '?'. What is the problem?
> I guess the problem could be in (1) the schema (2) the xml document file
> coding itself (i don't see the characters correctly if i open it up with
> vim in terminal).

in vim/gvim try
:set encoding=utf8

-Yonik
http://www.lucidimagination.com
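
One rough way to check whether a file is really UTF-8, whatever its XML header claims, is to try decoding its raw bytes. Here is a minimal Python sketch; the function names and the file name in the usage note are made up for illustration and are not part of Solr or Tika.

```python
def find_invalid_utf8(data: bytes):
    """Return None if data decodes as UTF-8, else (offset, offending byte)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that could not be decoded
        return err.start, data[err.start:err.start + 1]
    return None


def check_file(path):
    """Read a file as raw bytes (no implicit decoding) and check it."""
    with open(path, "rb") as f:
        return find_invalid_utf8(f.read())
```

For example, check_file("mydoc.xml") (a hypothetical file name) returns None when the bytes are valid UTF-8. If it returns an offset and a byte instead, the file was probably written in some other encoding such as Latin-1, even though the header says UTF-8. Note that this only shows the bytes are decodable as UTF-8, not that the text is what was intended.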