Our SOLR setup (4.0.BETA on Tomcat 6) works as expected when indexing UTF-8
files. Recently, however, we noticed that it has issues indexing certain
text files, e.g. UTF-16 files. See the attachment for an example
(tarred+zipped):

tesla-utf16.txt
<http://lucene.472066.n3.nabble.com/file/n4010834/tesla-utf16.txt>  

Looking at the "text" terms, I see only 35 terms, i.e. (1, 2, 3, ..., 9, 0, a, b, c, ..., z)!
A UTF-8 version of the same file indexes fine.
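One plausible explanation (an assumption, not confirmed in this thread) is that the UTF-16 bytes are being read as a single-byte or UTF-8 charset. In UTF-16, each ASCII character is stored alongside a NUL byte, so a misread file looks like "T\0e\0s\0l\0a\0" and a tokenizer that splits on non-word characters emits one term per letter or digit. The sketch below reproduces that with a hypothetical `toy_tokenize` function (a stand-in for a real Solr analyzer, which is more sophisticated) and shows that transcoding to UTF-8 first restores whole words.

```python
import re


def toy_tokenize(text: str) -> list[str]:
    # Hypothetical stand-in for an index analyzer: lowercase, then keep
    # runs of ASCII letters/digits. NUL bytes act as separators.
    return re.findall(r"[0-9a-z]+", text.lower())


def to_utf8(raw: bytes) -> bytes:
    # Python's "utf-16" codec reads the BOM to determine byte order.
    return raw.decode("utf-16").encode("utf-8")


# BOM plus two bytes per character; ASCII chars are paired with a NUL byte.
sample = "Nikola Tesla 1856".encode("utf-16")

# Misread as a single-byte charset: every character is separated by U+0000,
# so each letter or digit becomes its own term.
misread = sample.decode("latin-1")
print(toy_tokenize(misread))

# Transcoded to UTF-8 first, the same tokenizer sees whole words.
fixed = to_utf8(sample).decode("utf-8")
print(toy_tokenize(fixed))
```

If this is the cause, converting files to UTF-8 before posting them (or making sure the charset is declared correctly to whatever reads them) would be one workaround.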

Here's what the index analyzer looks like


Are UTF-16 text files supported? Any thoughts?

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-SOLR-Index-UTF-16-Text-tp4010834.html
Sent from the Solr - User mailing list archive at Nabble.com.
