Our Solr setup (4.0.BETA on Tomcat 6) works as expected when indexing UTF-8 files. Recently, however, we noticed that it has problems indexing certain other text files, e.g. UTF-16 files. See the attachment for an example (tarred and zipped):
tesla-utf16.txt <http://lucene.472066.n3.nabble.com/file/n4010834/tesla-utf16.txt>

Looking at the "text" terms, I see only 35 terms, i.e. (1, 2, 3, ..., 9, 0, a, b, c, ..., z)! A UTF-8 version of this file indexes fine.

Here's what the index analyzer looks like

Are UTF-16 text files supported? Any thoughts?

Thanks
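
For reference, here is a minimal Python sketch of the symptom and of the workaround of re-encoding to UTF-8 before indexing, since the UTF-8 version indexes fine. It rests on an assumption that is not confirmed anywhere in this post: that the UTF-16 bytes are being read as a single-byte charset somewhere before analysis, so every character ends up separated by a NUL byte. The function names (`decode_with_bom`, `to_utf8`, `naive_terms`) are made up for illustration, and the regex split is only a crude stand-in for a tokenizer, not Solr's actual analyzer chain.

```python
import codecs
import re


def decode_with_bom(raw: bytes, default: str = "utf-8") -> str:
    """Decode bytes using a UTF-8/UTF-16 byte order mark if present.

    Falls back to `default` when there is no BOM (hypothetical helper).
    """
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order and strips it.
        return raw.decode("utf-16")
    return raw.decode(default)


def to_utf8(raw: bytes) -> bytes:
    """Re-encode a BOM-marked text file as plain UTF-8 (no BOM)."""
    return decode_with_bom(raw).encode("utf-8")


def naive_terms(raw: bytes) -> set:
    """Simulate the suspected failure: treat the bytes as a single-byte
    charset, lowercase, and split on anything that is not a-z or 0-9.
    This is an illustrative stand-in, not what Solr really does.
    """
    text = raw.decode("latin-1").lower()
    return set(re.findall(r"[a-z0-9]+", text))


if __name__ == "__main__":
    sample = "Nikola Tesla 1856".encode("utf-16")  # BOM + UTF-16 bytes
    print(sorted(naive_terms(sample)))             # only one-character terms
    print(sorted(naive_terms(to_utf8(sample))))    # whole words again
```

With UTF-16 input, `naive_terms` can only ever yield single letters and digits, which would match a term list consisting of little more than 0-9 and a-z. After `to_utf8`, the same text produces whole words.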