: Our SOLR setup (4.0.BETA on Tomcat 6) works as expected when indexing UTF-8
: files. Recently, however, we noticed that it has issues with indexing
: certain text files, e.g. UTF-16 files. See attachment for an example
: (tarred+zipped)
:
: tesla-utf16.txt
: <http://lucene.472066.n3.nabble.com/file/n4010834/tesla-utf16.txt>

No attachment came through to the list, and the URL that nabble seems to have provided when you posted your message leads to a 404.

In general, the answer to "is indexing a UTF-16 file supported?" depends largely on *how* you are indexing the file. If it's plain text, are you parsing it yourself with some client code and then sending it to Solr? Are you using DIH to read it from disk? Are you using ExtractingRequestHandler?

Those are all very different ways to index data in Solr, and which one you are using determines how and where the encoding of that file is processed.

-Hoss
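
For the first case mentioned above (parsing the file in client code before sending it to Solr), here is a minimal sketch of where the encoding would be handled. It assumes the client decodes the UTF-16 bytes itself and sends UTF-8 JSON. The function names and the `id` and `text` field names are hypothetical, not taken from the original setup, and the actual HTTP request to Solr is left out.

```python
import json


def decode_text(raw_bytes, encoding="utf-16"):
    # Decode with an explicitly chosen encoding instead of relying on a
    # platform default. Python's "utf-16" codec uses the byte order mark
    # (BOM) to pick the byte order and drops it from the result.
    return raw_bytes.decode(encoding)


def build_update_payload(doc_id, text):
    # Hypothetical document shape: the field names "id" and "text" are
    # assumptions and must match whatever the schema actually defines.
    docs = [{"id": doc_id, "text": text}]
    # Serialize as UTF-8 so the request body has one known encoding,
    # whatever the encoding of the source file was.
    return json.dumps(docs, ensure_ascii=False).encode("utf-8")


if __name__ == "__main__":
    raw = "Nikola Tesla".encode("utf-16")  # stands in for the file contents
    payload = build_update_payload("tesla-utf16", decode_text(raw))
    print(payload.decode("utf-8"))
```

With this approach Solr only ever sees UTF-8, so the question of UTF-16 support is settled on the client side. With DIH or ExtractingRequestHandler the decoding happens elsewhere, and this sketch does not apply.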