RE: Can SOLR Index UTF-16 Text

2012-10-03 Thread Fuad Efendi
...eaders when you POST your file to Solr)
-Fuad Efendi
http://www.tokenizer.ca
-Original Message-
From: Fuad Efendi [mailto:f...@efendi.ca]
Sent: October-03-12 1:30 PM
To: solr-user@lucene.apache.org
Subject: RE: Can SOLR Index UTF-16 Text
Something is missing from the body of your E...
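The truncated advice appears to concern the HTTP headers sent when POSTing a file. The following is a minimal sketch, assuming a plain `HttpURLConnection` client rather than SolrJ, of declaring the body's charset in the `Content-Type` header. The endpoint URL, the `literal.id` value and the class name are hypothetical and need adjusting for a real install.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SolrCharsetPost {
    // Configures (but does not yet open) a POST whose Content-Type names the body's charset.
    static HttpURLConnection prepare(URL url, Charset charset) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/plain; charset=" + charset.name());
        return conn;
    }

    // Sends the raw, already-encoded bytes and returns the HTTP status code.
    static int send(HttpURLConnection conn, byte[] body) throws IOException {
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical Solr Cell endpoint; host, port, handler path and id are assumptions.
        URL url = URI.create("http://localhost:8983/solr/update/extract?literal.id=doc1").toURL();
        HttpURLConnection conn = prepare(url, StandardCharsets.UTF_16);
        // Prints the header that would be sent; call send(conn, bytes) to actually post.
        System.out.println(conn.getRequestProperty("Content-Type"));
    }
}
```

The header only tells the server how to decode the body; the bytes handed to `send` still have to be in the charset the header names.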

RE: Can SOLR Index UTF-16 Text

2012-10-03 Thread Fuad Efendi
...re and etc...
-Fuad Efendi
http://www.tokenizer.ca
-Original Message-
From: vybe3142 [mailto:vybe3...@gmail.com]
Sent: October-03-12 12:30 PM
To: solr-user@lucene.apache.org
Subject: Re: Can SOLR Index UTF-16 Text
Thanks for all the responses. Problem partially solved (see below). 1. ...

Re: Can SOLR Index UTF-16 Text

2012-10-03 Thread vybe3142
Thanks for all the responses. Problem partially solved (see below). 1. In a sense, my question is theoretical, since the input to our SOLR server is (currently) UTF-8 files produced by a third-party text extraction utility (not Tika). On the server side, we read and index the text via a custom data...
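Because the custom server-side reader expects UTF-8, one option is to transcode UTF-16 input before it reaches that reader. This is a minimal sketch, assuming each file fits in memory; the class name and command-line paths are hypothetical. Java's `UTF-16` decoder honours a byte-order mark and falls back to big-endian when there is none; malformed input is replaced with U+FFFD rather than rejected.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Utf16ToUtf8 {
    // Decodes UTF-16 (BOM-aware, big-endian if no BOM) and re-encodes as UTF-8 without a BOM.
    static byte[] transcode(byte[] utf16Bytes) {
        String text = new String(utf16Bytes, StandardCharsets.UTF_16);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical file-level wrapper: reads src fully and writes the UTF-8 copy to dst.
    static void transcodeFile(Path src, Path dst) throws IOException {
        Files.write(dst, transcode(Files.readAllBytes(src)));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: Utf16ToUtf8 <utf16-input> <utf8-output>");
            return;
        }
        transcodeFile(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```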

RE: Can SOLR Index UTF-16 Text

2012-10-02 Thread Fuad Efendi
Solr can index byte arrays too: unigram, bigram, trigram... even bitsets, tritsets, qatrisets ;-) LOL, I've got a strong cold... BTW, don't forget to configure UTF-8 as your default (Java) container encoding... -Fuad
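One way to see what the default (Java) container encoding currently resolves to is to print it from inside the JVM. This is a minimal sketch; on Tomcat the usual settings are `-Dfile.encoding=UTF-8` in the JVM options (for example via `CATALINA_OPTS`) and the `URIEncoding="UTF-8"` attribute on the HTTP Connector in `server.xml`, which governs query-string decoding. Whether this particular setup uses either is an assumption. Since JDK 18 the default charset is UTF-8 on all platforms.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultEncodingCheck {
    // True if the given charset is UTF-8.
    static boolean isUtf8(Charset cs) {
        return StandardCharsets.UTF_8.equals(cs);
    }

    public static void main(String[] args) {
        Charset def = Charset.defaultCharset();
        System.out.println("file.encoding  = " + System.getProperty("file.encoding"));
        System.out.println("defaultCharset = " + def.name());
        if (!isUtf8(def)) {
            // Any byte/char conversion that omits an explicit charset will use this one.
            System.out.println("Default charset is not UTF-8: " + def.name());
        }
    }
}
```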

Re: Can SOLR Index UTF-16 Text

2012-10-02 Thread Lance Norskog
...arset mime-type: I think it is "text/plain; charset=utf-16".
- Original Message -
| From: "Chris Hostetter"
| To: solr-user@lucene.apache.org
| Sent: Friday, September 28, 2012 5:17:15 PM
| Subject: Re: Can SOLR Index UTF-16 Text
|
| : Our SOLR setup (4.0.BETA on To...
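On the receiving side, a custom handler that reads the raw body would need to honour that `charset` parameter. This is a minimal sketch of pulling it out of a header value such as "text/plain; charset=utf-16" and decoding with it. The class and method names are hypothetical, and this simplified parser ignores edge cases such as a semicolon inside a quoted parameter value.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ContentTypeCharset {
    // Returns the charset named by the Content-Type "charset" parameter, or fallback if none.
    // Charset.forName throws if the name is illegal or unsupported by this JVM.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String param = part.trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = param.substring("charset=".length()).trim();
                // Strip optional surrounding quotes, e.g. charset="UTF-16".
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                return Charset.forName(name);
            }
        }
        return fallback;
    }

    // Decodes a request body using the declared charset, defaulting to UTF-8.
    static String decode(byte[] body, String contentType) {
        return new String(body, charsetOf(contentType, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] body = "Tesla".getBytes(StandardCharsets.UTF_16);
        System.out.println(decode(body, "text/plain; charset=utf-16"));
    }
}
```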

Re: Can SOLR Index UTF-16 Text

2012-09-28 Thread Chris Hostetter
: Our SOLR setup (4.0.BETA on Tomcat 6) works as expected when indexing UTF-8
: files. Recently, however, we noticed that it has issues with indexing
: certain text files, e.g. UTF-16 files. See attachment for an example
: (tarred+zipped)
:
: tesla-utf16.txt
:

Re: Can SOLR Index UTF-16 Text

2012-09-28 Thread Shawn Heisey
On 9/27/2012 2:55 PM, vybe3142 wrote:
Our SOLR setup (4.0.BETA on Tomcat 6) works as expected when indexing UTF-8 files. Recently, however, we noticed that it has issues with indexing certain text files, e.g. UTF-16 files.
I'd wait for a yes/no vote on this from one of the actual experts on thi...
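UTF-16 text files often, though not always, start with a byte-order mark, so one plausible way to handle mixed UTF-8/UTF-16 input is to sniff the first bytes before choosing a decoder. This is a minimal sketch; the class name is hypothetical, and the fallback to UTF-8 when no BOM is present is an assumption, since BOM-less UTF-16 cannot be identified reliably from the leading bytes alone.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomSniffer {
    // True if b begins with the given unsigned byte values.
    private static boolean startsWith(byte[] b, int... prefix) {
        if (b.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((b[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    // Charset implied by a leading BOM; UTF-8 (an assumption) when there is none.
    // UTF-32 BOMs are not handled: FF FE 00 00 would be reported as UTF-16LE.
    static Charset detect(byte[] b) {
        if (startsWith(b, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE;
        if (startsWith(b, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;
        return StandardCharsets.UTF_8;
    }

    // Number of leading BOM bytes to skip before decoding.
    static int bomLength(byte[] b) {
        if (startsWith(b, 0xEF, 0xBB, 0xBF)) return 3;
        if (startsWith(b, 0xFE, 0xFF) || startsWith(b, 0xFF, 0xFE)) return 2;
        return 0;
    }

    // Decodes with the sniffed charset, dropping the BOM itself.
    static String decode(byte[] b) {
        int skip = bomLength(b);
        return new String(b, skip, b.length - skip, detect(b));
    }

    public static void main(String[] args) {
        byte[] utf16le = {(byte) 0xFF, (byte) 0xFE, 0x54, 0x00, 0x65, 0x00};
        System.out.println(detect(utf16le) + " -> " + decode(utf16le));
    }
}
```

The BOM is skipped explicitly because Java's `UTF-16LE`, `UTF-16BE` and `UTF-8` decoders keep a leading U+FEFF in the decoded string rather than consuming it.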