RE: problem with quering solr after indexing UTF-8 encoded CSV files

2007-08-20 Thread Lance Norskog
From: Ben Shlomo, Yatir [mailto:[EMAIL PROTECTED]] Sent: Monday, August 20, 2007 8:40 AM To: solr-user@lucene.apache.org Subject: problem with quering solr after indexing UTF-8 encoded CSV files Hi! I have utf-8 encoded data inside a csv file (actually it’s a tab separated file - attached) I can

problem with quering solr after indexing UTF-8 encoded CSV files

2007-08-20 Thread Ben Shlomo, Yatir
Hi! I have UTF-8 encoded data inside a CSV file (actually it’s a tab-separated file, attached). I can index it with no apparent errors. I did not forget to set this in my Tomcat configuration. When I query a document using the UTF-8 text I get zero matches: -

Re: Indexing UTF-8

2006-08-10 Thread Tricia Williams
I no longer remember when or where this came up, but when using Tomcat there is a known character encoding problem when you expect UTF-8. In Tomcat's $TOMCAT_HOME/conf/server.xml, on the connector for the port you're running Solr on, ensure URIEncoding="UTF-8" is in the Connector element. This has solved some of my encoding problems.
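
To illustrate why the URIEncoding setting matters, here is a minimal Python sketch (not from the thread). It assumes a servlet container that decodes request URIs as ISO-8859-1 unless told otherwise, which was the default in Tomcat releases of that period. The sample term and the function names are hypothetical.

```python
from urllib.parse import quote, unquote


def encode_query(term: str) -> str:
    """Percent-encode a query term as UTF-8 bytes, the way a browser or client would."""
    return quote(term, safe="")


def decode_as(encoded: str, charset: str) -> str:
    """Decode a percent-encoded term using the charset the server assumes."""
    return unquote(encoded, encoding=charset)


if __name__ == "__main__":
    term = "café"                      # hypothetical search term
    encoded = encode_query(term)       # what goes on the wire in the URL
    print(encoded)
    # Server configured for UTF-8: the term round-trips intact.
    print(decode_as(encoded, "utf-8"))
    # Server assuming ISO-8859-1: each UTF-8 byte becomes its own character,
    # so the decoded term no longer matches what was indexed.
    print(decode_as(encoded, "iso-8859-1"))
```

If the index holds the correctly decoded term but the query arrives mis-decoded, the search can return zero matches even though indexing reported no errors.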

Re: Indexing UTF-8

2006-08-10 Thread Andrew May
Bertrand Delacretaz wrote: Does your build contain the http://issues.apache.org/jira/browse/SOLR-38 patch, and if so did you try posting the utf8-example.xml document with post.sh and querying it through the admin interface? That patch should be part of the build I'm using (patch committed on t

Re: Indexing UTF-8

2006-08-10 Thread Bertrand Delacretaz
On 8/10/06, Andrew May <[EMAIL PROTECTED]> wrote: ...I'm using the 28th July nightly build, which I believe contains all the recent fixes... Does your build contain the http://issues.apache.org/jira/browse/SOLR-38 patch, and if so did you try posting the utf8-example.xml document with post.sh
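
As a rough illustration of the posting step discussed above, the following Python sketch builds (but does not send) an update request whose body is UTF-8 encoded and whose Content-Type header declares that charset. It is not taken from post.sh; the update URL, the field names, and the sample document are assumptions made for the example.

```python
import urllib.request

# Assumed endpoint for a local example install; adjust to your setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_update_request(xml_text: str, url: str = SOLR_UPDATE_URL) -> urllib.request.Request:
    """Build a POST request that sends xml_text as UTF-8 and says so in the header."""
    body = xml_text.encode("utf-8")  # encode explicitly rather than relying on a platform default
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical document with a non-ASCII value.
    doc = '<add><doc><field name="id">utf8-1</field><field name="name">café</field></doc></add>'
    req = build_update_request(doc)
    print(req.get_method(), req.full_url)
    print(req.get_header("Content-type"))
    print(len(req.data), "bytes")
    # To actually send it: urllib.request.urlopen(req)
```

Declaring the charset in the header and encoding the body explicitly keeps both sides of the request in agreement about the bytes on the wire.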

Indexing UTF-8

2006-08-10 Thread Andrew May
Hi, I'm trying to index some UTF-8 data, but I'm experiencing some problems. I'm using the 28th July nightly build, which I believe contains all the recent fixes for making the administration webapp use UTF-8. I've tried running in both the provided Jetty instance and Tomcat 5.5.17. I've ind