Re: Encoding problem with ExtractRequestHandler for HTML indexing

2010-03-24 Thread Teruhiko Kurosaka
I suppose you mean Extract_ing_RequestHandler. Out of curiosity, I sent in a Japanese HTML file in EUC-JP encoding, and it was converted to Unicode properly; the index has the correct Japanese words. Do your HTML files have a META tag for Content-Type whose value includes charset=? For example, this
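To illustrate why the META charset matters, here is a minimal Python sketch of how a consumer might pick the decoding for raw HTML bytes from the declared charset. It is not how ExtractingRequestHandler (or the parser behind it) actually does this; the regex, the 4096-byte look-ahead and the function name `decode_html` are all assumptions for illustration.

```python
import re

# Matches <meta http-equiv="Content-Type" content="text/html; charset=EUC-JP">
# and the shorter <meta charset="EUC-JP">. A rough sketch, not a full HTML parser.
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE
)


def decode_html(raw: bytes, fallback: str = "utf-8") -> str:
    """Decode raw HTML bytes using the charset declared in a META tag, if any."""
    # The declaration is expected near the top, so only scan the first few KB.
    match = META_CHARSET.search(raw[:4096])
    charset = match.group(1).decode("ascii") if match else fallback
    return raw.decode(charset)


page = (
    '<html><head><meta http-equiv="Content-Type" '
    'content="text/html; charset=EUC-JP"></head><body>日本語</body></html>'
)
print(decode_html(page.encode("euc_jp")))
```

Without the META tag, the same EUC-JP bytes would be decoded with the fallback (UTF-8 here) and fail or produce garbage, which is the situation the question above is probing for.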

RE: encoding problem

2009-09-01 Thread Bernadette Houghton
09 9:18 AM To: 'solr-user@lucene.apache.org' Subject: RE: encoding problem Still having a few issues with encoding, although I've been able to resolve the particular issue below by just re-editing the affected record. The other encoding issue is with Greek characters. With sol

RE: encoding problem

2009-08-30 Thread Bernadette Houghton
hough...@deakin.edu.au] Sent: Friday, 28 August 2009 9:31 AM To: 'solr-user@lucene.apache.org'; 'yo...@lucidimagination.com' Subject: RE: encoding problem Shalin, the XML from solr admin for the relevant field is displaying as - Moncrieff, Joan, Macauley, Peter and Epps, Janine 20

RE: encoding problem

2009-08-27 Thread Bernadette Houghton
Shalin, the XML from solr admin for the relevant field is displaying as - Moncrieff, Joan, Macauley, Peter and Epps, Janine 2006, “My Universe is Here�: Implications For the Future of Academic Libraries From the Results of a Survey of Researchers, vol. 38, no. 2, pp. 71-83. The wei
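The “ sequence in that snippet is the typical signature of UTF-8 bytes being decoded as Windows-1252 (CP1252) somewhere along the way. A small Python sketch reproduces the pattern; it is only a diagnostic illustration, and the thread does not establish which component did the wrong decoding. The function name `mojibake` is hypothetical.

```python
def mojibake(text: str, wrong: str = "cp1252") -> str:
    """Encode text as UTF-8, then (incorrectly) decode the bytes as another charset."""
    # errors="replace" substitutes U+FFFD for bytes the wrong charset cannot map,
    # e.g. 0x9D, which is undefined in Windows-1252.
    return text.encode("utf-8").decode(wrong, errors="replace")


print(mojibake("\u201cMy Universe is Here\u201d"))
```

The opening quote U+201C (UTF-8 bytes E2 80 9C) comes out as "“", and the closing quote U+201D (E2 80 9D) ends in the replacement character, much like the "Here�" shown above.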

Re: encoding problem

2009-08-27 Thread Yonik Seeley
Message- > From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] > Sent: Wednesday, 26 August 2009 5:50 PM > To: solr-user@lucene.apache.org > Subject: Re: encoding problem > > On Wed, Aug 26, 2009 at 12:52 PM, Bernadette Houghton < > bernadette.hough...@deakin.edu.au> wrote:

RE: encoding problem

2009-08-27 Thread Bernadette Houghton
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: Wednesday, 26 August 2009 5:50 PM To: solr-user@lucene.apache.org Subject: Re: encoding problem On Wed, Aug 26, 2009 at 12:52 PM, Bernadette Houghton < bernadette.hough...@deakin.edu.au> wrote: > Thanks for your quick reply, Sh

RE: encoding problem

2009-08-26 Thread Fuad Efendi
If your complaint is about a web application other than SOLR (probably behind Apache HTTPD) having an encoding problem, try troubleshooting it with Mozilla Firefox + the Live HTTP Headers plugin. Look at the charset in the "Content-Type" HTTP response header, and don't forget about the <meta> tag inside the HTML... -Fuad
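An HTTP server declares the character set in the `charset` parameter of the `Content-Type` response header (`Content-Encoding` describes compression such as gzip, not the character set). A minimal Python sketch for pulling that parameter out of a header value, using the standard library's `email.message.Message` parser; the helper name is hypothetical.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() lowercases the value and returns None if it is absent.
    return msg.get_content_charset()


print(charset_from_content_type("text/html; charset=UTF-8"))
print(charset_from_content_type("text/html"))
```

For a live response fetched with `urllib.request.urlopen`, `response.headers.get_content_charset()` gives the same information directly, since those headers are an `email.message.Message` subclass.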

Re: encoding problem

2009-08-26 Thread Shalin Shekhar Mangar
On Wed, Aug 26, 2009 at 12:52 PM, Bernadette Houghton < bernadette.hough...@deakin.edu.au> wrote: > Thanks for your quick reply, Shalin. > > Tomcat is running on my Windows machine, but does not appear in Windows > Services (as I was expecting it should ... am I wrong?). I'm running it from > a st

RE: encoding problem

2009-08-26 Thread Bernadette Houghton
Thanks for your quick reply, Shalin. Tomcat is running on my Windows machine, but does not appear in Windows Services (as I was expecting it should ... am I wrong?). I'm running it from a startup.bat on my desktop - see below. Do I add the -Dfile line to the startup.bat? SOLR is part of the rep

Re: encoding problem

2009-08-26 Thread Shalin Shekhar Mangar
On Wed, Aug 26, 2009 at 12:42 PM, Bernadette Houghton < bernadette.hough...@deakin.edu.au> wrote: > Hi Shalin, stupid question - I'm an apache/solr newbie - but how do I > access the JVM??? > When you execute the java executable, just add -Dfile.encoding=UTF-8 as a command line argument to the ex

RE: encoding problem

2009-08-26 Thread Bernadette Houghton
Hi Shalin, stupid question - I'm an apache/solr newbie - but how do I access the JVM??? Regards Bern -Original Message- From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: Wednesday, 26 August 2009 5:10 PM To: solr-user@lucene.apache.org Subject: Re: encoding proble

Re: encoding problem

2009-08-26 Thread Shalin Shekhar Mangar
On Wed, Aug 26, 2009 at 10:24 AM, Bernadette Houghton < bernadette.hough...@deakin.edu.au> wrote: > We have an encoding problem with our solr application. That is, non-ASCII > chars display fine in SOLR, but as gobbledygook in our application. > > Our tomcat server.xml file already contains UR

Re: Encoding problem

2009-04-01 Thread Rui Pereira
Thanks, I detected that same problem. My system file encoding is CP1252, and I was saving the data-config.xml file in UTF-8. DIH was reading it using the default encoding. One possible workaround was using InputStream and OutputStream like DIH, but the files won't be in UTF-8 if the system has different

Re: Encoding problem

2009-03-27 Thread Shalin Shekhar Mangar
On Sat, Mar 28, 2009 at 12:51 AM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > > I see that you are specifying the topologyname's value in the query itself. > It might be a bug in DataImportHandler because it reads the data-config as a > string from an InputStream. If your default plat
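The failure described here (a UTF-8 file read back with a CP1252 platform default) is easy to reproduce. A minimal Python sketch, decoding the same bytes once with CP1252 and once with an explicit UTF-8; the `<entity>` line, table name and accented value are hypothetical stand-ins for a data-config.xml query, and `read_config` is an illustrative helper, not DIH code.

```python
import io


def read_config(raw: bytes, encoding: str) -> str:
    """Decode config bytes with an explicitly chosen encoding."""
    with io.TextIOWrapper(io.BytesIO(raw), encoding=encoding) as stream:
        return stream.read()


# Hypothetical query with an accented literal, saved to disk as UTF-8.
xml = "<entity query=\"select * from topology where topologyname='São Paulo'\"/>"
raw = xml.encode("utf-8")

print(read_config(raw, "cp1252"))  # what a CP1252 platform default yields: SÃ£o
print(read_config(raw, "utf-8"))   # explicit UTF-8 round-trips correctly
```

On the Java side, the equivalent fix is to construct the reader with an explicit charset (e.g. `new InputStreamReader(in, StandardCharsets.UTF_8)`) instead of relying on the platform default.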

Re: Encoding problem

2009-03-27 Thread Shalin Shekhar Mangar
On Fri, Mar 27, 2009 at 8:41 PM, Rui Pereira wrote: > I'm having problems with encoding in responses from search queries. The > encoding problem only occurs in the topologyname field; if an instancename > has accents it is returned correctly. In all my configurations I have > UTF-8.

Re: Encoding problem

2009-03-27 Thread aerox7
Hi, I had the same problem with DataImportHandler: I have a UTF-8 MySQL database, but it seems that DIH imports the data as Latin... So I just used a Transformer to (re)encode my strings in UTF-8. Rui Pereira-2 wrote: > > I'm having problems with encoding in responses from search queries. The > encod
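The (re)encoding trick amounts to recovering the original bytes from a string that was wrongly decoded as Latin-1 and decoding them again as UTF-8. The actual Transformer code is not shown in the thread, and a DIH Transformer would be written in Java; this is only a Python sketch of the same idea, with a hypothetical function name.

```python
def repair_latin1_mojibake(value: str) -> str:
    """Undo UTF-8 text that was mistakenly decoded as Latin-1 (ISO-8859-1)."""
    try:
        # Recover the original bytes, then decode them as the UTF-8 they really were.
        return value.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not Latin-1-decoded UTF-8 after all; leave the value unchanged.
        return value


print(repair_latin1_mojibake("Ã©tÃ©"))
```

The fallback branch matters: applying the repair to text that was already decoded correctly would otherwise raise an error or corrupt it, so values that do not round-trip are passed through as-is.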