Re: character encoding issue...

2013-11-10 Thread Michael Sokolov
Don't feel bad: character encoding problems are often said to be among the hardest in software engineering. There's no simple answer to problems like this since as Erick said, any tool in your chain could be the culprit. I doubt anyone on this list will be able to guess "the answer" since the

Re: character encoding issue...

2013-11-09 Thread Chris
I tried a lot of things and am almost at my wit's end :( Here is the code I used to get the strings: String htmlContent = readPage(page.getWebURL().getURL()); I even tried: Document doc = Jsoup.parse(new URL(url).openStream(), "UTF-8", url); String htmlContent = doc.html(); & Documen
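Chris's own readPage() helper isn't shown, so this only illustrates one common failure: decoding the HTTP response with the JVM's platform default charset instead of the page's real encoding. A minimal sketch, assuming the page really is UTF-8, that decodes a byte stream with an explicitly named charset. readAll is a hypothetical name, and the in-memory stream stands in for new URL(url).openStream(). The Jsoup.parse(InputStream, charsetName, baseUri) call Chris also tried follows the same idea of naming the charset explicitly. Non-ASCII strings are written as \u escapes so the example doesn't depend on the source file's own encoding.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReadWithCharset {
    // Hypothetical stand-in for a readPage() helper: decode the stream
    // with an explicit charset rather than the platform default.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Simulated page body: three CJK characters encoded as UTF-8 (9 bytes).
        byte[] page = "\u65E5\u672C\u8A9E".getBytes(StandardCharsets.UTF_8);
        String ok = readAll(new ByteArrayInputStream(page), StandardCharsets.UTF_8);
        String wrong = readAll(new ByteArrayInputStream(page), StandardCharsets.ISO_8859_1);
        System.out.println(ok.length());    // 3: one char per character
        System.out.println(wrong.length()); // 9: one char per byte
    }
}
```

If the wrong charset is used here, every later step (the bean, addBean(), Solr) faithfully stores already-garbled text.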

Re: character encoding issue...

2013-11-05 Thread T. Kuro Kurosaka
It sounds like the characters were mishandled at index build time. I would use Luke to see whether a character that appears correctly when you change the output to Shift-JIS is actually stored as one Unicode character. I bet it's stored as two characters, each having the character of the value that happened to
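To make Kurosaka's guess concrete, a small sketch that simulates the bug: text is encoded to bytes with one charset, then decoded as ISO-8859-1, which turns each byte into one char. A double-byte Shift-JIS character would come back as two chars, as he suspects; the sketch uses UTF-8 instead, where the same CJK character becomes three. misdecode and repair are hypothetical helper names, not part of Solr or Luke.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MisdecodeDemo {
    // Simulates the pipeline bug: encode with sourceCharset,
    // then wrongly decode as ISO-8859-1 (one char per byte).
    static String misdecode(String original, Charset sourceCharset) {
        byte[] bytes = original.getBytes(sourceCharset);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Reverses it: every ISO-8859-1 char maps back to exactly one byte,
    // so the original bytes can be recovered and decoded correctly.
    static String repair(String garbled, Charset sourceCharset) {
        byte[] bytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, sourceCharset);
    }

    public static void main(String[] args) {
        String original = "\u65E5"; // one CJK character
        String garbled = misdecode(original, StandardCharsets.UTF_8);
        System.out.println(original.length()); // 1
        System.out.println(garbled.length());  // 3
        System.out.println(repair(garbled, StandardCharsets.UTF_8).equals(original)); // true
    }
}
```

This is also why changing the response encoding can make such text "look right": the garbled chars still carry the original byte values.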

Re: character encoding issue...

2013-11-04 Thread Chris
Sorry, I was away for a bit, hence the delay. I am inserting Java strings into a Java bean class, and then calling the addBean() method to insert the POJO into Solr. When I query using either Tomcat or Jetty, I get these special characters. But I have noticed that if I change the output to "Shift-JIS" encoding, then

Re: character encoding issue...

2013-11-04 Thread Erick Erickson
The problem is that there are about a dozen places where the character encoding can be misconfigured. The problem you're seeing above actually looks like a problem with the character set configured in your browser; it may have nothing to do with what's actually in Solr. You might write a small SolrJ pro
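Erick's suggested SolrJ program is cut off in this snippet, so the query part is not reproduced here. This sketch covers only the inspection step, assuming you already have a field value as a Java String (from SolrJ, the crawler, or the bean): print its code points as U+XXXX so the check doesn't depend on how a browser or console renders characters. dump is a hypothetical name.

```java
public class CodePointDump {
    // Returns the string's Unicode code points as space-separated U+XXXX values.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump("A\u65E5")); // U+0041 U+65E5
    }
}
```

A correctly stored 日 shows up as a single U+65E5; a value that was mis-decoded as ISO-8859-1 somewhere upstream shows several code points between U+0080 and U+00FF instead.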

Re: character encoding issue...

2013-11-03 Thread Rajani Maski
How are you extracting the text from the website [1] you are referring to? Apache Nutch or some other crawler? If so, first check whether that crawler is giving you data in the correct format before you invoke the Solr index method. [1]http://blog.diigo.com/2009/09/28/scheduled-group
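One way to follow Rajani's advice, assuming you can get at the raw bytes the crawler fetched: run a strict UTF-8 decode over them before indexing. A minimal sketch using java.nio's CharsetDecoder with CodingErrorAction.REPORT; isValidUtf8 is a hypothetical name. Passing does not prove the text is UTF-8 (plain ASCII passes too), but failing means the bytes are in some other encoding and must be decoded accordingly.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    // Strict decode: any malformed or unmappable sequence throws instead of
    // being silently replaced with U+FFFD.
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u65E5\u672C".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "caf\u00E9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(isValidUtf8(utf8));   // true
        System.out.println(isValidUtf8(latin1)); // false: 0xE9 starts an incomplete sequence
    }
}
```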

Re: character encoding issue...

2013-10-31 Thread Chris
Hi Rajani, I followed the steps exactly as in http://zensarteam.wordpress.com/2011/11/25/6-steps-to-configure-solr-on-apache-tomcat-7-0-20/ However, when I send a query to this new instance in Tomcat, I again get the error - Scheduled Groups Maintenance In preparation for the new release roll-

Re: character encoding issue...

2013-10-29 Thread Rajani Maski
Hi, If you are using Apache Tomcat Server, I hope you are not missing the below-mentioned configuration: I faced a similar issue with Chinese characters and resolved it with the above config. Links for reference : http://zensarteam.wordpress.com/2011/11/25/6-steps-to-configure-solr-on-apa
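The configuration Rajani refers to was stripped from this archived message, so it is not reproduced here. A setting often cited for Solr on Tomcat is URIEncoding="UTF-8" on the HTTP Connector in server.xml, because Tomcat 7 and earlier decode query-string parameters as ISO-8859-1 by default; whether that is the configuration meant here is an assumption. The sketch mimics the effect with java.net.URLDecoder (not Tomcat's own code): the same percent-encoded bytes from a q= parameter, decoded two ways. decodeParam is a hypothetical name.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class QueryParamDecode {
    // Decodes a percent-encoded query parameter using the given charset name,
    // similar in effect to how a servlet container turns %XX bytes into chars.
    static String decodeParam(String raw, String charsetName) throws UnsupportedEncodingException {
        return URLDecoder.decode(raw, charsetName);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // A browser sends U+65E5 as its three UTF-8 bytes, percent-encoded.
        String raw = "%E6%97%A5";
        System.out.println(decodeParam(raw, "UTF-8").length());      // 1
        System.out.println(decodeParam(raw, "ISO-8859-1").length()); // 3
    }
}
```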

Re: character encoding issue

2009-11-04 Thread Jérôme Etévé
Hi, How do you post your data to Solr? If it's by posting XML, then it should be properly encoded in UTF-8 (which is the XML default), regardless of what's in the DB (which can be a mystery with MySQL). At query time, if the XML writer is used, then it's encoded in UTF-8. If the JSON one is used
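To illustrate Jérôme's point that posted XML must actually be UTF-8, a minimal sketch assuming you build Solr's XML update message (add, doc, and field elements) yourself rather than through SolrJ. javax.xml.stream's XMLStreamWriter writes to an OutputStream with the encoding named explicitly, so the declared encoding and the real bytes agree, and it escapes markup characters in field values. The field names "id" and "title" are hypothetical, and sending the bytes to Solr is not shown.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class SolrAddXml {
    // Builds an <add><doc>...</doc></add> message as UTF-8 bytes.
    static byte[] buildAdd(String id, String title) throws XMLStreamException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
        w.writeStartDocument("UTF-8", "1.0"); // declaration matches the stream encoding
        w.writeStartElement("add");
        w.writeStartElement("doc");
        writeField(w, "id", id);
        writeField(w, "title", title);
        w.writeEndElement(); // doc
        w.writeEndElement(); // add
        w.writeEndDocument();
        w.flush();
        w.close();
        return out.toByteArray();
    }

    static void writeField(XMLStreamWriter w, String name, String value) throws XMLStreamException {
        w.writeStartElement("field");
        w.writeAttribute("name", name);
        w.writeCharacters(value); // escapes '&' and '<' in the value
        w.writeEndElement();
    }

    public static void main(String[] args) throws XMLStreamException {
        byte[] body = buildAdd("1", "\u65E5\u672C\u8A9E & more");
        // Decode with the same charset the declaration names, just to inspect it.
        String xml = new String(body, StandardCharsets.UTF_8);
        System.out.println(xml.startsWith("<?xml")); // true
        System.out.println(xml.contains("&amp; more")); // true
    }
}
```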

Re: character encoding issue

2009-11-04 Thread Jonathan Hendler
Hi Peter, I have the same set of issues and will look for a response here. Sometimes those other chars can be created at the time of input (like extraction from a Microsoft Office doc by a third-party tool, for example). But MySQL looking OK in the browser might be because the encoding of MyS