Don't feel bad: character encoding problems are often said to be among
the hardest in software engineering.
There's no simple answer to problems like this since, as Erick said, any
tool in your chain could be the culprit. I doubt anyone on this list
will be able to guess "the answer" since the
I tried a lot of things and am almost at my wit's end :(
Here is the code I used to get the strings -
String htmlContent = readPage(page.getWebURL().getURL());
I even tried -
Document doc = Jsoup.parse(new URL(url).openStream(), "UTF-8", url);
String htmlContent = doc.html();
& Documen
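One thing worth ruling out at this stage is whether the bytes of the page are decoded with the charset they were actually written in. The following is only a minimal sketch using the Java standard library; the class name `ExplicitDecode` and the helper `readAll` are hypothetical and are not the `readPage` method mentioned above. It assumes the input is UTF-8 and shows how the same bytes produce a different string when decoded with the wrong charset.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitDecode {
    // Hypothetical helper: reads the whole stream and decodes it with the
    // charset passed in, so the platform default charset is never used.
    public static String readAll(InputStream in, Charset charset) {
        try {
            return new String(in.readAllBytes(), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Three Japanese characters, written with escapes so the source
        // file's own encoding cannot affect the demonstration.
        byte[] page = "\u65E5\u672C\u8A9E".getBytes(StandardCharsets.UTF_8);

        String ok = readAll(new ByteArrayInputStream(page), StandardCharsets.UTF_8);
        String bad = readAll(new ByteArrayInputStream(page), StandardCharsets.ISO_8859_1);

        System.out.println(ok.length());   // 3 characters, as intended
        System.out.println(bad.length());  // 9 characters, one per UTF-8 byte
    }
}
```

If the length of the string you get from the crawler is about three times the number of visible Japanese characters, the wrong charset was probably applied before the text ever reached Solr.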
It sounds like the characters were mishandled at index build time.
I would use Luke to see if a character that appears correctly
when you change the output to be SHIFT JIS is actually
stored as one Unicode character. I bet it's stored as two characters,
each having the character of the value that happened
to
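The kind of mishandling described above can be reproduced outside Solr. This is a rough sketch, not a diagnosis of this particular index: it assumes the text was encoded as UTF-8 and then wrongly decoded as ISO-8859-1 somewhere in the chain, which is only one of several possible mismatches. The class and method names are made up for illustration. Printing the code points of a stored value, as in `codePoints`, is one way to see whether a single character has turned into several.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Simulates a mis-configured step: the text is written as UTF-8 bytes
    // but read back as if each byte were one ISO-8859-1 character.
    public static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverses that specific mistake. This works because ISO-8859-1 maps
    // every byte value to exactly one character, so no bytes are lost.
    public static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Lists the Unicode code points in a string, e.g. "U+65E5".
    public static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String original = "\u65E5"; // one Japanese character
        String garbled = misdecode(original);

        System.out.println(codePoints(original)); // U+65E5
        System.out.println(codePoints(garbled));  // U+00E6 U+0097 U+00A5
        System.out.println(repair(garbled).equals(original)); // true
    }
}
```

If the value you inspect in Luke looks like the second line (several code points below U+0100 where one character was expected), the damage happened before or during indexing rather than in the browser.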
Sorry, was away a bit & hence the delay.
I am inserting Java strings into a Java bean class, and then calling the
addBean() method to insert the POJO into Solr.
When I query using either Tomcat or Jetty, I get these special characters. But
I have noted, if I change output to - "Shift-JIS" encoding then
The problem is there are about a dozen places where the character
encoding can be mis-configured. The problem you're seeing above
actually looks like a problem with the character set configured in
your browser, it may have nothing to do with what's actually in Solr.
You might write small SolrJ pro
How are you extracting the text from the website[1] you are
referring to? With Apache Nutch or another crawler? If so, first check
whether that crawler is giving you data in the correct format before you
invoke the Solr index method.
[1]http://blog.diigo.com/2009/09/28/scheduled-group
Hi Rajani,
I followed the steps exactly as in
http://zensarteam.wordpress.com/2011/11/25/6-steps-to-configure-solr-on-apache-tomcat-7-0-20/
However, when I send a query to this new instance in Tomcat, I again get
the error -
Scheduled Groups Maintenance
In preparation for the new release roll-
Hi,
If you are using Apache Tomcat Server, I hope you are not missing the
below mentioned configuration:
I faced a similar issue with Chinese characters and resolved it with the
above config.
Links for reference :
http://zensarteam.wordpress.com/2011/11/25/6-steps-to-configure-solr-on-apa
Hi,
How do you post your data to Solr? If it's by posting XML, then it
should be properly encoded in UTF-8 (which is the XML default),
regardless of what's in the DB (which can be a mystery with MySQL).
At query time, if the XML writer is used, then it's encoded in UTF-8.
If the JSON one is used
Hi Peter,
I have the same set of issues and will look for a response here.
Sometimes those other chars can be created at the time of input (for
example, extraction from a Microsoft Office doc by a third-party
tool). But MySQL looking OK in the browser might be because the
encoding of MyS