Sorry, I was away for a bit, hence the delay. I am setting Java strings on a Java bean class and then calling the addBean() method to insert the POJO into Solr.
When I query using either Tomcat or Jetty, I get these special characters. However, I have noticed that if I change the output encoding to "Shift-JIS", those characters appear as what I think are Japanese characters. That does not work for all special characters, though, as I can still see some of them. Isn't there an encoding that covers all characters, whatever they might be?

Any ideas on what I should do?

Regards,
Chris


On Mon, Nov 4, 2013 at 6:27 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> The problem is there are about a dozen places where the character
> encoding can be mis-configured. The problem you're seeing above
> actually looks like a problem with the character set configured in
> your browser; it may have nothing to do with what's actually in Solr.
>
> You might write a small SolrJ program and see if you can dump the contents
> in binary and examine to see...
>
> Best
> Erick
>
>
> On Sun, Nov 3, 2013 at 6:39 AM, Rajani Maski <rajinima...@gmail.com>
> wrote:
>
> > How are you extracting the text from the website[1] you are
> > referring to? Apache Nutch or any other crawler? If yes, first check
> > whether that crawler engine is giving you data in the correct format
> > before you invoke the Solr index method.
> >
> > [1]http://blog.diigo.com/2009/09/28/scheduled-groups-maintenance/
> >
> > URI encoding should resolve this problem.
> >
> >
> > On Fri, Nov 1, 2013 at 10:50 AM, Chris <christu...@gmail.com> wrote:
> >
> > > Hi Rajani,
> > >
> > > I followed the steps exactly as in
> > >
> > > http://zensarteam.wordpress.com/2011/11/25/6-steps-to-configure-solr-on-apache-tomcat-7-0-20/
> > >
> > > However, when I send a query to this new instance in Tomcat, I again
> > > get the error -
> > >
> > > <str name="fulltxt">Scheduled Groups Maintenance
> > > In preparation for the new release roll-out,���� Diigo groups won’t be
> > > accessible on Sept 28 (Mon) around midnight 0:00 PST for several
> > > hours.
> > > Stay tuned to say hello to Diigo V4 soon!
> > >
> > > location of the text -
> > > http://blog.diigo.com/2009/09/28/scheduled-groups-maintenance/
> > >
> > > same problem at - http://cn.nytimes.com/business/20130926/c26alibaba/
> > >
> > > All text in title comes like -
> > >
> > > ������������������������������������ - ���������������������
> > > ������������</str>
> > > <arr name="text">
> > > <str>������������������������������������ -
> > > ��������������������� ������������</str>
> > > </arr>
> > >
> > >
> > > Can you please advise?
> > >
> > > Chris
> > >
> > >
> > > On Tue, Oct 29, 2013 at 11:33 PM, Rajani Maski <rajinima...@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > If you are using Apache Tomcat Server, I hope you are not missing the
> > > > configuration below:
> > > >
> > > > <Connector port="port Number" protocol="HTTP/1.1"
> > > > connectionTimeout="20000"
> > > > redirectPort="8443" *URIEncoding="UTF-8"*/>
> > > >
> > > > I faced a similar issue with Chinese characters and resolved it with
> > > > the above config.
> > > >
> > > > Links for reference:
> > > >
> > > > http://zensarteam.wordpress.com/2011/11/25/6-steps-to-configure-solr-on-apache-tomcat-7-0-20/
> > > >
> > > > http://blog.sidu.in/2007/05/tomcat-and-utf-8-encoded-uri-parameters.html#.Um_3P3Cw2X8
> > > >
> > > >
> > > > Thanks
> > > >
> > > >
> > > > On Tue, Oct 29, 2013 at 9:20 PM, Chris <christu...@gmail.com> wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > I get characters like -
> > > > >
> > > > > ������������������ - CTA������������ -
> > > > >
> > > > > in the Solr index. I am adding Java beans to Solr with the addBean()
> > > > > function.
> > > > >
> > > > > This seems to be a character encoding issue. Any pointers on how to
> > > > > resolve this one?
> > > > >
> > > > > I have seen that this occurs mostly for Japanese and Chinese
> > > > > characters.
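
The suggestion in the thread to dump the contents in binary can be tried without a browser in the way. Below is a minimal sketch of that idea in plain Java. It does not talk to Solr at all: the class `EncodingCheck`, its methods, and the sample string are hypothetical stand-ins, and in a real check the string would be a field value read back from the index (for example via SolrJ). It prints the UTF-8 bytes of a string as hex and shows that the same bytes only survive a round trip when they are decoded with the charset they were encoded in.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Solr or SolrJ.
class EncodingCheck {

    // Render the UTF-8 bytes of a string as space-separated lowercase hex.
    static String toHex(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", bytes[i] & 0xff));
        }
        return sb.toString();
    }

    // Encode as UTF-8, then decode with the given charset. This mimics a
    // client that receives UTF-8 bytes but assumes a different encoding.
    static String reinterpret(String s, Charset assumed) {
        return new String(s.getBytes(StandardCharsets.UTF_8), assumed);
    }

    public static void main(String[] args) {
        // Assumption: stand-in for a field value fetched from the index.
        // Written with Unicode escapes so the source file encoding is irrelevant.
        String sample = "\u4e2d\u6587";

        // Expected UTF-8 bytes for these two characters: e4 b8 ad e6 96 87
        System.out.println(toHex(sample));

        // Decoding with the matching charset gives the original text back.
        System.out.println(sample.equals(reinterpret(sample, StandardCharsets.UTF_8)));

        // Decoding with a mismatched charset does not.
        System.out.println(sample.equals(reinterpret(sample, StandardCharsets.ISO_8859_1)));
    }
}
```

If the hex dump of a value read back from the index shows the expected UTF-8 bytes, the stored data is probably fine and the garbling is happening on the display side, for example in the browser or the servlet container's response encoding. If the bytes are already wrong at that point, the corruption happened earlier, during crawling or indexing.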