Sorry, I was away for a bit, hence the delay.

I am inserting Java strings into a Java bean class and then calling the
addBean() method to insert the POJO into Solr.
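
Because the bean fields are ordinary Java Strings, one place worth checking (as the quoted replies suggest) is where the crawled page bytes become those Strings. Below is a minimal sketch of a hypothetical helper, `toFieldValue` (not a SolrJ method). It assumes the page bytes are available as a `byte[]` and that the page is UTF-8. It decodes strictly instead of relying on the platform default charset, and it rejects text that already contains U+FFFD, the replacement character that shows up as �.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class DecodeBeforeIndexing {

    // Hypothetical helper: converts raw page bytes into a String for a bean field.
    // Assumes the page is UTF-8. Malformed bytes raise an error instead of being
    // silently replaced with U+FFFD, which is what new String(bytes, charset) does.
    static String toFieldValue(byte[] pageBytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        String text;
        try {
            text = decoder.decode(ByteBuffer.wrap(pageBytes)).toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("page bytes are not valid UTF-8", e);
        }
        // A U+FFFD that decoded cleanly was already present in the source bytes,
        // so the damage happened upstream of this step (e.g. in the crawler).
        if (text.indexOf('\uFFFD') >= 0) {
            throw new IllegalArgumentException("text already contains U+FFFD");
        }
        return text;
    }

    public static void main(String[] args) {
        byte[] good = "won\u2019t \u963F\u91CC".getBytes(StandardCharsets.UTF_8);
        String text = toFieldValue(good);
        System.out.println("decoded code points: " + text.codePointCount(0, text.length()));

        byte[] truncated = {(byte) 0xE2, (byte) 0x80}; // first two bytes of a 3-byte sequence
        try {
            toFieldValue(truncated);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Only a String that passes this kind of check would then be set on the bean and passed to addBean().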

When I query using either Tomcat or Jetty, I get these special characters.
But I have noticed that if I change the output to "Shift-JIS" encoding,
those characters appear as what I think are Japanese characters.
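
Switching the display encoding to Shift-JIS does not change the bytes Solr returns. It only changes how those bytes are interpreted. The sketch below illustrates that effect in plain Java and assumes nothing about Solr; the class and method names are illustrative. It encodes a sample string as UTF-8, then reads the same bytes back as UTF-8 and as Shift_JIS. Note that `new String(byte[], Charset)` substitutes the charset's replacement string for any bytes it cannot decode, so a wrong charset at this step can silently produce � characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReinterpretBytes {

    // Illustrative helper: encodes text as UTF-8, then decodes those same bytes
    // with another charset. Choosing a different encoding in a viewer amounts to
    // this: the bytes are unchanged, only their interpretation differs.
    static String readUtf8BytesAs(String text, Charset readAs) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, readAs);
    }

    public static void main(String[] args) {
        String original = "\u963F\u91CC\u5DF4\u5DF4"; // CJK sample text
        System.out.println("read as UTF-8, unchanged: "
                + readUtf8BytesAs(original, StandardCharsets.UTF_8).equals(original));

        // Shift_JIS ships with most JDKs but is not one of the charsets
        // every Java platform is required to support.
        if (Charset.isSupported("Shift_JIS")) {
            String asSjis = readUtf8BytesAs(original, Charset.forName("Shift_JIS"));
            System.out.println("read as Shift_JIS, unchanged: " + asSjis.equals(original));
        }
    }
}
```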

But this solution doesn't work for all special characters, as I can still
see some of them. Isn't there an encoding that can cover all characters,
whatever they might be? Any ideas on what I should do?
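
On the question of an encoding that covers everything: UTF-8 can encode every Unicode code point, which is why it is the usual choice at every hop (the strings handed to SolrJ, Solr's responses, the servlet container and the browser). To find where the text actually breaks, along the lines of the binary dump suggested in the quoted reply, a small sketch like the one below prints code points and UTF-8 bytes as plain ASCII. That way the result does not depend on the console's or browser's own encoding. The class and method names are made up for illustration. In practice, the input would be a field value read back from Solr.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Dump {

    // Illustrative helper (not part of SolrJ): hex dump of a String's UTF-8 bytes.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Lists code points as U+XXXX, so the output is pure ASCII.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    // True if the text survives a UTF-8 encode/decode round trip unchanged.
    static boolean roundTrips(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8).equals(s);
    }

    public static void main(String[] args) {
        // Sample text, including a code point outside the Basic Multilingual Plane.
        String sample = "won\u2019t \u963F\u91CC \uD83D\uDE00";
        System.out.println(codePoints(sample));
        System.out.println(utf8Hex(sample));
        System.out.println("round trip ok: " + roundTrips(sample));
    }
}
```

If the dump of a stored field already shows U+FFFD, the text was damaged before or during indexing. If it shows the expected code points, the problem is in how the response is displayed, which is the possibility raised in the quoted reply.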

Regards,
Chris


On Mon, Nov 4, 2013 at 6:27 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> The problem is that there are about a dozen places where the character
> encoding can be misconfigured. The problem you're seeing above
> actually looks like a problem with the character set configured in
> your browser; it may have nothing to do with what's actually in Solr.
>
> You might write a small SolrJ program and see if you can dump the contents
> in binary and examine them to see...
>
> Best
> Erick
>
>
> On Sun, Nov 3, 2013 at 6:39 AM, Rajani Maski <rajinima...@gmail.com> wrote:
>
> > How are you extracting the text from the website [1] you are referring
> > to? With Apache Nutch or some other crawler? If so, first check whether
> > that crawler is giving you correctly encoded data before you invoke the
> > Solr index method.
> >
> > [1] http://blog.diigo.com/2009/09/28/scheduled-groups-maintenance/
> >
> > URI encoding should resolve this problem.
> >
> >
> >
> >
> > On Fri, Nov 1, 2013 at 10:50 AM, Chris <christu...@gmail.com> wrote:
> >
> > > Hi Rajani,
> > >
> > > I followed the steps exactly as in
> > > http://zensarteam.wordpress.com/2011/11/25/6-steps-to-configure-solr-on-apache-tomcat-7-0-20/
> > >
> > > However, when I send a query to this new instance in Tomcat, I again
> > > get the error:
> > >
> > >   <str name="fulltxt">Scheduled Groups Maintenance
> > > In preparation for the new release roll-out,���� Diigo groups won’t be
> > > accessible on Sept 28 (Mon) around midnight 0:00 PST for several
> > > hours.
> > > Stay tuned to say hello to Diigo V4 soon!
> > >
> > > Location of the text:
> > > http://blog.diigo.com/2009/09/28/scheduled-groups-maintenance/
> > >
> > > Same problem at: http://cn.nytimes.com/business/20130926/c26alibaba/
> > >
> > > All text in the title comes out like this:
> > >
> > > ������������������������������������ - ���������������������
> > > ������������</str>
> > >     <arr name="text">
> > >       <str>������������������������������������ -
> > > ��������������������� ������������</str>
> > >     </arr>
> > >
> > >
> > > Can you please advise?
> > >
> > > Chris
> > >
> > >
> > >
> > >
> > > On Tue, Oct 29, 2013 at 11:33 PM, Rajani Maski <rajinima...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > If you are using the Apache Tomcat server, I hope you are not missing
> > > > the configuration below:
> > > >
> > > >  <Connector port="port Number" protocol="HTTP/1.1"
> > > > connectionTimeout="20000"
> > > > redirectPort="8443" URIEncoding="UTF-8"/>
> > > >
> > > > I faced a similar issue with Chinese characters and resolved it
> > > > with the above config.
> > > >
> > > > Links for reference:
> > > >
> > > > http://zensarteam.wordpress.com/2011/11/25/6-steps-to-configure-solr-on-apache-tomcat-7-0-20/
> > > >
> > > > http://blog.sidu.in/2007/05/tomcat-and-utf-8-encoded-uri-parameters.html#.Um_3P3Cw2X8
> > > >
> > > >
> > > > Thanks
> > > >
> > > >
> > > >
> > > > On Tue, Oct 29, 2013 at 9:20 PM, Chris <christu...@gmail.com> wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > I get characters like these:
> > > > >
> > > > > ������������������ - CTA������������ -
> > > > >
> > > > > in the Solr index. I am adding Java beans to Solr with the addBean()
> > > > > method.
> > > > >
> > > > > This seems to be a character encoding issue. Any pointers on how to
> > > > > resolve this one?
> > > > >
> > > > > I have noticed that this mostly occurs with Japanese and Chinese
> > > > > characters.
> > > > >
> > > >
> > >
> >
>
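
On the Tomcat URIEncoding="UTF-8" connector setting quoted in this thread: it controls which charset Tomcat uses to turn the percent-encoded bytes of the request URI (for example, a non-ASCII q= parameter) back into characters. Tomcat 7 falls back to ISO-8859-1 when it is not set, so the setting mainly affects query terms sent in the URL. The plain-Java sketch below shows the two outcomes. `URLEncoder`/`URLDecoder` stand in for the client and the container; this is an approximation, not Tomcat's own code, and the class and helper names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class UriEncodingDemo {

    // Percent-encodes a query term as UTF-8, as a client building a Solr URL would.
    static String encodeUtf8(String term) {
        try {
            return URLEncoder.encode(term, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Illustrative only: decodes percent-encoded bytes with the given charset,
    // approximating what the container does according to its URIEncoding setting.
    static String decodeAs(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charsetName, e);
        }
    }

    // Hypothetical repair, valid only for text that was mis-decoded as ISO-8859-1:
    // each char maps back to exactly one byte, recovering the original UTF-8 bytes.
    static String repairLatin1(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String term = "\u963F\u91CC";
        String encoded = encodeUtf8(term);
        System.out.println("on the wire: " + encoded);

        String asUtf8 = decodeAs(encoded, "UTF-8");        // like URIEncoding="UTF-8"
        String asLatin1 = decodeAs(encoded, "ISO-8859-1"); // like no URIEncoding on Tomcat 7
        System.out.println("UTF-8 decode matches: " + asUtf8.equals(term));
        System.out.println("ISO-8859-1 decode matches: " + asLatin1.equals(term));
        System.out.println("repaired matches: " + repairLatin1(asLatin1).equals(term));
    }
}
```

Because ISO-8859-1 maps every byte to exactly one character, that particular kind of damage can be undone. Text that already contains U+FFFD, like the fields quoted in this thread, cannot be repaired this way: the original bytes are gone, so it would need to be re-indexed from correctly decoded source text.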
