Hi, it's possible that the older XML dependencies did not check the contents
of the data as strictly, or it may be something else. Either way, your XML is
certainly broken, and stripping the bad characters will most likely fix your problem.
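A minimal Java sketch of stripping bad characters. It assumes the content is already decoded into a String and that "bad" means code points outside the XML 1.0 Char production (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]). The class name XmlCharStripper is hypothetical, and String.codePoints() needs Java 8 or later. It works on decoded text only, so it cannot repair raw bytes that are not valid UTF-8.

```java
// Hypothetical helper: removes code points that are not legal in an XML 1.0
// document, per the XML 1.0 "Char" production:
//   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
final class XmlCharStripper {

    private XmlCharStripper() {}

    /** Returns true if the code point matches the XML 1.0 Char production. */
    static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Copies only the legal code points of the input into a new string. */
    static String strip(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints()
            .filter(XmlCharStripper::isLegalXmlChar)
            .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // U+0001 and U+FFFE are not allowed in XML 1.0, so both are dropped.
        System.out.println(strip("a\u0001b\uFFFEc"));
    }
}
```

Running main prints "abc". Lone surrogates are also dropped, since they fall outside every allowed range.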
 
 
-----Original message-----
> From:Sujatha Arun <suja.a...@gmail.com>
> Sent: Tue 22-Jan-2013 19:25
> To: solr-user@lucene.apache.org
> Subject: Re: solr 3.6.1 Indexing and utf8 issue
> 
> Thanks for the pointer, but given the same indexing code, why does this fail
> in Solr 3.6.1 when it works fine in Solr 1.3?
> 
> Any idea?
> 
> Regards
> Sujatha
> 
> On Tue, Jan 22, 2013 at 9:33 PM, Markus Jelsma
> <markus.jel...@openindex.io> wrote:
> 
> > Hi,
> >
> > You've likely got some non-character code points in your data and they
> > need to be stripped.
> >
> > http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:Noncharacter_Code_Point=True:]
> >
> > See the patch for NUTCH-1016 for an example of how to strip them. It's
> > easily ported to other languages.
> > https://issues.apache.org/jira/browse/NUTCH-1016
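A rough Java sketch of the same idea, not a copy of the NUTCH-1016 patch: drop the 66 Unicode noncharacters, i.e. U+FDD0..U+FDEF plus the last two code points of every plane (U+FFFE/U+FFFF, U+1FFFE/U+1FFFF, up to U+10FFFE/U+10FFFF). The class name NoncharacterStripper is hypothetical; Java 8 or later is assumed for String.codePoints().

```java
// Hypothetical helper, in the spirit of NUTCH-1016 (not the patch itself):
// removes the 66 Unicode noncharacters from already-decoded text.
final class NoncharacterStripper {

    private NoncharacterStripper() {}

    /** U+FDD0..U+FDEF, or any code point whose low 16 bits are FFFE or FFFF. */
    static boolean isNoncharacter(int cp) {
        return (cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE;
    }

    /** Copies every code point except noncharacters into a new string. */
    static String strip(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints()
            .filter(cp -> !isNoncharacter(cp))
            .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // U+FDD0 and U+1FFFF (written as a surrogate pair in the literal) are removed.
        System.out.println(strip("a\uFDD0b\uD83F\uDFFFc"));
    }
}
```

Running main prints "abc". The bit test (cp & 0xFFFE) == 0xFFFE covers the FFFE/FFFF pair in all 17 planes with a single comparison.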
> >
> > Cheers,
> >
> >
> >
> > -----Original message-----
> > > From:Sujatha Arun <suja.a...@gmail.com>
> > > Sent: Tue 22-Jan-2013 12:35
> > > To: solr-user@lucene.apache.org
> > > Subject: solr 3.6.1 Indexing and utf8 issue
> > >
> > > Hi,
> > >
> > > We are on Solr 3.6.1 on Tomcat 5.5.25. Indexing Polish content throws
> > > the following error:
> > >
> > > Caused by: com.ctc.wstx.exc.WstxIOException: Invalid UTF-8 middle byte 0x77 (at char #166, byte #127)
> > > at com.ctc.wstx.sr.StreamScanner.throwFromIOE(StreamScanner.java:708)
> > > at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1086)
> > > at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:309)
> > > at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:156)
> > > at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
> > > ... 20 more
> > > Caused by: java.io.CharConversionException: Invalid UTF-8 middle byte 0x77
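The CharConversionException is raised while the parser decodes the raw bytes as UTF-8, so it can also help to check the file before posting it. One possible cause (an assumption, not confirmed in this thread) is Polish text saved as ISO-8859-2 or Windows-1250: there e-ogonek is byte 0xEA, which a UTF-8 decoder reads as the lead byte of a three-byte sequence, so a following 'w' (0x77) is reported as an invalid middle byte. A minimal Java sketch, with the hypothetical class name Utf8Check, that runs a strict UTF-8 decoder over the bytes and reports the offset of the first malformed sequence:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical pre-flight check: is this byte array valid UTF-8?
final class Utf8Check {

    private Utf8Check() {}

    /** Returns the offset of the first malformed UTF-8 sequence, or -1 if none. */
    static int firstMalformedOffset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        // UTF-8 never decodes to more chars than input bytes, so this cannot overflow.
        CharBuffer out = CharBuffer.allocate(data.length);
        CoderResult result = decoder.decode(in, out, true);
        // On error the decoder leaves the input position at the start of the bad sequence.
        return result.isError() ? in.position() : -1;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                // Demo bytes: 'a', 0xEA (e-ogonek in ISO-8859-2), 'w' (0x77).
                : new byte[] {'a', (byte) 0xEA, 'w'};
        int offset = firstMalformedOffset(data);
        System.out.println(offset < 0
                ? "valid UTF-8"
                : "first malformed byte at offset " + offset);
    }
}
```

Run it as java Utf8Check followed by the path of the content file. With no argument it checks the demo bytes and prints "first malformed byte at offset 1".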
> > >
> > >
> > >
> > > I have added a patch to enable UTF-8 encoding in the SolrDispatchFilter.java
> > > file.
> > >
> > > The same content file works fine in 1.3 with the UTF-8 patch. Please find
> > > the content file attached.
> > >
> > > Please let me know what could be missing.
> > >
> > > Regards
> > > Sujatha
> > >
> >
> 
