Thanks for the pointer, but given the same indexing code, why does this not work in Solr 3.6.1 when it works fine in Solr 1.3?
Any idea?

Regards
Sujatha

On Tue, Jan 22, 2013 at 9:33 PM, Markus Jelsma <markus.jel...@openindex.io> wrote:

> Hi,
>
> You've likely got some non-character code points in your data and they
> need to be stripped.
>
> http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:Noncharacter_Code_Point=True:]
>
> See the patch for NUTCH-1016 for an example on how to strip them. It's
> easily ported to other languages.
> https://issues.apache.org/jira/browse/NUTCH-1016
>
> Cheers,
>
> > -----Original message-----
> > From: Sujatha Arun <suja.a...@gmail.com>
> > Sent: Tue 22-Jan-2013 12:35
> > To: solr-user@lucene.apache.org
> > Subject: solr 3.6.1 Indexing and utf8 issue
> >
> > Hi,
> >
> > We are on Solr 3.6.1 on Tomcat 5.5.25. Indexing Polish content
> > throws the following error:
> >
> > Caused by: com.ctc.wstx.exc.WstxIOException: Invalid UTF-8 middle byte 0x77 (at char #166, byte #127)
> >     at com.ctc.wstx.sr.StreamScanner.throwFromIOE(StreamScanner.java:708)
> >     at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1086)
> >     at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:309)
> >     at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:156)
> >     at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
> >     ... 20 more
> > Caused by: java.io.CharConversionException: Invalid UTF-8 middle byte 0x77
> >
> > I have added a patch to enable UTF-8 encoding in the
> > solrDispatchFilter.java file.
> >
> > The same content file works fine in 1.3 with the UTF-8 patch. Please find
> > the content file attached.
> >
> > Please let me know what could be missing.
> >
> > Regards
> > Sujatha
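
For reference, the stripping step suggested in the quoted reply could look roughly like the Java sketch below. It is not the NUTCH-1016 patch itself, only an illustration of the idea. The class and method names (`NonCharacterStripper`, `strip`, `isNonCharacter`) are made up for this example, and it assumes the text has already been decoded into a Java `String` before it is written into the update XML.

```java
// Hypothetical helper, not part of Solr or Nutch.
final class NonCharacterStripper {

    private NonCharacterStripper() {
    }

    // Unicode noncharacters are U+FDD0..U+FDEF plus the last two code
    // points of every plane (U+FFFE, U+FFFF, U+1FFFE, ..., U+10FFFF).
    static boolean isNonCharacter(int codePoint) {
        return (codePoint >= 0xFDD0 && codePoint <= 0xFDEF)
                || (codePoint & 0xFFFE) == 0xFFFE;
    }

    // Returns a copy of the input with all noncharacter code points removed.
    static String strip(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            // Read a full code point so supplementary characters are
            // handled as one unit rather than as two surrogate chars.
            int codePoint = input.codePointAt(i);
            i += Character.charCount(codePoint);
            if (!isNonCharacter(codePoint)) {
                out.appendCodePoint(codePoint);
            }
        }
        return out.toString();
    }
}
```

A field value would be passed through `NonCharacterStripper.strip(value)` before being added to the document sent to Solr. Ordinary Polish letters such as ż, ó and ł are not noncharacters and pass through unchanged.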