Thanks for the pointer, but given the same indexing code, why does this not
work in Solr 3.6.1 when it works fine in Solr 1.3?

Any idea?

Regards
Sujatha

On Tue, Jan 22, 2013 at 9:33 PM, Markus Jelsma
<markus.jel...@openindex.io> wrote:

> Hi,
>
> You've likely got some non-character code points in your data and they
> need to be stripped.
>
> http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:Noncharacter_Code_Point=True:]
>
> See the patch for NUTCH-1016 for an example of how to strip them. It's
> easily ported to other languages.
> https://issues.apache.org/jira/browse/NUTCH-1016
>
> Cheers,
>
>
>
> -----Original message-----
> > From:Sujatha Arun <suja.a...@gmail.com>
> > Sent: Tue 22-Jan-2013 12:35
> > To: solr-user@lucene.apache.org
> > Subject: solr 3.6.1 Indexing and utf8 issue
> >
> > Hi,
> >
> > We are on Solr 3.6.1 on Tomcat 5.5.25. Indexing Polish content throws the
> > following error:
> >
> > Caused by: com.ctc.wstx.exc.WstxIOException: Invalid UTF-8 middle byte 0x77 (at char #166, byte #127)
> > at com.ctc.wstx.sr.StreamScanner.throwFromIOE(StreamScanner.java:708)
> > at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1086)
> > at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:309)
> > at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:156)
> > at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
> > ... 20 more
> > Caused by: java.io.CharConversionException: Invalid UTF-8 middle byte 0x77
> >
> >
> >
> > I have added a patch to enable UTF-8 encoding in the SolrDispatchFilter.java
> > file.
> >
> > The same content file works fine in 1.3 with the UTF-8 patch. Please find the
> > content file attached.
> >
> > Please let me know what could be missing.
> >
> > Regards
> > Sujatha
> >
>
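
For reference, the stripping step suggested in the quoted reply could look roughly like the sketch below. It is not the NUTCH-1016 patch itself; the class and method names (`NoncharacterStripper`, `isNoncharacter`, `strip`) are made up for illustration. It assumes the field values are already decoded Java strings, and it removes only the 66 Unicode noncharacters (U+FDD0..U+FDEF plus every code point ending in FFFE or FFFF). The thread does not confirm whether this resolves the "Invalid UTF-8 middle byte" error.

```java
public final class NoncharacterStripper {

    private NoncharacterStripper() {
    }

    /** Returns true for the 66 code points Unicode designates as noncharacters. */
    static boolean isNoncharacter(int codePoint) {
        // U+FDD0..U+FDEF, plus U+nFFFE and U+nFFFF in every plane (n = 0..0x10).
        return (codePoint >= 0xFDD0 && codePoint <= 0xFDEF)
                || (codePoint & 0xFFFE) == 0xFFFE;
    }

    /** Returns a copy of the input with all noncharacter code points removed. */
    static String strip(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            // Iterate by code point so supplementary-plane values are handled whole.
            int codePoint = input.codePointAt(i);
            i += Character.charCount(codePoint);
            if (!isNoncharacter(codePoint)) {
                out.appendCodePoint(codePoint);
            }
        }
        return out.toString();
    }
}
```

Calling `NoncharacterStripper.strip(value)` on each field value before it is written into the XML update message leaves ordinary Polish letters such as ż, ó and ł untouched.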
