RE: solr 3.6.1 Indexing and utf8 issue

2013-01-22 Thread Markus Jelsma
-2013 19:25
> To: solr-user@lucene.apache.org
> Subject: Re: solr 3.6.1 Indexing and utf8 issue
>
> Thanks for the pointer, but given the same index code, why does this not
> work in Solr 3.6.1 but works fine in Solr 1.3?
>
> Any idea?
>
> Regards
> Sujatha

Re: solr 3.6.1 Indexing and utf8 issue

2013-01-22 Thread Sujatha Arun
Thanks for the pointer, but given the same index code, why does this not work in Solr 3.6.1 but works fine in Solr 1.3?

Any idea?

Regards
Sujatha

On Tue, Jan 22, 2013 at 9:33 PM, Markus Jelsma wrote:
> Hi,
>
> You've likely got some non-character code points in your data and they
> need to be st

RE: solr 3.6.1 Indexing and utf8 issue

2013-01-22 Thread Markus Jelsma
Hi,

You've likely got some non-character code points in your data, and they need to be stripped:
http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:Noncharacter_Code_Point=True:]

See the patch for NUTCH-1016 for an example of how to strip them. It's easily ported to other languages.
https:/
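A minimal Java sketch of the stripping step, assuming the field values are plain Java strings on the client before being sent to Solr. It follows the Unicode definition of noncharacters (U+FDD0..U+FDEF, plus the last two code points of every plane, U+xxFFFE and U+xxFFFF). It is not the NUTCH-1016 patch, and the class and method names (`NoncharacterStripper`, `isNoncharacter`, `stripNoncharacters`) are hypothetical.

```java
// Hypothetical helper: removes Unicode noncharacters from text before indexing.
public final class NoncharacterStripper {

    /** Returns true if cp is one of the 66 Unicode noncharacter code points. */
    static boolean isNoncharacter(int cp) {
        // U+FDD0..U+FDEF: a contiguous block of 32 noncharacters
        if (cp >= 0xFDD0 && cp <= 0xFDEF) {
            return true;
        }
        // U+xxFFFE and U+xxFFFF in each of the 17 planes (34 code points)
        return (cp & 0xFFFE) == 0xFFFE;
    }

    /** Returns a copy of s with every noncharacter code point removed. */
    static String stripNoncharacters(String s) {
        StringBuilder out = new StringBuilder(s.length());
        // Iterate by code point so surrogate pairs (e.g. U+1FFFF) are tested as one unit
        s.codePoints()
         .filter(cp -> !isNoncharacter(cp))
         .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // Contains U+FFFE, U+FDD0 and the supplementary noncharacter U+1FFFF
        String dirty = "ab\uFFFEcd\uFDD0ef" + new String(Character.toChars(0x1FFFF)) + "gh";
        System.out.println(stripNoncharacters(dirty)); // prints "abcdefgh"
    }
}
```

This only handles noncharacters. If updates are posted as XML, other code points that XML 1.0 forbids (for example most C0 control characters) would need a separate filter.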