And to further clarify, the issue isn't in solr-ruby; it's in REXML (a lame Ruby XML library). Both rsolr and solr-ruby will use libxml instead of REXML if libxml is present.
	Erik

On Sep 20, 2011, at 03:46 , Pranav Prakash wrote:

> I managed to resolve this issue. Turns out that the issue was because of a
> faulty XML file being generated by ruby-solr gem. I had to install
> libxml-ruby, rsolr and I used rsolr gem instead of ruby-solr.
>
> Also, if you face this kind of issue, the test-utf8.sh file included in
> exampledocs is a good file to test Solr's behavior towards UTF-8 chars.
>
> Great work Solr team, and special thanks to Erik Hatcher.
>
> *Pranav Prakash*
>
> "temet nosce"
>
> Twitter <http://twitter.com/pranavprakash> | Blog <http://blog.myblive.com> |
> Google <http://www.google.com/profiles/pranny>
>
>
> On Mon, Sep 19, 2011 at 15:54, Pranav Prakash <pra...@gmail.com> wrote:
>
>> Just in case someone is interested, here is the log
>>
>> SEVERE: java.lang.RuntimeException: [was class java.io.CharConversionException] Invalid UTF-8 middle byte 0x73 (at char #66641, byte #65289)
>>     at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
>>     at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
>>     at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
>>     at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
>>     at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:287)
>>     at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:146)
>>     at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
>>     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:67)
>>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
>>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
>>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
>>     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>>     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>>     at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>>     at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>>     at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>>     at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>>     at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>>     at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>>     at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>>     at org.mortbay.jetty.Server.handle(Server.java:326)
>>     at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>>     at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
>>     at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
>>     at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
>>     at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>>     at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
>>     at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
>> Caused by: java.io.CharConversionException: Invalid UTF-8 middle byte 0x73 (at char #66641, byte #65289)
>>     at com.ctc.wstx.io.UTF8Reader.reportInvalidOther(UTF8Reader.java:313)
>>     at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:204)
>>     at com.ctc.wstx.io.MergedReader.read(MergedReader.java:101)
>>     at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
>>     at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)
>>     at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:992)
>>     at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4628)
>>     at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
>>     at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
>>     at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
>>     ... 26 more
>>
>>
>> Also, is there a setting so I can change the level of backtrace? This would
>> be helpful in showing the complete stack instead of 26 more ...
>>
>> *Pranav Prakash*
>>
>> "temet nosce"
>>
>> Twitter <http://twitter.com/pranavprakash> | Blog <http://blog.myblive.com> |
>> Google <http://www.google.com/profiles/pranny>
>>
>>
>> On Mon, Sep 19, 2011 at 14:16, Pranav Prakash <pra...@gmail.com> wrote:
>>
>>> Hi List,
>>>
>>> I tried Solr 3.4.0 today and while indexing I got the error
>>> java.lang.RuntimeException: [was class java.io.CharConversionException]
>>> Invalid UTF-8 middle byte 0x73 (at char #66611, byte #65289)
>>>
>>> My earlier version was Solr 1.4 and this same document went into the index
>>> successfully. Looking around, I see issue
>>> https://issues.apache.org/jira/browse/SOLR-2381 which seems to fix the
>>> issue. I thought this patch was already applied to Solr 3.4.0. Is there
>>> something I am missing?
>>>
>>> Is there anything else I need to mention? Logs, my document details, etc.?
>>>
>>> *Pranav Prakash*
>>>
>>> "temet nosce"
>>>
>>> Twitter <http://twitter.com/pranavprakash> | Blog <http://blog.myblive.com> |
>>> Google <http://www.google.com/profiles/pranny>
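
For anyone hitting the same "Invalid UTF-8 middle byte" error, it can help to check field values on the Ruby side before they are serialized and posted to Solr. The following is only a minimal sketch using core Ruby string methods (`String#scrub` needs Ruby 2.1 or later); the helper names `first_invalid_utf8_offset` and `scrub_utf8` are hypothetical and are not part of rsolr or solr-ruby. The sample input, a 0xE9 byte followed by "s" (0x73), is an assumed illustration of the kind of byte sequence that could plausibly trigger such an error, not the actual document from this thread.

```ruby
# Hypothetical helper (not part of rsolr or solr-ruby): return the byte offset
# of the first invalid UTF-8 sequence in a string, or nil if the string is clean.
def first_invalid_utf8_offset(text)
  utf8 = text.dup.force_encoding(Encoding::UTF_8)
  return nil if utf8.valid_encoding?

  offset = 0
  utf8.each_char do |char|
    # Invalid bytes are yielded as characters whose encoding is not valid.
    return offset unless char.valid_encoding?
    offset += char.bytesize
  end
  nil
end

# Hypothetical helper: replace invalid sequences with U+FFFD so the value can
# be embedded in an XML update message without breaking the UTF-8 decoder.
def scrub_utf8(text)
  text.dup.force_encoding(Encoding::UTF_8).scrub("\uFFFD")
end

# Assumed sample: a Latin-1 "é" (0xE9) followed by "s" (0x73). In UTF-8, 0xE9
# starts a three-byte sequence, so 0x73 is not a valid continuation byte.
bad = "caf\xE9s and more".b

offset = first_invalid_utf8_offset(bad)
puts "first invalid byte at offset #{offset.inspect}"
puts scrub_utf8(bad)
```

Scrubbing loses the original character, so if the source data is really in another encoding (Latin-1, for example), transcoding it with `String#encode` from that encoding is likely the better fix; the offset check is mainly useful for finding which document and field carry the bad bytes.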