And to further clarify, the issue isn't in solr-ruby, it's in REXML (a lame 
Ruby XML library).  Both rsolr and solr-ruby will use libxml instead of REXML 
if it is present.

        Erik
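
For anyone debugging a similar "Invalid UTF-8 middle byte" error, here is a
minimal Ruby sketch for locating the first invalid UTF-8 byte in a payload
before it is posted to Solr. The method name first_invalid_utf8_offset and the
sample payload are hypothetical and are not part of rsolr or solr-ruby. The
sample assumes one plausible cause, a Latin-1 byte (0xE9) followed by "s"
(0x73), which may not be what happened in this thread.

```ruby
# Hypothetical helper: returns the byte offset of the first invalid UTF-8
# sequence in +bytes+, or nil when the whole payload is valid UTF-8.
def first_invalid_utf8_offset(bytes)
  # Work on a copy so the caller's string keeps its original encoding.
  str = bytes.dup.force_encoding(Encoding::UTF_8)
  return nil if str.valid_encoding?

  offset = 0
  str.each_char do |ch|
    # Invalid bytes are yielded as chunks that fail valid_encoding?.
    return offset unless ch.valid_encoding?
    offset += ch.bytesize
  end
  nil
end

# Assumed sample: 0xE9 is a UTF-8 lead byte that must be followed by
# continuation bytes, but here it is followed by "s" (0x73).
payload = "caf\xE9s".b
offset = first_invalid_utf8_offset(payload)
puts "first invalid byte at offset #{offset.inspect}"
```

Running a check like this on the generated XML before sending it can help
compare against the byte offset reported in the Solr log.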

On Sep 20, 2011, at 03:46 , Pranav Prakash wrote:

> I managed to resolve this issue. It turns out that the solr-ruby gem was
> generating a faulty XML file. I installed libxml-ruby and rsolr, and used
> the rsolr gem instead of solr-ruby.
> 
> Also, if you face this kind of issue, the test-utf8.sh file included in
> exampledocs is a good way to test how Solr handles UTF-8 characters.
> 
> Great work, Solr team, and special thanks to Erik Hatcher.
> 
> *Pranav Prakash*
> 
> "temet nosce"
> 
> Twitter <http://twitter.com/pranavprakash> | Blog <http://blog.myblive.com> |
> Google <http://www.google.com/profiles/pranny>
> 
> 
> On Mon, Sep 19, 2011 at 15:54, Pranav Prakash <pra...@gmail.com> wrote:
> 
>> 
>> Just in case someone is interested, here is the log:
>> 
>> SEVERE: java.lang.RuntimeException: [was class java.io.CharConversionException] Invalid UTF-8 middle byte 0x73 (at char #66641, byte #65289)
>> at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
>> at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
>> at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
>> at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
>> at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:287)
>> at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:146)
>> at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
>> at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:67)
>> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
>> at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
>> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
>> at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>> at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>> at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>> at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>> at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>> at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>> at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>> at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>> at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>> at org.mortbay.jetty.Server.handle(Server.java:326)
>> at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>> at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
>> at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
>> at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
>> at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>> at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
>> at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
>> Caused by: java.io.CharConversionException: Invalid UTF-8 middle byte 0x73 (at char #66641, byte #65289)
>> at com.ctc.wstx.io.UTF8Reader.reportInvalidOther(UTF8Reader.java:313)
>> at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:204)
>> at com.ctc.wstx.io.MergedReader.read(MergedReader.java:101)
>> at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
>> at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)
>> at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:992)
>> at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4628)
>> at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
>> at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
>> at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
>> ... 26 more
>> 
>> 
>> Also, is there a setting to change the depth of the backtrace? It would
>> be helpful to see the complete stack instead of "... 26 more".
>> 
>> *Pranav Prakash*
>> 
>> "temet nosce"
>> 
>> Twitter <http://twitter.com/pranavprakash> | Blog <http://blog.myblive.com> |
>> Google <http://www.google.com/profiles/pranny>
>> 
>> 
>> On Mon, Sep 19, 2011 at 14:16, Pranav Prakash <pra...@gmail.com> wrote:
>> 
>>> 
>>> Hi List,
>>> 
>>> I tried Solr 3.4.0 today, and while indexing I got the error
>>> java.lang.RuntimeException: [was class java.io.CharConversionException]
>>> Invalid UTF-8 middle byte 0x73 (at char #66611, byte #65289)
>>> 
>>> My earlier version was Solr 1.4, and this same document went into the
>>> index successfully. Looking around, I see issue
>>> https://issues.apache.org/jira/browse/SOLR-2381 which seems to fix the
>>> issue. I thought this patch was already applied to Solr 3.4.0. Is there
>>> something I am missing?
>>> 
>>> Is there anything else I need to provide? Logs, my document details, etc.?
>>> 
>>> *Pranav Prakash*
>>> 
>>> "temet nosce"
>>> 
>>> Twitter <http://twitter.com/pranavprakash> | Blog <http://blog.myblive.com> |
>>> Google <http://www.google.com/profiles/pranny>
>>> 
>> 
>> 
