Hi,

I had a problem with indexing documents some months ago as well. I found that the documents contained XML control characters, which Solr did not handle. Maybe that is the case for you as well.
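A minimal sketch of stripping such characters from a file's text before sending it to Solr, assuming Python is available for preprocessing. The helper name strip_invalid_xml_chars is hypothetical and not part of Solr or Tika, and the character ranges are the ones XML 1.0 allows. Whether this is related to the NullPointerException in the trace below is not established.

```python
import re

# Matches every character that XML 1.0 does NOT allow. The allowed set is
# tab, line feed, carriage return, U+0020-U+D7FF, U+E000-U+FFFD and
# U+10000-U+10FFFF.
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str, replacement: str = " ") -> str:
    """Replace each character that is not legal in XML 1.0 with `replacement`."""
    return _INVALID_XML_CHARS.sub(replacement, text)


if __name__ == "__main__":
    # NUL and vertical tab are control characters that XML 1.0 rejects.
    dirty = "XX\x00BLE\x0b"
    print(repr(strip_invalid_xml_chars(dirty)))
```

Replacing with a space instead of deleting keeps words apart that were separated only by a control character; pass replacement="" to delete instead.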
Regards,
Georg

On Sun, Mar 21, 2010 at 5:58 PM, Ross <tetr...@gmail.com> wrote:
> Hi all
>
> I'm trying to import some text files. I'm mostly following Avi
> Rappoport's tutorial. Some of my files cause Solr to crash while
> indexing. I've narrowed it down to a very simple example.
>
> I have a file named test.txt with one line. That line is the word
> XXBLE and nothing else
>
> This is the command I'm using.
>
> curl "http://localhost:8080/solr-example/update/extract?literal.id=1&commit=true" -F "myfi...@test.txt"
>
> The result is pasted below. Other files work just fine. The problem
> seems to be related to the letters B and E. If I change them to
> something else or make them lower case then it works. In my real
> files, the XX is something else but the result is the same. It's a
> common word in the files. I guess for this "quick and dirty" job I'm
> doing I could do a bulk replace in the files to make it lower case.
>
> Is there any workaround for this?
>
> Thanks
> Ross
>
> <html><head><title>Apache Tomcat/6.0.20 - Error report</title><style><!--H1 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;}
> H2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;}
> H3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;}
> BODY {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;}
> B {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;}
> P {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;}A {color : black;}A.name {color : black;}HR {color : #525D76;}--></style> </head><body><h1>HTTP Status 500 - org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.txtpar...@19ccba
>
> org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.txtpar...@19ccba
>     at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211)
>     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>     at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
>     at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
>     at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>     at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
>     at java.lang.Thread.run(Thread.java:636)
> Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.txtpar...@19ccba
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105)
>     at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:190)
>     ... 18 more
> Caused by: java.lang.NullPointerException
>     at java.io.Reader.<init>(Reader.java:78)
>     at java.io.BufferedReader.<init>(BufferedReader.java:93)
>     at java.io.BufferedReader.<init>(BufferedReader.java:108)
>     at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:59)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:119)
>     ... 20 more
> </h1><HR size="1" noshade="noshade"><p><b>type</b> Status report</p><p><b>message</b> <u>org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.txtpar...@19ccba
>
> org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.txtpar...@19ccba
>     at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211)
>     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>     at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
>     at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
>     at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>     at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
>     at java.lang.Thread.run(Thread.java:636)
> Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.txtpar...@19ccba
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105)
>     at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:190)
>     ... 18 more
> Caused by: java.lang.NullPointerException
>     at java.io.Reader.<init>(Reader.java:78)
>     at java.io.BufferedReader.<init>(BufferedReader.java:93)
>     at java.io.BufferedReader.<init>(BufferedReader.java:108)
>     at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:59)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:119)
>     ... 20 more
> </u></p><p><b>description</b> <u>The server encountered an internal error (org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.txtpar...@19ccba
>
> org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.txtpar...@19ccba
>     at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211)
>     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>     at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
>     at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
>     at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>     at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
>     at java.lang.Thread.run(Thread.java:636)
> Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.txtpar...@19ccba
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105)
>     at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:190)
>     ... 18 more
> Caused by: java.lang.NullPointerException
>     at java.io.Reader.<init>(Reader.java:78)
>     at java.io.BufferedReader.<init>(BufferedReader.java:93)
>     at java.io.BufferedReader.<init>(BufferedReader.java:108)
>     at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:59)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:119)
>     ... 20 more
> ) that prevented it from fulfilling this request.</u></p><HR size="1" noshade="noshade"><h3>Apache Tomcat/6.0.20</h3></body></html>