Hi, I am using Solr 1.4.
I have an issue with Solr indexing large PDF files (> 5MB but < 10MB). I have set the following in solrconfig.xml:

<requestParsers enableRemoteStreaming="false" multipartUploadLimitInKB="10480" />

The exception I get is:

SEVERE: org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.pdf.Pdfpar...@7308c5c
	at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211)
	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
	at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
	at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
	at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:361)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.pdf.pdfpar...@7308c5c
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:125)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105)
	at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:190)
	... 19 more
Caused by: java.io.IOException: expected true actual='tr' org.pdfbox.io.Pushbackinputstr...@7b0d5e0d
	at org.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:899)
	at org.pdfbox.pdfparser.BaseParser.parseCOSArray(BaseParser.java:680)
	at org.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:869)
	at org.pdfbox.pdfparser.BaseParser.parseCOSDictionaryValue(BaseParser.java:150)
	at org.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:206)
	at org.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:858)
	at org.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:448)

However, if I use the PDFParser directly, the same file parses fine. It looks as though the input stream is being closed prematurely, but that is proving difficult to track down. Any ideas?

Stuart.
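If the suspicion is that something closes the stream before PDFBox has finished reading it, one way to find the culprit is to wrap the stream handed to the parser in a class that remembers where close() was first called. The sketch below is a hypothetical debugging aid using only the JDK: `CloseTrackingInputStream` is not part of Solr, Tika or PDFBox, and where you install it (for example around the file stream in a standalone test that feeds the same PDF through Tika's AutoDetectParser) is an assumption left to you.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical debugging wrapper (not a Solr/Tika/PDFBox class).
 * Records the call stack of the first close() and makes any later
 * read fail with an IOException whose cause carries that stack.
 */
public class CloseTrackingInputStream extends FilterInputStream {
    private Exception closedAt; // null while the stream is still open

    public CloseTrackingInputStream(InputStream in) {
        super(in);
    }

    public boolean isClosed() {
        return closedAt != null;
    }

    public Exception closedAt() {
        return closedAt;
    }

    // Fail loudly instead of letting the parser see odd data after a close.
    private void ensureOpen() throws IOException {
        if (closedAt != null) {
            throw new IOException("read after close; the cause shows where close() was called", closedAt);
        }
    }

    @Override
    public int read() throws IOException {
        ensureOpen();
        return super.read();
    }

    // FilterInputStream.read(byte[]) delegates here, so both array reads are covered.
    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        ensureOpen();
        return super.read(b, off, len);
    }

    @Override
    public long skip(long n) throws IOException {
        ensureOpen();
        return super.skip(n);
    }

    @Override
    public int available() throws IOException {
        ensureOpen();
        return super.available();
    }

    @Override
    public void close() throws IOException {
        if (closedAt == null) {
            // Capture the current stack so the premature caller can be identified later.
            closedAt = new Exception("stream closed here");
        }
        super.close();
    }

    public static void main(String[] args) throws IOException {
        CloseTrackingInputStream s =
            new CloseTrackingInputStream(new ByteArrayInputStream("true".getBytes("US-ASCII")));
        System.out.println((char) s.read());
        s.close();
        try {
            s.read();
        } catch (IOException e) {
            System.out.println("caught: " + e.getMessage());
            e.getCause().printStackTrace(); // prints the stack of the close() call
        }
    }
}
```

If the parser then fails with the "read after close" IOException, the stack trace of its cause names whichever component closed the stream early; if it never appears, the truncated read ('tr' instead of 'true') is more likely coming from somewhere other than an early close.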