Dave,

You may want to break large docs into chunks, say by chapter or other logical segment.
This will help in:

- relevance ranking - the term frequency of large docs will cause uneven weighting unless the relevance calculation does log normalization
- finer granularity of retrieval - for example, a dictionary, a thesaurus, and an encyclopedia probably all have what you want, but how do you get at it quickly?
- post-processing - highlighting, for example, can be a performance killer, as the search/replace scans the entire large file for matching strings

(A rough sketch of one way to split and post per-chapter documents is appended below the quoted message.)

Jon

-----Original Message-----
From: David Thibault [mailto:[EMAIL PROTECTED]]
Sent: Thursday, February 21, 2008 7:58 PM
To: solr-user@lucene.apache.org
Subject: Re: Indexing very large files.

All,

A while back I was running into a Java heap out-of-memory error while indexing large files. I figured out that was my own fault, due to a misconfiguration of my Netbeans memory settings. However, now that is fixed, and I have stumbled upon a new error. When trying to upload files that include a Solr TextField value of 32 MB or more, I get the following error (uploading with SimplePostTool):

Solr returned an error: error reading input, returned 0
javax.xml.stream.XMLStreamException: error reading input, returned 0
        at com.bea.xml.stream.MXParser.fillBuf(MXParser.java:3709)
        at com.bea.xml.stream.MXParser.more(MXParser.java:3715)
        at com.bea.xml.stream.MXParser.nextImpl(MXParser.java:1936)
        at com.bea.xml.stream.MXParser.next(MXParser.java:1333)
        at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:318)
        at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:195)
        at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:117)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:902)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:280)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:237)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
        at java.lang.Thread.run(Thread.java:613)

I suspect there's a setting somewhere that I'm overlooking, but after peering through the solrconfig.xml and schema.xml files I am not seeing anything obvious (to me, anyway... =). The second line of the error shows it's crashing in MXParser.fillBuf, which implies that I'm overloading the parser's buffer (I assume because the string is too large).

Thanks in advance for any assistance,
Dave
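[Appended sketch of the chunking approach described above. This is a minimal illustration, not anything taken from this thread: it uses only the standard Java library, and the field names (id, title, text), the "Chapter " delimiter, and the output file naming are all assumptions that would need to match your own schema and documents.]

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

/**
 * Split one large plain-text document into per-chapter Solr add-documents,
 * each written as its own XML file so it can be posted individually.
 * Note: FileReader/FileWriter use the platform default encoding; a real
 * version should read and write an explicit charset such as UTF-8.
 */
public class ChunkAndWrite {

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ChunkAndWrite <input.txt> <outputDir>");
            System.exit(1);
        }
        File input = new File(args[0]);
        File outDir = new File(args[1]);
        outDir.mkdirs();

        BufferedReader in = new BufferedReader(new FileReader(input));
        StringBuilder chunk = new StringBuilder();
        int chapter = 0;
        String line;
        while ((line = in.readLine()) != null) {
            // Assumed delimiter: a line starting with "Chapter " begins a new chunk.
            if (line.startsWith("Chapter ") && chunk.length() > 0) {
                writeDoc(outDir, input.getName(), chapter++, chunk.toString());
                chunk.setLength(0);
            }
            chunk.append(line).append('\n');
        }
        if (chunk.length() > 0) {
            writeDoc(outDir, input.getName(), chapter, chunk.toString());
        }
        in.close();
    }

    // Write one <add><doc> file; each chunk becomes its own Solr document.
    private static void writeDoc(File dir, String name, int chapter, String text)
            throws IOException {
        File out = new File(dir, name + "." + chapter + ".xml");
        PrintWriter w = new PrintWriter(new FileWriter(out));
        w.println("<add>");
        w.println("  <doc>");
        w.println("    <field name=\"id\">" + escape(name + "-" + chapter) + "</field>");
        w.println("    <field name=\"title\">" + escape(name + " chapter " + chapter) + "</field>");
        w.println("    <field name=\"text\">" + escape(text) + "</field>");
        w.println("  </doc>");
        w.println("</add>");
        w.close();
    }

    // Escape the characters that would otherwise break the XML update message.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}

Each generated file can then be sent on its own, e.g. with SimplePostTool (java -jar post.jar chunks/*.xml), so every update message stays far smaller than the 32 MB field value that is tripping up the XML parser in the trace above.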