Hi guys,

I use ManifoldCF to crawl files on a Windows file server and index them
into Solr using the ExtractingRequestHandler.
Most of the documents are indexed successfully, but some fail with an
OutOfMemoryError in Solr, so I would appreciate some advice.

The files that fail are not especially large: one is a 240MB CSV file and
the other is a 170MB text file.

Here are the environment and machine specs:
Solr 3.6 (also Solr 4.0 Beta)
Tomcat 6.0
CentOS 5.6
java version 1.6.0_23
HDD 60GB
MEM 2GB
JVM Heap: -Xmx1024m -Xms1024m

I feel there is enough memory for Solr to extract and index the content of
these files.
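
As a rough check of that feeling, here is a small sketch that estimates how
much heap is needed if the whole extracted text is buffered in one
StringBuilder, which is what the trace below suggests (the error is thrown
in StringBuilder.append -> expandCapacity -> Arrays.copyOf). Everything in
it is an assumption on my side: one char per byte of input, 2 bytes per
char in memory, the Java 6 growth rule of (capacity + 1) * 2, and an
initial capacity of 16. The class and method names are only illustrative.

```java
// Back-of-the-envelope estimate, not a measurement.
final class HeapEstimate {
    static final long BYTES_PER_CHAR = 2; // Java chars are UTF-16 code units

    // Assumed Java 6 StringBuilder growth rule: (oldCapacity + 1) * 2.
    static long nextCapacity(long capacity) {
        return (capacity + 1) * 2;
    }

    // Largest number of bytes held at once by the old and the new char[]
    // while the buffer grows until it can hold totalChars characters.
    static long peakBytesWhileGrowing(long totalChars, long initialCapacity) {
        long capacity = initialCapacity;
        long peak = capacity * BYTES_PER_CHAR;
        while (capacity < totalChars) {
            long grown = nextCapacity(capacity);
            // During Arrays.copyOf both arrays are alive at the same time.
            peak = Math.max(peak, (capacity + grown) * BYTES_PER_CHAR);
            capacity = grown;
        }
        return peak;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] fileSizesMb = {170, 240}; // the two failing files
        for (long size : fileSizesMb) {
            long chars = size * mb; // assumes one char per input byte
            long peak = peakBytesWhileGrowing(chars, 16);
            System.out.println(size + "MB file: text alone ~"
                    + (chars * BYTES_PER_CHAR / mb) + "MB, peak while growing ~"
                    + (peak / mb) + "MB");
        }
    }
}
```

Under these assumptions the peak for both files comes out at roughly 860MB,
before counting anything else Solr and Tika keep on the heap, so the margin
with -Xmx1024m may be thinner than I expected.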

Here is the relevant Solr log:
------
[solr.servlet.SolrDispatchFilter]-[http-8080-8]-:java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2882)
        at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:515)
        at java.lang.StringBuilder.append(StringBuilder.java:189)
        at org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:293)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:268)
        at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:134)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
        at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:227)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:244)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at filters.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:122)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)

-----

Does anyone have any ideas?

Regards,

Shigeki
