Hi,

If you like, you can open a JIRA issue on this and provide as much info as possible. Someone can then look into a potential memory optimization of this part of the code.
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 28 Sep 2012, at 03:42, Shigeki Kobayashi <shigeki.kobayas...@g.softbank.co.jp> wrote:

> Hi Jan.
>
> Thank you very much for your advice.
>
> So I understand Solr needs more memory to parse the files.
> To parse a file of size x, it needs double the memory (2x). How much
> memory should then be allocated to the heap? 8x? 16x?
>
> Regards,
>
> Shigeki
>
> 2012/9/28 Jan Høydahl <jan....@cominvent.com>
>
>> Please try to increase -Xmx and see how much RAM you need for it to
>> succeed.
>>
>> I believe it is simply a case where this particular file needs double
>> memory (480MB) to parse and you have only allocated 1GB (which is not
>> particularly much). Perhaps the code could be optimized to avoid the
>> Arrays.copyOf() call.
>>
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> Solr Training - www.solrtraining.com
>>
>> On 27 Sep 2012, at 11:22, Shigeki Kobayashi
>> <shigeki.kobayas...@g.softbank.co.jp> wrote:
>>
>>> Hi guys,
>>>
>>> I use Manifold CF to crawl files on a Windows file server and index them
>>> in Solr using the Extracting Request Handler.
>>> Most of the documents are successfully indexed, but some fail and an
>>> OutOfMemoryError occurs in Solr, so I need some advice.
>>>
>>> The failed files are not so big: a csv file of 240MB and a
>>> text file of 170MB.
>>>
>>> Here is the environment and machine spec:
>>> Solr 3.6 (also Solr4.0Beta)
>>> Tomcat 6.0
>>> CentOS 5.6
>>> java version 1.6.0_23
>>> HDD 60GB
>>> MEM 2GB
>>> JVM Heap: -Xmx1024m -Xms1024m
>>>
>>> I feel there is enough memory that Solr should be able to extract and
>>> index the file content.
>>>
>>> Here is the Solr log:
>>> ------
>>> [solr.servlet.SolrDispatchFilter]-[http-8080-8]-:java.lang.OutOfMemoryError: Java heap space
>>>     at java.util.Arrays.copyOf(Arrays.java:2882)
>>>     at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
>>>     at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:515)
>>>     at java.lang.StringBuilder.append(StringBuilder.java:189)
>>>     at org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:293)
>>>     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>>     at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>>>     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>>     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>>     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>>     at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>>>     at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>>>     at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>>>     at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>>>     at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:268)
>>>     at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:134)
>>>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>>>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>>>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>>>     at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:227)
>>>     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
>>>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>>>     at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:244)
>>>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
>>>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
>>>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
>>>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>>     at filters.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:122)
>>>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>>     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>> -----
>>>
>>> Does anyone have any ideas?
>>>
>>> Regards,
>>>
>>> Shigeki
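
To make the heap question in this thread a bit more concrete, here is a rough Java sketch of the growth pattern at the top of the stack trace (StringBuilder.append -> expandCapacity -> Arrays.copyOf). It is not Solr or Tika code. It assumes the Java 6 growth rule newCapacity = (oldCapacity + 1) * 2, two bytes per char, one char per input byte, and that the old and the new array are both live while the copy runs. The class name, method names, and the 4096-char chunk size are made up for illustration.

```java
public class BuilderGrowthEstimate {

    // Assumed growth rule: roughly double the capacity,
    // or jump straight to what is needed if doubling is not enough.
    static long nextCapacity(long capacity, long needed) {
        long grown = (capacity + 1) * 2;
        return Math.max(grown, needed);
    }

    // Simulates appending totalChars in chunks of chunkChars and returns the
    // largest number of bytes held by the old and new char arrays together.
    static long peakCharArrayBytes(long totalChars, long chunkChars, long initialCapacity) {
        long capacity = initialCapacity;
        long length = 0;
        long peak = capacity * 2; // 2 bytes per char
        while (length < totalChars) {
            long step = Math.min(chunkChars, totalChars - length);
            long needed = length + step;
            if (needed > capacity) {
                long grown = nextCapacity(capacity, needed);
                // During the copy both the old and the new array are reachable.
                peak = Math.max(peak, (capacity + grown) * 2);
                capacity = grown;
            }
            length = needed;
        }
        return peak;
    }

    public static void main(String[] args) {
        // Hypothetical input: a 240MB file, assuming one char per byte.
        long chars = 240L * 1024 * 1024;
        long peak = peakCharArrayBytes(chars, 4096, 16);
        System.out.println("Estimated transient peak (MB): " + peak / (1024 * 1024));
    }
}
```

This only counts the builder's own arrays. Anything else the extraction chain or Solr holds at the same time comes on top, so the practical approach is still to raise -Xmx and measure, as suggested above.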