Hi,

If you like, you can open a JIRA issue on this and provide as much info as 
possible. Someone can then look into (potential) memory optimization of this 
part of the code.
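
As a rough illustration of why the quoted stack trace ends in
Arrays.copyOf(): while a StringBuilder grows, the old and the new backing
array both have to be alive during the copy. The sketch below only simulates
the sizes and allocates nothing. The 2 bytes per char and the
(capacity + 1) * 2 growth rule are assumptions about the Java 6 runtime, and
the class and method names are made up for illustration.

```java
// Hypothetical helper, not part of Solr or Tika.
final class BuilderGrowthEstimate {
    // Assumption: the backing store is a char[] with 2 bytes per char.
    static final long BYTES_PER_CHAR = 2;

    /**
     * Simulates capacity doubling until totalChars fit and returns the
     * largest combined size (old array + new array) seen, in bytes.
     */
    static long peakTransientBytes(long totalChars, long initialCapacity) {
        long capacity = initialCapacity;
        long peak = capacity * BYTES_PER_CHAR;
        while (capacity < totalChars) {
            // Assumed growth policy: (old capacity + 1) * 2.
            long newCapacity = (capacity + 1) * 2;
            // During the copy both arrays are reachable at the same time.
            peak = Math.max(peak, (capacity + newCapacity) * BYTES_PER_CHAR);
            capacity = newCapacity;
        }
        return peak;
    }

    public static void main(String[] args) {
        // Hypothetical input: a 240 MB file decoded to one char per byte.
        long chars = 240L * 1024 * 1024;
        System.out.println("final text bytes:     " + chars * BYTES_PER_CHAR);
        System.out.println("peak transient bytes: " + peakTransientBytes(chars, 16));
    }
}
```

The point is only that the transient peak can be well above the size of the
extracted text itself, so the heap has to cover more than the final text.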

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 28 Sep 2012, at 03:42, Shigeki Kobayashi
<shigeki.kobayas...@g.softbank.co.jp> wrote:

> Hi Jan.
> 
> Thank you very much for your advice.
> 
> So I understand that Solr needs more memory to parse the files.
> If parsing a file of size x needs double the memory (2x), how much
> heap should I allocate? 8x? 16x?
> 
> Regards,
> 
> 
> Shigeki
> 
> 2012/9/28 Jan Høydahl <jan....@cominvent.com>
> 
>> Please try to increase -Xmx and see how much RAM you need for it to
>> succeed.
>> 
>> I believe it is simply a case where this particular file needs double
>> memory (480 MB) to parse and you have only allocated 1 GB (which is not
>> particularly much). Perhaps the code could be optimized to avoid the
>> Arrays.copyOf() call.
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> Solr Training - www.solrtraining.com
>> 
>> On 27 Sep 2012, at 11:22, Shigeki Kobayashi <
>> shigeki.kobayas...@g.softbank.co.jp> wrote:
>> 
>>> Hi guys,
>>> 
>>> 
>>> I use Manifold CF to crawl files on a Windows file server and index them
>>> in Solr using the Extracting Request Handler.
>>> Most of the documents are successfully indexed, but some fail with an
>>> Out of Memory Error in Solr, so I need some advice.
>>> 
>>> The failed files are not very big: a CSV file of 240 MB and a
>>> text file of 170 MB.
>>> 
>>> Here is environment and machine spec:
>>> Solr 3.6 (also Solr4.0Beta)
>>> Tomcat 6.0
>>> CentOS 5.6
>>> java version 1.6.0_23
>>> HDD 60GB
>>> MEM 2GB
>>> JVM Heap: -Xmx1024m -Xms1024m
>>> 
>>> I feel there is enough memory that Solr should be able to extract and
>>> index file content.
>>> 
>>> Here is a Solr log below:
>>> ------
>>> 
>>> [solr.servlet.SolrDispatchFilter]-[http-8080-8]-:java.lang.OutOfMemoryError: Java heap space
>>>       at java.util.Arrays.copyOf(Arrays.java:2882)
>>>       at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
>>>       at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:515)
>>>       at java.lang.StringBuilder.append(StringBuilder.java:189)
>>>       at org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:293)
>>>       at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>>       at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>>>       at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>>       at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>>       at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>>       at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>>>       at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>>>       at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>>>       at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>>>       at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:268)
>>>       at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:134)
>>>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>>>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>>>       at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>>>       at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:227)
>>>       at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
>>>       at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>>>       at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:244)
>>>       at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
>>>       at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
>>>       at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
>>>       at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>>       at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>>       at filters.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:122)
>>>       at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>>       at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>>       at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>> 
>>> -----
>>> 
>>> Does anyone have any ideas?
>>> 
>>> Regards,
>>> 
>>> Shigeki
>> 
>> 
