Hi,


I am getting an OutOfMemoryError after posting a 100 MB document to Solr, with this trace:

Exception in thread "main" org.apache.solr.common.SolrException: Java heap space
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Unknown Source)
    at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)
    at java.lang.AbstractStringBuilder.append(Unknown Source)
    at java.lang.StringBuilder.append(Unknown Source)
    at org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:257)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
    at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:153)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
    at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:39)
    at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:61)
    at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:113)
    at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:151)
    at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:175)
    at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:144)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:142)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:112)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:193)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:237)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)
    at org.apache.solr.se

I have given the JVM 1024 MB of heap.

But it still fails, so can somebody tell me the minimum heap size required, relative to the file size, so that a document gets indexed successfully?
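My own back-of-envelope estimate (just my assumption from reading the trace, not anything from the Solr or Tika docs) suggests the extraction buffer alone needs several times the file size:

    100 MB of text  ->  ~100M chars  ->  ~200 MB in a char[]
    StringBuilder roughly doubles its capacity on each expandCapacity,
    so starting from 4096 the final capacity lands near 134M chars
    (~268 MB); during the last copy the old (~67M chars, ~134 MB) and
    new (~134M chars, ~268 MB) arrays are both live
    ->  ~400 MB peak for this one buffer, before Solr, Tika, and the
    servlet container take their share.

If that estimate is anywhere near right, 1024 MB being too small would not surprise me, but I would like confirmation.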



Also, just a weird question:

In Tika's code there is a place where a char[] is initialized to 4096. When this is used with a StringWriter and the array fills up, an expandCapacity is triggered (as highlighted in the trace above), which performs an array copy. So with just a 4 KB buffer, processing a 100 MB document generates a lot of intermediate char arrays, and we have to depend on the GC to clean them up.



If I change the Tika code to initialize the char array to something larger than ~4 KB, is there likely to be any performance improvement?
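To check this myself, I sketched a small stand-alone test (my own code, not taken from Tika; the class name is mine and the 4096 chunk size just mimics the buffer mentioned above). It appends ~100M chars into a default-sized StringBuilder versus one pre-sized to the final length, so the only difference between the two runs is the expandCapacity copies:

import java.util.Arrays;

public class BufferGrowthTest {
    public static void main(String[] args) {
        // Needs a large heap to run, e.g. -Xmx1g, which itself illustrates the problem.
        final int total = 100 * 1024 * 1024;  // ~100M chars, like the failing document
        final char[] chunk = new char[4096];  // mimics Tika's 4 KB buffer
        Arrays.fill(chunk, 'x');

        // Default capacity (16): every overflow triggers expandCapacity + array copy.
        long t0 = System.nanoTime();
        StringBuilder grown = new StringBuilder();
        for (int n = 0; n < total; n += chunk.length) {
            grown.append(chunk, 0, chunk.length);
        }
        long t1 = System.nanoTime();

        // Pre-sized to the final length: no expandCapacity copies at all.
        StringBuilder sized = new StringBuilder(total);
        for (int n = 0; n < total; n += chunk.length) {
            sized.append(chunk, 0, chunk.length);
        }
        long t2 = System.nanoTime();

        System.out.printf("default: %d ms, pre-sized: %d ms%n",
                (t1 - t0) / 1000000, (t2 - t1) / 1000000);
    }
}

If the pre-sized version turns out measurably faster, that would suggest initializing the buffer larger does help; I would be interested to hear whether others see the same.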



Thanks for your time,

Regards,

Geeta
