memory not getting released in tomcat after pushing large documents
Hi,

I am very new to Solr and am running into problems when pushing large documents to it. I have Solr running in Tomcat with about 4 GB of heap allocated (-Xmx). When I push about twenty-five 100 MB documents, indexing fails with a Java heap space error. I also tried pushing just one document. It went through successfully, but Tomcat's memory usage did not come down afterwards: it consumes about 1 GB for a single 100 MB document and does not release it.

Please let me know if I am making a mistake in the configuration or setup. Here is the stack trace:

SEVERE: java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2882)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:515)
    at java.lang.StringBuffer.append(StringBuffer.java:306)
    at java.io.StringWriter.write(StringWriter.java:77)
    at com.sun.org.apache.xml.internal.serializer.ToStream.processDirty(ToStream.java:1570)
    at com.sun.org.apache.xml.internal.serializer.ToStream.characters(ToStream.java:1488)
    at com.sun.org.apache.xml.internal.serializer.ToHTMLStream.characters(ToHTMLStream.java:1529)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerHandlerImpl.characters(TransformerHandlerImpl.java:168)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
    at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:153)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
    at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:39)
    at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:61)
    at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:113)
    at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:151)
    at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:175)
    at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:144)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:142)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99)
    at com.commvault.solr.handler.extraction.CVExtractingDocumentLoader.load(CVExtractingDocumentLoader.java:349)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:237)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at filters.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:122)

Thanks for the help,
Geeta

**Legal Disclaimer***
"This communication may contain confidential and privileged material for the sole use of the intended recipient. Any unauthorized review, use or distribution by others is strictly prohibited. If you have received the message in error, please advise the sender by reply email and delete the message. Thank you."
Info about Debugging SOLR in Eclipse
Hi,

Can someone please let me know the steps to debug the Solr code in Eclipse?

I compiled the source, placed the resulting jars in the Tomcat instance where I am running Solr, and tried remote debugging, but execution did not stop at any breakpoint. I also tried writing a standalone Java class to push a document, but the debugger only stopped in the SolrJ classes and not in the Solr server classes.

Please let me know if I am making a mistake.

Regards,
Geeta
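One thing that may be worth ruling out when remote breakpoints are never hit is whether the Tomcat JVM was actually started with a JDWP debug agent. The sketch below is only an illustration under that assumption: the class name DebugAgentCheck is hypothetical and not part of Solr or Tomcat, and it inspects the arguments of whichever JVM it runs in. It would not explain every case; for example, a standalone SolrJ client that talks to Solr over HTTP runs in a different JVM from the server, so a debugger attached to the client would only ever stop in SolrJ classes.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Reports whether the current JVM was started with a JDWP debug agent. */
final class DebugAgentCheck {

    /** Returns the first JVM argument that enables JDWP, or null if none does. */
    static String findJdwpArgument(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            // -agentlib:jdwp=... is the current form; -Xrunjdwp:... is the legacy form.
            if (arg.startsWith("-agentlib:jdwp") || arg.startsWith("-Xrunjdwp")) {
                return arg;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself (not to the main method).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        String jdwp = findJdwpArgument(jvmArgs);
        if (jdwp == null) {
            System.out.println("No JDWP agent found: a remote debugger cannot attach to this JVM.");
        } else {
            System.out.println("JDWP agent enabled: " + jdwp);
        }
    }
}
```

To be meaningful for this thread, the check would have to run inside the Tomcat JVM that hosts Solr (for instance from a small test servlet), not from a separate command-line process.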
RE: memory not getting released in tomcat after pushing large documents
Hi,

Thanks for the reply. I am sorry, the logs I posted came from a setup that does have a custom update handler. But I also have a local setup without a custom update handler, exactly as downloaded from the Solr site, and even that gives me a heap space error:

    at java.util.Arrays.copyOf(Unknown Source)
    at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)
    at java.lang.AbstractStringBuilder.append(Unknown Source)
    at java.lang.StringBuilder.append(Unknown Source)
    at org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:257)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
    at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:153)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
    at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:39)
    at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:61)
    at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:113)
    at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:151)
    at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:175)
    at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:144)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:142)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:112)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:193)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:237)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)

Also, in general, if I post 25 documents of 100 MB each to Solr, what heap size should I set?

When I push a single 100 MB document, Task Manager shows about 900 MB of memory in use, and subsequent pushes keep the memory at about 900 MB. At what point can an OOM crash occur?

When I ran the YourKit profiler, I saw that around 1 GB of memory was consumed by char[] and String[] alone. How can I find out what is creating these objects (Solr or Tika) and free them?

Thank you so much for your time and help,

Regards,
Geeta

-----Original Message-----
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: 17 March, 2011 12:21 PM
To: solr-user@lucene.apache.org
Cc: Geeta Subramanian
Subject: Re: memory not getting released in tomcat after pushing large documents

On Thu, Mar 17, 2011 at 12:12 PM, Geeta Subramanian wrote:
> at com.commvault.solr.handler.extraction.CVExtractingDocumentLoader.load(CVExtractingDocumentLoader.java:349)

Looks like you're using a custom update handler. Perhaps that's accidentally hanging onto memory?

-Yonik
http://lucidimagination.com
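On the Task Manager reading: the figure shown there is process memory, which reflects the heap the JVM has committed rather than the objects that are still live. A JVM commonly keeps committed heap after a collection instead of returning it to the operating system, so a steady 900 MB does not by itself show that objects are being retained. The following is a minimal sketch for comparing used and committed heap from inside a JVM; the class name HeapSnapshot and the 32 MB test allocation are illustrative assumptions, not anything taken from Solr or Tika.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Prints used vs. committed heap before and after a large allocation. */
final class HeapSnapshot {

    /** Current heap usage as reported by the JVM's memory MXBean. */
    static MemoryUsage heap() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    /** Formats a MemoryUsage in megabytes; max is -1 when the JVM leaves it undefined. */
    static String describe(MemoryUsage u) {
        long mb = 1024L * 1024L;
        String max = u.getMax() < 0 ? "undefined" : (u.getMax() / mb) + " MB";
        return "used=" + (u.getUsed() / mb) + " MB, committed="
                + (u.getCommitted() / mb) + " MB, max=" + max;
    }

    public static void main(String[] args) {
        System.out.println("before:        " + describe(heap()));

        // 16M chars; at 2 bytes per char this is roughly 32 MB of heap.
        char[] big = new char[16 * 1024 * 1024];
        big[0] = 'x';
        System.out.println("allocated " + big.length + " chars: " + describe(heap()));

        big = null;   // drop the only reference
        System.gc();  // only a hint; the JVM may ignore it
        System.out.println("after gc hint: " + describe(heap()));
    }
}
```

If "used" drops after the collection while "committed" stays high, the memory has been released inside the JVM even though Task Manager still shows it. A heap that keeps growing in "used" across many pushes would be the more worrying sign.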
RE: Info about Debugging SOLR in Eclipse
Hi Markus,

Thanks, I had already followed the steps from this site. I am able to run Solr, but I am not able to debug the Solr classes. I want to see the code flow on the server side, especially the point where Solr calls Tika and gets the content back from it.

Thanks for your time and help,

Regards,
Geeta

-----Original Message-----
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: 17 March, 2011 12:22 PM
To: solr-user@lucene.apache.org
Cc: Geeta Subramanian
Subject: Re: Info about Debugging SOLR in Eclipse

http://www.lucidimagination.com/developers/articles/setting-up-apache-solr-in-eclipse

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
OOM for large files
Hi,

I am getting an OOM after posting a 100 MB document to Solr, with this trace:

Exception in thread "main" org.apache.solr.common.SolrException: Java heap space
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Unknown Source)
    at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)
    at java.lang.AbstractStringBuilder.append(Unknown Source)
    at java.lang.StringBuilder.append(Unknown Source)
    at org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:257)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
    at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:153)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
    at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:39)
    at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:61)
    at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:113)
    at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:151)
    at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:175)
    at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:144)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:142)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:112)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:193)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:237)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)
    at org.apache.solr.se

I have allocated 1024M of memory, but this still fails. Can somebody tell me the minimum heap size required relative to the file size so that a document gets indexed successfully?

Also, a slightly odd question: in Tika's code there is a place where a char[] is initialized to 4096. When this is then used in a StringWriter and the array is full, it calls expandCapacity (as highlighted in the logs), which performs an array copy. So with just 4 KB, processing a 100 MB document generates a lot of char arrays, and we depend on GC to clean them up. If I change the Tika code to initialize the char array with more than ~4 KB, will there be any performance improvement?

Thanks for your time,

Regards,
Geeta
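On the expandCapacity question, one way to get a feel for the cost is to count how often a StringBuilder actually reallocates while text is appended in fixed-size chunks. Its capacity grows geometrically (roughly doubling each time), so the number of array copies depends on the final size far more than on the chunk size. The sketch below is an illustration only: the class name BuilderGrowth and the 8M-character test size are assumptions, and it exercises a plain StringBuilder rather than the Tika or Solr classes in the trace.

```java
import java.util.Arrays;

/** Counts how many times a StringBuilder reallocates while text is appended in chunks. */
final class BuilderGrowth {

    /** Appends totalChars characters in pieces of chunkSize; returns the number of capacity changes. */
    static int countExpansions(int totalChars, int chunkSize) {
        char[] chunk = new char[chunkSize];
        Arrays.fill(chunk, 'a');

        StringBuilder sb = new StringBuilder();
        int expansions = 0;
        int lastCapacity = sb.capacity();

        for (int written = 0; written < totalChars; written += chunkSize) {
            int len = Math.min(chunkSize, totalChars - written);
            sb.append(chunk, 0, len);
            if (sb.capacity() != lastCapacity) {
                // Each capacity change means a new backing array and a full copy of the old one.
                expansions++;
                lastCapacity = sb.capacity();
            }
        }
        return expansions;
    }

    public static void main(String[] args) {
        int total = 8 * 1024 * 1024; // 8M characters
        System.out.println("4 KB chunks:  " + (total / 4096) + " appends, "
                + countExpansions(total, 4096) + " expansions");
        System.out.println("64 KB chunks: " + (total / 65536) + " appends, "
                + countExpansions(total, 65536) + " expansions");
    }
}
```

If the expansion counts for the two chunk sizes come out close, that would suggest a larger read buffer changes little. The heavier cost is more likely that the whole extracted text is accumulated in one buffer: JVMs of that era store each char in 2 bytes, so 100 MB of single-byte text needs about 200 MB as characters, and during an expansion the old and new arrays exist at the same time.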
RE: memory not getting released in tomcat after pushing large documents
Hi Yonik,

I am not setting the ramBufferSizeMB or maxBufferedDocs parameters. Do I need to set them for indexing?

Regards,
Geeta

-----Original Message-----
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: 17 March, 2011 3:45 PM
To: Geeta Subramanian
Cc: solr-user@lucene.apache.org
Subject: Re: memory not getting released in tomcat after pushing large documents

In your solrconfig.xml, are you specifying ramBufferSizeMB or maxBufferedDocs?

-Yonik
http://lucidimagination.com
RE: Info about Debugging SOLR in Eclipse
Hi All,

Thanks for the help. I am now able to debug my Solr. :-)

-----Original Message-----
From: pkeegan01...@gmail.com [mailto:pkeegan01...@gmail.com] On Behalf Of Peter Keegan
Sent: 17 March, 2011 3:33 PM
To: solr-user@lucene.apache.org
Subject: Re: Info about Debugging SOLR in Eclipse

The instructions refer to the 'Run configuration' menu. Did you try 'Debug configurations'?

On Thu, Mar 17, 2011 at 3:27 PM, Peter Keegan wrote:
> Can you use jetty?
>
> http://www.lucidimagination.com/developers/articles/setting-up-apache-solr-in-eclipse
Location of Solr Logs
Hi,

I am new to Solr. Can you please tell me where I can see the logs written by Solr? Is any configuration required to see them?

Thanks for your time and help,
Geeta