memory not getting released in tomcat after pushing large documents

2011-03-17 Thread Geeta Subramanian
Hi,

I am very new to Solr and am facing a lot of issues when using it to push large
documents.
I have Solr running in Tomcat with about 4 GB of heap allocated (-Xmx), but when I
push about twenty-five 100 MB documents, indexing fails with a Java heap space error.

I also tried pushing just one document. It went through successfully, but Tomcat's
memory usage does not come down: it consumes about 1 GB for a single 100 MB
document and does not release it.

Please let me know if I am making a mistake in the configuration or setup.
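Task Manager shows the size of the whole Tomcat process, which is not the same as how much of the Java heap is in use. A HotSpot JVM may keep heap memory it has already committed even after the garbage collector has freed the objects in it, so a process that stays large after a request is not, by itself, proof of a leak. The sketch below (a hypothetical HeapSnapshot class using only the standard java.lang.management API) prints the JVM's own used/committed/max heap figures, which can be compared against Task Manager before and after a push.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: reports heap usage as the JVM itself sees it.
class HeapSnapshot {
    private static final long MB = 1024L * 1024L;

    /** Returns used, committed and max heap in megabytes. */
    static String describe() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // used:      objects currently on the heap (live or not yet collected)
        // committed: heap memory reserved from the OS; part of what Task Manager shows
        // max:       the -Xmx ceiling, or -1 if undefined
        long max = heap.getMax() < 0 ? -1 : heap.getMax() / MB;
        return String.format("heap used=%d MB, committed=%d MB, max=%d MB",
                heap.getUsed() / MB, heap.getCommitted() / MB, max);
    }

    public static void main(String[] args) {
        System.out.println("before:  " + describe());
        char[] big = new char[8 * 1024 * 1024]; // 8M chars = 16 MB, a stand-in for extracted text
        System.out.println("holding: " + describe() + " (buffer of " + big.length + " chars)");
        big = null;   // drop the only reference
        System.gc();  // only a request; the JVM may ignore it
        System.out.println("after:   " + describe());
    }
}
```

If "used" drops after a collection while "committed" (and Task Manager) stays high, the memory is free inside the JVM rather than leaked.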

Here is the stack trace:
SEVERE: java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2882)
        at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:515)
        at java.lang.StringBuffer.append(StringBuffer.java:306)
        at java.io.StringWriter.write(StringWriter.java:77)
        at com.sun.org.apache.xml.internal.serializer.ToStream.processDirty(ToStream.java:1570)
        at com.sun.org.apache.xml.internal.serializer.ToStream.characters(ToStream.java:1488)
        at com.sun.org.apache.xml.internal.serializer.ToHTMLStream.characters(ToHTMLStream.java:1529)
        at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerHandlerImpl.characters(TransformerHandlerImpl.java:168)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:153)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:39)
        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:61)
        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:113)
        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:151)
        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:175)
        at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:144)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:142)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99)
        at com.commvault.solr.handler.extraction.CVExtractingDocumentLoader.load(CVExtractingDocumentLoader.java:349)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:237)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at filters.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:122)


Thanks for help,
Geeta

**Legal Disclaimer***
"This communication may contain confidential and privileged material
for the sole use of the intended recipient.  Any unauthorized review,
use or distribution by others is strictly prohibited.  If you have
received the message in error, please advise the sender by reply
email and delete the message. Thank you."



Info about Debugging SOLR in Eclipse

2011-03-17 Thread Geeta Subramanian
Hi,

Can someone please tell me the steps to debug the Solr code in Eclipse?

I tried compiling the source, placing the resulting JARs in the Tomcat instance
where I am running Solr, and remote debugging, but the debugger did not stop at any
breakpoint.
I also wrote a small standalone Java class to push a document, but the debugger
stopped only in the SolrJ classes, not in the Solr server classes.

Please let me know if I am making a mistake.
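Breakpoints in server-side classes can only fire in a debugger attached to the Tomcat JVM; a standalone SolrJ client runs in its own JVM, which is consistent with only the SolrJ classes stopping. A common reason for server breakpoints never firing is that Tomcat was not started with the JDWP agent, for example via "catalina.sh jpda start" (catalina.bat on Windows) or a JVM option such as -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8000 (port 8000 is an arbitrary example). Another is that the JARs Tomcat actually loads do not match the source open in Eclipse. The hypothetical DebugAgentCheck class below is a sketch that inspects a JVM's startup arguments for the agent; to say anything about Tomcat it would have to run inside the Tomcat JVM, or be given Tomcat's arguments.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical sketch: does a JVM's argument list enable a JDWP debug agent?
class DebugAgentCheck {

    /** True if any argument enables JDWP, in either the modern or the legacy syntax. */
    static boolean hasJdwpAgent(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith("-agentlib:jdwp") || arg.startsWith("-Xrunjdwp")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Arguments this JVM was started with (not including the main class or its args)
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println(hasJdwpAgent(jvmArgs)
                ? "JDWP agent present: a remote debugger can attach"
                : "No JDWP agent: a remote debugger cannot attach to this JVM");
    }
}
```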

Regards,
Geeta 



RE: memory not getting released in tomcat after pushing large documents

2011-03-17 Thread Geeta Subramanian
Hi,

Thanks for the reply.
Sorry, the setup those logs came from does have a custom update handler.

However, I also have a local setup without a custom update handler (Solr exactly as
downloaded from the Solr site), and even that runs out of heap space:

        at java.util.Arrays.copyOf(Unknown Source)
        at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)
        at java.lang.AbstractStringBuilder.append(Unknown Source)
        at java.lang.StringBuilder.append(Unknown Source)
        at org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:257)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:153)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:39)
        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:61)
        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:113)
        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:151)
        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:175)
        at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:144)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:142)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:112)
        at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:193)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:237)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)



Also, in general, if I post 25 documents of 100 MB each to Solr, what would be the
ideal heap size?
Also, when I push a single 100 MB document, Task Manager shows about 900 MB of
memory in use, and subsequent pushes keep it at about 900 MB. At what point could
an OOM crash occur?
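A rough way to reason about heap size: the traces show the extracted text being accumulated in a single StringBuilder (or StringBuffer), and each time it grows, Arrays.copyOf keeps the old and the new array alive at the same time. The hypothetical BuilderPeakEstimate class below models only that one buffer, assuming plain ASCII text (one char per byte of file), 2 bytes per Java char, the default initial capacity of 16, the JDK 6-style growth rule (capacity + 1) * 2, and a final toString() copy. It ignores Tika's own buffers, any second copy made by a custom handler, and Lucene's indexing buffers, so real usage will be higher.

```java
// Rough model (assumptions in the comments), not a measurement of Solr itself.
class BuilderPeakEstimate {

    /** Peak chars held at once while one builder accumulates n chars, then calls toString(). */
    static long peakChars(long n) {
        long capacity = 16;                          // default StringBuilder capacity
        long peak = 0;
        while (capacity < n) {
            long next = (capacity + 1) * 2;          // JDK 6-style growth rule (assumed)
            peak = Math.max(peak, capacity + next);  // old and new arrays coexist during the copy
            capacity = next;
        }
        // toString() copies n chars while the builder's array is still reachable
        return Math.max(peak, capacity + n);
    }

    /** Same estimate in bytes, assuming 2 bytes per Java char. */
    static long peakBytes(long n) {
        return 2 * peakChars(n);
    }

    public static void main(String[] args) {
        long chars = 100L * 1000 * 1000;             // 100 MB of ASCII text ~ 100M chars (assumed)
        System.out.printf("peak for one buffer: ~%d MB%n", peakBytes(chars) / (1000 * 1000));
    }
}
```

Under these assumptions a 100 MB text file needs a peak of 250,994,942 chars (about 500 MB) for this one buffer alone, which makes it plausible that a 1 GB heap runs out on a document of that size.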

When I ran the YourKit profiler, I saw that around 1 GB of memory was consumed
just by char[] and String[] objects.
How can I find out what is creating these (Solr or Tika?) and free these
objects?
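A profiler's allocation view shows where char[] objects are created, but for memory that is not being released, the more useful question is what still references them. One way, sketched here assuming a HotSpot/OpenJDK JVM, is to take a heap dump of live objects, open it in a heap analyzer (Eclipse MAT, or a profiler that reads HPROF files), and follow the paths from the large char[] instances to their GC roots; owner classes in org.apache.tika, org.apache.solr or a custom package indicate who is holding them. The HotSpot flags -XX:+HeapDumpOnOutOfMemoryError and -XX:HeapDumpPath=<directory> can produce such a dump automatically at the moment of the OOM. The hypothetical HeapDumper class below triggers one on demand through com.sun.management.HotSpotDiagnosticMXBean.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;

// Sketch for HotSpot/OpenJDK JVMs: write an HPROF heap dump of reachable objects.
class HeapDumper {

    /** Dumps live objects to path; the file must not exist yet and should end in .hprof. */
    static void dumpLiveHeap(String path) {
        try {
            HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                    ManagementFactory.getPlatformMBeanServer(),
                    "com.sun.management:type=HotSpotDiagnostic",
                    HotSpotDiagnosticMXBean.class);
            bean.dumpHeap(path, true); // true = include only objects that are still reachable
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String path = args.length > 0 ? args[0] : "solr-heap.hprof";
        dumpLiveHeap(path);
        System.out.println("Heap dump written to " + path);
    }
}
```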


Thank you so much for your time and help,



Regards,
Geeta



-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: 17 March, 2011 12:21 PM
To: solr-user@lucene.apache.org
Cc: Geeta Subramanian
Subject: Re: memory not getting released in tomcat after pushing large documents

On Thu, Mar 17, 2011 at 12:12 PM, Geeta Subramanian wrote:
>        at com.commvault.solr.handler.extraction.CVExtractingDocumentLoader.load(CVExtractingDocumentLoader.java:349)

Looks like you're using a custom update handler.  Perhaps that's accidentally 
hanging onto memory?

-Yonik
http://lucidimagination.com
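As an illustration of how a handler can accidentally hang onto memory (the thread does not show the actual CVExtractingDocumentLoader code, so this is only one possible pattern): a StringBuilder kept in a field and reused across requests keeps its grown backing array even after setLength(0), so after one 100 MB document the largest buffer ever needed stays on the heap for as long as the handler lives. The hypothetical ReusedBufferHandler below shows the effect; calling trimToSize(), or simply using a fresh local builder per request, lets the large array be collected.

```java
// Hypothetical handler that reuses one StringBuilder across requests.
class ReusedBufferHandler {
    private final StringBuilder content = new StringBuilder();

    /** Processes one document's text; the builder is reused between calls. */
    void handle(CharSequence documentText) {
        content.setLength(0);          // clears the text but keeps the (possibly huge) backing array
        content.append(documentText);
        // ... the accumulated text would be added to a Solr document here ...
    }

    /** Drops the grown backing array so the garbage collector can reclaim it. */
    void release() {
        content.setLength(0);
        content.trimToSize();          // asks the builder to shrink capacity to the current length
    }

    int capacity() {
        return content.capacity();
    }

    public static void main(String[] args) {
        ReusedBufferHandler handler = new ReusedBufferHandler();
        handler.handle(new StringBuilder().append(new char[5_000_000])); // a "large" document
        handler.handle("small document");
        System.out.println("capacity after small doc: " + handler.capacity());
        handler.release();
        System.out.println("capacity after release:   " + handler.capacity());
    }
}
```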



RE: Info about Debugging SOLR in Eclipse

2011-03-17 Thread Geeta Subramanian
Hi Markus,

Thanks, I had already followed the steps on that site.
I am able to run Solr, but I am not able to debug the Solr classes.

I want to follow the code flow on the server side, especially the point where
Solr calls Tika and gets the extracted content back from it.
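Judging from the stack traces in the memory thread, the hand-off happens in ExtractingDocumentLoader.load, which calls Tika's AutoDetectParser.parse; Tika then pushes the extracted text back through a chain of SAX ContentHandler decorators into SolrContentHandler.characters, which appends it to a StringBuilder. Breakpoints in those two Solr methods are likely starting points. The JDK-only sketch below (a hypothetical AccumulatingHandler, not Solr's code, with the JDK's XML parser standing in for Tika) shows the same callback pattern: the parser calls characters() repeatedly, and the handler gathers every chunk into one buffer.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative stand-in for the parser -> ContentHandler hand-off; not Solr's actual code.
class AccumulatingHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Called once per chunk of text; every chunk lands in the same growing buffer,
        // so the whole document's text is held in memory at once.
        text.append(ch, start, length);
    }

    /** Parses an XHTML string and returns all of its character content. */
    static String extract(String xhtml) {
        AccumulatingHandler handler = new AccumulatingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.text.toString();
    }

    public static void main(String[] args) {
        System.out.println(extract("<html><body><p>Hello </p><p>Solr</p></body></html>"));
    }
}
```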

Thanks for the time & help,
Regards,
Geeta

-Original Message-
From: Markus Jelsma [mailto:markus.jel...@openindex.io] 
Sent: 17 March, 2011 12:22 PM
To: solr-user@lucene.apache.org
Cc: Geeta Subramanian
Subject: Re: Info about Debugging SOLR in Eclipse


http://www.lucidimagination.com/developers/articles/setting-up-apache-solr-in-eclipse



--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



OOM for large files

2011-03-17 Thread Geeta Subramanian
Hi,

I am getting an OutOfMemoryError after posting a 100 MB document to Solr, with this trace:

Exception in thread "main" org.apache.solr.common.SolrException: Java heap space java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Unknown Source)
        at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)
        at java.lang.AbstractStringBuilder.append(Unknown Source)
        at java.lang.StringBuilder.append(Unknown Source)
        at org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:257)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:153)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:39)
        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:61)
        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:113)
        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:151)
        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:175)
        at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:144)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:142)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:112)
        at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:193)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:237)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)
        at org.apache.solr.se

I have given the JVM 1024 MB of heap, but it still fails. Can somebody tell me the
minimum heap size required, relative to file size, for a document to be indexed
successfully?



Also, a possibly odd question:

In Tika's code, there is a place where a char[] buffer is initialized to 4096
characters. When this is written to a StringWriter and the writer's buffer is
full, expandCapacity is called (as shown in the trace), which performs an array
copy. So with a buffer of just 4 KB, processing a 100 MB document generates a lot
of char arrays, and we depend on the GC to clean them up.

Would changing the Tika code to initialize the char array with more than ~4 KB
give any performance improvement?
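This can be explored empirically. In the traces above, the expandCapacity frames belong to the StringBuilder (or the StringBuffer inside StringWriter) that accumulates the whole extracted text, which is separate from a fixed-size read buffer. The hypothetical GrowthCounter class below appends a given amount of text in fixed-size chunks, standing in for the 4096-char buffer, and counts how often the accumulating builder reallocates its array. Comparing chunk sizes, and a builder pre-sized to the final length, gives a rough feel for which change matters, under the simplifying assumption that only this one buffer is involved.

```java
import java.util.Arrays;

// Hypothetical experiment: how often does an accumulating StringBuilder reallocate
// when text arrives in fixed-size chunks?
class GrowthCounter {

    /** Appends totalChars in chunks of chunkSize; returns how many times capacity changed. */
    static int reallocations(int totalChars, int chunkSize, int initialCapacity) {
        StringBuilder sb = new StringBuilder(initialCapacity);
        char[] chunk = new char[chunkSize];
        Arrays.fill(chunk, 'x');
        int reallocs = 0;
        for (int remaining = totalChars; remaining > 0; ) {
            int n = Math.min(chunkSize, remaining);
            int before = sb.capacity();
            sb.append(chunk, 0, n);            // may trigger expandCapacity + Arrays.copyOf
            if (sb.capacity() != before) {
                reallocs++;
            }
            remaining -= n;
        }
        return reallocs;
    }

    public static void main(String[] args) {
        int total = 10_000_000; // 10M chars, scaled down from a 100 MB document
        System.out.println("4096-char chunks:  " + reallocations(total, 4096, 16));
        System.out.println("65536-char chunks: " + reallocations(total, 65536, 16));
        System.out.println("pre-sized, 4096:   " + reallocations(total, 4096, total));
    }
}
```

In this model the builder roughly doubles its capacity on each reallocation, so the count grows with the logarithm of the total length and the chunk size changes it only slightly; pre-sizing avoids the copies, but the full text still has to fit in the heap.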



Thanks for your time,

Regards,

Geeta


RE: memory not getting released in tomcat after pushing large documents

2011-03-17 Thread Geeta Subramanian
Hi Yonik,

I am not setting the ramBufferSizeMB or maxBufferedDocs parameters.
Do I need to set them for indexing?

Regards,
Geeta

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: 17 March, 2011 3:45 PM
To: Geeta Subramanian
Cc: solr-user@lucene.apache.org
Subject: Re: memory not getting released in tomcat after pushing large documents

In your solrconfig.xml, are you specifying ramBufferSizeMB or maxBufferedDocs?

-Yonik
http://lucidimagination.com
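ramBufferSizeMB and maxBufferedDocs control how much added-document data Lucene buffers in memory before flushing it to the index on disk. In solrconfig.xml files of this era they typically appear inside the indexDefaults and/or mainIndex sections. The hypothetical BufferSettingsCheck class below, which uses only the JDK's DOM parser, lists which of the two settings a given solrconfig.xml declares and under which parent element.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical checker: which buffer settings does a solrconfig.xml declare?
class BufferSettingsCheck {

    /** Maps "parentElement/settingName" to its configured value. */
    static Map<String, String> find(String solrconfigXml) {
        Map<String, String> found = new LinkedHashMap<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(solrconfigXml.getBytes(StandardCharsets.UTF_8)));
            for (String setting : new String[] {"ramBufferSizeMB", "maxBufferedDocs"}) {
                NodeList nodes = doc.getElementsByTagName(setting); // searches the whole document
                for (int i = 0; i < nodes.getLength(); i++) {
                    Node node = nodes.item(i);
                    String parent = node.getParentNode().getNodeName(); // e.g. indexDefaults
                    found.put(parent + "/" + setting, node.getTextContent().trim());
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse solrconfig.xml", e);
        }
        return found;
    }

    public static void main(String[] args) throws Exception {
        String xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        Map<String, String> settings = find(xml);
        System.out.println(settings.isEmpty() ? "neither setting is specified" : settings);
    }
}
```

Commented-out entries are ignored, since an XML comment is not an element.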



RE: Info about Debugging SOLR in Eclipse

2011-03-17 Thread Geeta Subramanian
Hi All,

Thanks for the help. I am now able to debug Solr. :-)

-Original Message-
From: pkeegan01...@gmail.com [mailto:pkeegan01...@gmail.com] On Behalf Of Peter Keegan
Sent: 17 March, 2011 3:33 PM
To: solr-user@lucene.apache.org
Subject: Re: Info about Debugging SOLR in Eclipse

The instructions refer to the 'Run configuration' menu. Did you try 'Debug 
configurations'?


On Thu, Mar 17, 2011 at 3:27 PM, Peter Keegan wrote:

> Can you use Jetty?
>
>
> http://www.lucidimagination.com/developers/articles/setting-up-apache-solr-in-eclipse



Location of Solr Logs

2011-04-28 Thread Geeta Subramanian
Hi,

I am new to Solr.
Can you please tell me where I can see the logs written by Solr?
Is any configuration required to see Solr's logs?
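Solr releases of this period log through SLF4J, by default bound to the JDK's own java.util.logging, so under Tomcat the output is normally handled by Tomcat's logging configuration (conf/logging.properties) and typically ends up in the files in Tomcat's logs directory or on the console, rather than in a Solr-specific file. The exact files depend on the Tomcat version and how it is launched. The hypothetical LogDestinations sketch below lists which java.util.logging handlers a logger's messages reach; called from code running inside the Solr web application it would reflect Tomcat's setup, while as a standalone program it only shows the JVM defaults.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Logger;

// Hypothetical sketch: list the java.util.logging handlers a logger's records will reach.
class LogDestinations {

    static List<String> describe(Logger logger) {
        List<String> out = new ArrayList<>();
        // Walk up the hierarchy for as long as records are passed on to parent handlers
        for (Logger l = logger; l != null; l = l.getUseParentHandlers() ? l.getParent() : null) {
            String name = l.getName() == null ? "<anonymous>"
                    : l.getName().isEmpty() ? "<root>" : l.getName();
            for (Handler h : l.getHandlers()) {
                out.add(name + " -> " + h.getClass().getName() + " (level " + h.getLevel() + ")");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "org.apache.solr" is the package prefix of Solr's own classes
        for (String line : describe(Logger.getLogger("org.apache.solr"))) {
            System.out.println(line);
        }
    }
}
```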

Thanks for your time and help,
Geeta