Hi,

I'm not sure which version of Solr/Tika you're using, but I had a similar
experience that turned out to be caused by a design change in PDFBox.

https://issues.apache.org/jira/browse/SOLR-2886

Tricia

On Sat, Jan 14, 2012 at 12:53 AM, Wayne W <waynemailingli...@gmail.com> wrote:

> Hi,
>
> We're running Solr on Tomcat with 1 GB in production, and lately we've
> been hitting a huge number of OutOfMemory errors. From what I can tell,
> they are coming from Tika's content extraction. I've analyzed the Java
> heap dump with a memory analyzer, and it's pretty clear, at least about
> which classes are involved. It looks like a leak to me: we never parse
> files larger than 20 MB, yet these objects are taking up ~700 MB.
>
> I've attached 2 screenshots from the tool (not sure whether the list
> accepts attachments).
>
> But to summarize (class, number of objects, used heap size, retained heap
> size):
>
> Class                                             Objects  Used heap (bytes) Retained heap (bytes)
> org.apache.xmlbeans.impl.store.Xob$ElementXObj    838,993         80,533,728           604,606,040
> org.apache.poi.openxml4j.opc.ZipPackage                 2                112            87,009,848
> char[]                                                587         32,216,960            38,216,950
>
>
> We're really desperate to find a solution to this; any ideas or help
> would be greatly appreciated.
> Wayne
>
