[jira] [Commented] (TIKA-3097) Out of memory while parsing docx

Tim Allison (Jira) Tue, 09 Jun 2020 10:00:07 -0700


    [ 
https://issues.apache.org/jira/browse/TIKA-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17129624#comment-17129624
 ]


Tim Allison commented on TIKA-3097:
-----------------------------------

Java will take as much heap as it can use.  If this is a long-running process, 
Tika will take as much memory as you let it.
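
To see what ceiling the JVM is actually running with, a minimal sketch using 
only the standard java.lang.Runtime API might look like the following. The 
class name HeapReport and the helper toMiB are made up for illustration and 
are not part of Tika.

```java
public class HeapReport {
    /** Converts a byte count to whole mebibytes (rounds down). */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most heap the JVM will try to use (governed by -Xmx).
        System.out.println("max heap (MiB):       " + toMiB(rt.maxMemory()));
        // totalMemory(): heap currently reserved by the JVM; it can grow up to the max.
        System.out.println("committed heap (MiB): " + toMiB(rt.totalMemory()));
        // totalMemory() - freeMemory(): approximate heap currently in use.
        System.out.println("used heap (MiB):      "
                + toMiB(rt.totalMemory() - rt.freeMemory()));
    }
}
```

Running it with the same -Xmx value as the failing process (for example, 
java -Xmx40m HeapReport) shows whether the limit you think you set is the one 
in effect.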

I'm able to parse the above .docx file with the SAX DOCX extractor configured 
and -Xmx40m, and I get an OOM with -Xmx32m.

If I don't use the SAX DOCX extractor, I can get an OOM at -Xmx1g.

Is there a chance that the configuration of SAX is not making it to your parser?
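
For reference, a minimal sketch of passing the SAX DOCX setting 
programmatically, assuming Tika 1.x is on the classpath and the file path 
comes in as the first command-line argument. The class name SaxDocxExample is 
made up for illustration, and this is not necessarily how your code is wired.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.microsoft.OfficeParserConfig;
import org.apache.tika.sax.BodyContentHandler;

public class SaxDocxExample {
    public static void main(String[] args) throws Exception {
        // Ask the OOXML parser to use the streaming SAX extractor for .docx.
        OfficeParserConfig officeConfig = new OfficeParserConfig();
        officeConfig.setUseSAXDocxExtractor(true);

        // The config only takes effect if it travels in the ParseContext
        // that is handed to parse().
        ParseContext context = new ParseContext();
        context.set(OfficeParserConfig.class, officeConfig);

        AutoDetectParser parser = new AutoDetectParser();
        // -1 disables the handler's write limit; the extracted text is still
        // held in memory.
        BodyContentHandler handler = new BodyContentHandler(-1);
        Metadata metadata = new Metadata();

        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            parser.parse(in, handler, metadata, context);
        }
        System.out.println(handler.toString().length() + " characters extracted");
    }
}
```

If the code calls a parse() overload that takes no ParseContext, or builds a 
fresh ParseContext somewhere along the way, the OfficeParserConfig above would 
not reach the parser, which is one way the SAX setting could get lost.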


> Out of memory while parsing docx
> --------------------------------
>
>                 Key: TIKA-3097
>                 URL: https://issues.apache.org/jira/browse/TIKA-3097
>             Project: Tika
>          Issue Type: Bug
>          Components: core, parser
>    Affects Versions: 1.24
>            Reporter: suchendra
>            Priority: Major
>         Attachments: Screenshot from 2020-05-07 08-14-25.png, samplefile.txt, 
> test.docx
>
>
> I have written simple Scala code to extract the content from an uploaded 
> docx file. The JVM goes OOM when Tika tries to parse the file. I configured 
> the JVM heap to 1GB and also tried 2GB; the same issue occurs, both with the 
> jar and in my code.
> I have attached the file for reference.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
