[ https://issues.apache.org/jira/browse/TIKA-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17129624#comment-17129624 ]
Tim Allison commented on TIKA-3097:
-----------------------------------
Java will take as much heap as it can use. If this is a long-running process,
Tika will take as much memory as you let it.
I'm able to parse the above .docx file with the SAX DOCX extractor configured
and -Xmx40m, and I get an OOM with -Xmx32m.
If I don't use the SAX DOCX extractor, I can get an OOM even at -Xmx1g.
Is there a chance that the SAX configuration is not making it to your parser?
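
To rule out the heap setting itself, it may help to print what the JVM actually sees from inside the same process that runs Tika. The following is a minimal sketch using only the standard Runtime API; the class name HeapCheck and its helper methods are hypothetical and not part of Tika.

```java
public class HeapCheck {
    /** Converts a byte count to whole mebibytes (rounded down). */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Builds a one-line summary of the heap limit and current heap usage. */
    static String heapSummary() {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();                      // upper bound the JVM will try to use
        long used = rt.totalMemory() - rt.freeMemory(); // heap currently occupied
        return "max heap = " + toMiB(max) + " MiB, used = " + toMiB(used) + " MiB";
    }

    public static void main(String[] args) {
        // Run with the same flags as the parsing job, e.g. java -Xmx40m HeapCheck
        System.out.println(heapSummary());
    }
}
```

Depending on the garbage collector, the reported maximum may come out somewhat below the -Xmx value, so treat it as a rough check rather than an exact figure.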
> Out of memory while parsing docx
> --------------------------------
>
> Key: TIKA-3097
> URL: https://issues.apache.org/jira/browse/TIKA-3097
> Project: Tika
> Issue Type: Bug
> Components: core, parser
> Affects Versions: 1.24
> Reporter: suchendra
> Priority: Major
> Attachments: Screenshot from 2020-05-07 08-14-25.png, samplefile.txt,
> test.docx
>
>
> I have written simple Scala code to extract the content from an uploaded
> .docx file. The JVM goes OOM when Tika tries to parse the file. I have
> configured the JVM heap to 1 GB and also tried 2 GB, and the same issue
> occurs both with the jar and in my code.
> I have attached the file for reference.