[
https://issues.apache.org/jira/browse/TIKA-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18069864#comment-18069864
]
ASF GitHub Bot commented on TIKA-4704:
--------------------------------------
Copilot commented on code in PR #2726:
URL: https://github.com/apache/tika/pull/2726#discussion_r3014869182
##########
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-miscoffice-module/src/main/java/org/apache/tika/parser/odf/OpenDocumentParser.java:
##########
@@ -131,47 +132,49 @@ public void parse(InputStream stream, ContentHandler
baseHandler, Metadata metad
// Use a File if we can, and an already open zip is even better
ZipFile zipFile = null;
TikaInputStream tmpTis = null;
- if (stream instanceof TikaInputStream) {
- TikaInputStream tis = (TikaInputStream) stream;
- Object container = ((TikaInputStream) stream).getOpenContainer();
- if (container instanceof ZipFile) {
- zipFile = (ZipFile) container;
+ try (TemporaryResources tmp = new TemporaryResources()) {
+ if (stream instanceof TikaInputStream) {
+ TikaInputStream tis = (TikaInputStream) stream;
+ Object container = ((TikaInputStream)
stream).getOpenContainer();
+ if (container instanceof ZipFile) {
+ zipFile = (ZipFile) container;
+ } else {
+ zipFile = new ZipFile(tis.getFile());
+ tis.setOpenContainer(zipFile);
+ }
} else {
- zipFile = new ZipFile(tis.getFile());
- tis.setOpenContainer(zipFile);
+ tmpTis = TikaInputStream.get(stream, tmp, metadata);
+ tmpTis.setOpenContainer(new ZipFile(tmpTis.getFile()));
+ zipFile = (ZipFile) tmpTis.getOpenContainer();
}
- } else {
- tmpTis = TikaInputStream.get(stream);
- tmpTis.setOpenContainer(new ZipFile(tmpTis.getFile()));
- zipFile = (ZipFile) tmpTis.getOpenContainer();
- }
- // Prepare to handle the content
- XHTMLContentHandler xhtml = new XHTMLContentHandler(baseHandler,
metadata);
- xhtml.startDocument();
- // As we don't know which of the metadata or the content
- // we'll hit first, catch the endDocument call initially
- EndDocumentShieldingContentHandler handler = new
EndDocumentShieldingContentHandler(xhtml);
+ // Prepare to handle the content
+ XHTMLContentHandler xhtml = new XHTMLContentHandler(baseHandler,
metadata);
+ xhtml.startDocument();
+ // As we don't know which of the metadata or the content
+ // we'll hit first, catch the endDocument call initially
+ EndDocumentShieldingContentHandler handler = new
EndDocumentShieldingContentHandler(xhtml);
- try {
try {
- handleZipFile(zipFile, metadata, context, handler,
embeddedDocumentUtil);
- } finally {
- //Do we want to close silently == catch an exception here?
- if (tmpTis != null) {
- //tmpTis handles closing of the open zip container
- tmpTis.close();
+ try {
+ handleZipFile(zipFile, metadata, context, handler,
embeddedDocumentUtil);
+ } finally {
+ //Do we want to close silently == catch an exception here?
+ if (tmpTis != null) {
+ //tmpTis handles closing of the open zip container
+ tmpTis.close();
+ }
Review Comment:
OpenDocumentParser.parse() must not close the caller-provided InputStream
(Parser contract). Closing tmpTis here will close the wrapped `stream` because
TikaInputStream.close() adds the underlying stream to its TemporaryResources
and closes it. Since you’re already using `TikaInputStream.get(stream, tmp,
metadata)` + try-with-resources on `TemporaryResources`, you can rely on `tmp`
to clean up the temp file and the ZipFile (via setOpenContainer) without
closing the original InputStream; remove the explicit tmpTis.close() (and any
logic that depends on it).
> Avoid remaining temp files
> --------------------------
>
> Key: TIKA-4704
> URL: https://issues.apache.org/jira/browse/TIKA-4704
> Project: Tika
> Issue Type: Task
> Affects Versions: 3.3.0
> Reporter: Tilman Hausherr
> Priority: Minor
> Fix For: 4.0.0, 3.3.1
>
> Attachments: screenshot-1.png
>
>
> This is my temp directory after a successful build of tika 3. We should try
> to lessen this.
> !screenshot-1.png!
--
This message was sent by Atlassian Jira
(v8.20.10#820010)