[ 
https://issues.apache.org/jira/browse/TIKA-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18070124#comment-18070124
 ] 

ASF GitHub Bot commented on TIKA-4704:
--------------------------------------

Copilot commented on code in PR #2728:
URL: https://github.com/apache/tika/pull/2728#discussion_r3017899281


##########
tika-server/tika-server-core/src/test/java/org/apache/tika/server/core/CXFTestBase.java:
##########
@@ -152,7 +152,7 @@ protected static String 
getStringFromInputStream(InputStream in) throws IOExcept
     public static InputStream gzip(InputStream is) throws IOException {
         ByteArrayOutputStream bos = new ByteArrayOutputStream();
         OutputStream gz = new GzipCompressorOutputStream(bos);
-        IOUtils.copy(is, gz);
+        is.transferTo(gz);
         gz.flush();
         gz.close();
         return new ByteArrayInputStream(bos.toByteArray());

Review Comment:
   `gzip(InputStream)` creates a `GzipCompressorOutputStream` and closes it 
only on the happy path. If `transferTo()` throws, the gzip stream won’t be 
closed, which can leave the compressor in an inconsistent state and leak 
resources. Use try-with-resources (or a finally) to guarantee the gzip stream 
is closed even on exceptions.



##########
tika-server/tika-server-core/src/test/java/org/apache/tika/server/core/CXFTestBase.java:
##########
@@ -201,7 +202,7 @@ public void setUp() throws Exception {
 
             // Initialize PipesParsingHelper for pipes-based parsing
             // Merge the fetcher config with basePath pointing to the temp 
directory
-            this.pipesConfigPath = mergeFetcherConfig(this.pipesConfigPath, 
inputTempDirectory);
+            this.pipesConfigPath = mergeFetcherConfig(tmpPipesConfigPath, 
inputTempDirectory);
             TikaJsonConfig tikaJsonConfig = 
TikaJsonConfig.load(this.pipesConfigPath);

Review Comment:
   In `setUp()`, `tmpPipesConfigPath` is always a temp file created by 
`mergePassbackAllStrategy()`/`createDefaultTestConfig()`, but it is never 
deleted after `mergeFetcherConfig(...)` writes a new temp config. This leaves 
an intermediate temp file behind for each test run. Please delete 
`tmpPipesConfigPath` once the merged config has been written (ideally in a 
finally that won’t delete the final `pipesConfigPath`).



##########
tika-server/tika-server-core/src/test/java/org/apache/tika/server/core/CXFTestBase.java:
##########
@@ -268,6 +271,8 @@ private Path mergePassbackAllStrategy(InputStream 
pipesConfigInputStream) throws
     /**
      * Merges the tika-server-fetcher configuration into the pipes config.
      * The fetcher is configured with basePath pointing to the input temp 
directory.
+     * 
+     * @return new (different) JSON configPath in the temp director

Review Comment:
   Javadoc typo: "temp director" should be "temp directory", and the `@return` 
sentence should end with a period.
   ```suggestion
        * @return new (different) JSON configPath in the temp directory.
   ```



##########
tika-server/tika-server-core/src/test/java/org/apache/tika/server/core/CXFTestBase.java:
##########
@@ -186,12 +186,13 @@ public void setUp() throws Exception {
             Files.copy(getTikaConfigInputStream(), tmp, 
StandardCopyOption.REPLACE_EXISTING);
 
             InputStream pipesConfigInputStream = getPipesConfigInputStream();
+            Path tmpPipesConfigPath;
             if (pipesConfigInputStream != null) {
                 // Test provided its own pipes config - merge in PASSBACK_ALL 
emit strategy
-                this.pipesConfigPath = 
mergePassbackAllStrategy(pipesConfigInputStream);
+                tmpPipesConfigPath = 
mergePassbackAllStrategy(pipesConfigInputStream);
             } else {
                 // Create a default pipes config, merging metadata-filters 
from tika config
-                this.pipesConfigPath = createDefaultTestConfig(tmp);
+                tmpPipesConfigPath = createDefaultTestConfig(tmp);
             }

Review Comment:
   `pipesConfigInputStream` is obtained from `getPipesConfigInputStream()` and 
passed into `mergePassbackAllStrategy(...)` without being closed. Even though 
the current base implementation returns a `ByteArrayInputStream`, subclasses 
can override this to return a stream that requires closing. Wrap 
`pipesConfigInputStream` in try-with-resources when non-null.



##########
tika-server/tika-server-core/src/test/java/org/apache/tika/server/core/CXFTestBase.java:
##########
@@ -452,9 +459,8 @@ protected Map<String, byte[]> 
readZipArchiveBytes(InputStream inputStream) throw
                 Enumeration<ZipArchiveEntry> entries = zip.getEntries();
                 while (entries.hasMoreElements()) {
                     ZipArchiveEntry entry = entries.nextElement();
-                    ByteArrayOutputStream bos = new ByteArrayOutputStream();
-                    IOUtils.copy(zip.getInputStream(entry), bos);
-                    data.put(entry.getName(), bos.toByteArray());
+                    byte[] bytes = zip.getInputStream(entry).readAllBytes();
+                    data.put(entry.getName(), bytes);
                 }

Review Comment:
   In `readZipArchiveBytes(...)`, the per-entry stream returned by 
`zip.getInputStream(entry)` is not closed. Wrap that stream in 
try-with-resources before calling `readAllBytes()` to avoid leaking file 
descriptors.



##########
tika-server/tika-server-core/src/test/java/org/apache/tika/server/core/CXFTestBase.java:
##########
@@ -429,9 +437,8 @@ protected Map<String, String> readZipArchive(InputStream 
inputStream) throws IOE
                 Enumeration<ZipArchiveEntry> entries = zip.getEntries();
                 while (entries.hasMoreElements()) {
                     ZipArchiveEntry entry = entries.nextElement();
-                    ByteArrayOutputStream bos = new ByteArrayOutputStream();
-                    IOUtils.copy(zip.getInputStream(entry), bos);
-                    data.put(entry.getName(), 
DigestUtils.md5Hex(bos.toByteArray()));
+                    byte[] bytes = zip.getInputStream(entry).readAllBytes();
+                    data.put(entry.getName(), DigestUtils.md5Hex(bytes));
                 }

Review Comment:
   In `readZipArchive(...)`, the per-entry stream returned by 
`zip.getInputStream(entry)` is not closed. This can leak file handles across 
many entries. Wrap the entry stream in try-with-resources before calling 
`readAllBytes()`.





> Avoid remaining temp files
> --------------------------
>
>                 Key: TIKA-4704
>                 URL: https://issues.apache.org/jira/browse/TIKA-4704
>             Project: Tika
>          Issue Type: Task
>    Affects Versions: 3.3.0
>            Reporter: Tilman Hausherr
>            Priority: Minor
>             Fix For: 4.0.0, 3.3.1
>
>         Attachments: screenshot-1.png
>
>
> This is my temp directory after a successful build of tika 3. We should try 
> to lessen this.
>  !screenshot-1.png! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to