[ 
https://issues.apache.org/jira/browse/HADOOP-19901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18083101#comment-18083101
 ] 

ASF GitHub Bot commented on HADOOP-19901:
-----------------------------------------

hadoop-yetus commented on PR #8511:
URL: https://github.com/apache/hadoop/pull/8511#issuecomment-4526354062

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 57s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  51m  8s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  24m  9s |  |  trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04  |
   | +1 :green_heart: |  compile  |  29m 54s |  |  trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1  |
   | +1 :green_heart: |  checkstyle  |   1m 31s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m  2s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 26s |  |  trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04  |
   | +1 :green_heart: |  javadoc  |   1m 20s |  |  trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1  |
   | +1 :green_heart: |  spotbugs  |   3m 18s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  37m  7s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 15s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 55s |  |  the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04  |
   | +1 :green_heart: |  javac  |  16m 55s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  18m  6s |  |  the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1  |
   | +1 :green_heart: |  javac  |  18m  6s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   1m 26s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 58s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m 51s |  |  the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04  |
   | +1 :green_heart: |  javadoc  |   1m 39s |  |  the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1  |
   | +1 :green_heart: |  spotbugs  |   4m 45s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  47m 28s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | -1 :x: |  unit  |  32m 26s | [/patch-unit-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8511/1/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt) |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m 27s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 282m 27s |  |  |
   
   
   | Reason | Tests |
   |-------:|:------|
   | Failed junit tests | hadoop.ipc.TestRPC |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.54 ServerAPI=1.54 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8511/1/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/8511 |
   | JIRA Issue | HADOOP-19901 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux cbd57bcce7e7 5.15.0-177-generic #187-Ubuntu SMP Sat Apr 11 22:54:33 UTC 2026 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 78376e959f8d524bd9622f3dc1fcbc288fb09677 |
   | Default Java | Ubuntu-17.0.18+8-Ubuntu-124.04.1 |
   | Multi-JDK versions | /usr/lib/jvm/java-21-openjdk-amd64:Ubuntu-21.0.10+7-Ubuntu-124.04 /usr/lib/jvm/java-17-openjdk-amd64:Ubuntu-17.0.18+8-Ubuntu-124.04.1 |
   | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8511/1/testReport/ |
   | Max. process+thread count | 1312 (vs. ulimit of 10000) |
   | modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8511/1/console |
   | versions | git=2.43.0 maven=3.9.15 spotbugs=4.9.7 |
   | Powered by | Apache Yetus 0.14.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> ChecksumFileSystem.readVectored leaks buffers allocated through caller's 
> IntFunction allocator
> ----------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-19901
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19901
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 3.5.0, 3.4.3
>            Reporter: Ismaël Mejía
>            Priority: Major
>              Labels: pull-request-available
>
> h3. Summary
> When {{ChecksumFileSystem.readVectored()}} is called with checksum 
> verification enabled (the default for {{LocalFileSystem}}), it allocates 
> buffers for *both* file data ranges and checksum ranges through the 
> caller-provided {{IntFunction<ByteBuffer> allocate}} function. However, the 
> checksum buffers are only used temporarily for verification and are never 
> released back to the caller. The caller has no reference to these buffers and 
> no mechanism to release them.
> This was discovered in Apache Parquet Java while upgrading from Hadoop 3.3.0 
> to 3.4.3 and testing with {{TrackingByteBufferAllocator}}, which detected 
> leaked {{ByteBuffer}} allocations.
> h3. Root cause
> In {{ChecksumFSInputChecker.readVectored()}} ([ChecksumFileSystem.java, 
> trunk|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java]):
> {code:java}
> @Override
> public void readVectored(final List<? extends FileRange> ranges,
>     final IntFunction<ByteBuffer> allocate,
>     final Consumer<ByteBuffer> release) throws IOException {
>   // ...
>   // allocates checksum buffers via the caller's allocator
>   sums.readVectored(checksumRanges, allocate, release);
>   // allocates data buffers via the caller's allocator
>   datas.readVectored(dataRanges, allocate, release);
>   for (CombinedFileRange checksumRange : checksumRanges) {
>     for (FileRange dataRange : checksumRange.getUnderlying()) {
>       CompletableFuture<ByteBuffer> result =
>           checksumRange.getData().thenCombineAsync(dataRange.getData(),
>               (sumBuffer, dataBuffer) ->
>                   checkBytes(sumBuffer, checksumRange.getOffset(),
>                       dataBuffer, dataRange.getOffset(), bytesPerSum, file));
>       for (FileRange original :
>           ((CombinedFileRange) dataRange).getUnderlying()) {
>         original.setData(result.thenApply(
>             (b) -> VectoredReadUtils.sliceTo(b, dataRange.getOffset(),
>                 original)));
>       }
>     }
>   }
> }
> {code}
> Two problems:
> # *Checksum buffers are never released.* {{sums.readVectored(checksumRanges, 
> allocate, release)}} allocates buffers through the caller's {{allocate}} 
> function to read checksum data. After {{checkBytes()}} verifies the data, the 
> checksum buffers ({{sumBuffer}}) are no longer needed, but they are never 
> passed to {{release}} and are invisible to the caller. They leak.
> # *The 2-arg API provides no release mechanism.* The 2-arg overload passes a 
> no-op release:
>    {code:java}
>    public void readVectored(List<? extends FileRange> ranges,
>                             IntFunction<ByteBuffer> allocate)
>        throws IOException {
>        readVectored(ranges, allocate, (b) -> { });
>    }
>    {code}
>    Even callers using the 3-arg API don't benefit, because 
> {{ChecksumFileSystem}} itself never calls {{release}} on the checksum buffers 
> -- it only passes {{release}} down to the underlying streams.
> h3. How this was discovered
> Apache Parquet Java uses a {{TrackingByteBufferAllocator}} in tests that 
> wraps the real allocator and tracks all allocations. When the allocator is 
> closed, it throws {{LeakedByteBufferException}} if any allocated buffers were 
> not released. After upgrading Hadoop from 3.3.0 to 3.4.3, the following test 
> classes started failing with buffer leak errors in the vectored I/O path:
> * {{TestRecordLevelFilters}} (15 tests)
> * {{TestColumnIndexFiltering}} (24 tests) 
> * {{TestParquetReader}} (6+ tests)
> The allocation stacktrace showed:
> {code}
> TrackingByteBufferAllocator.allocate
>   -> VectorIOBufferPool.getBuffer
>     -> RawLocalFileSystem$AsyncHandler.initiateRead
> {code}
> Parquet's {{readVectored()}} method passes a {{ByteBufferAllocator}} to 
> Hadoop, but Hadoop also uses it for internal temporary allocations (the 
> checksum ranges) that are never handed back to the caller.
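
The detection mechanism described above can be illustrated with a minimal sketch. Everything in it is hypothetical: `LeakTracker` is not Parquet's {{TrackingByteBufferAllocator}}, and `leakyRead` only stands in for a read path that takes a hidden temporary from the caller's allocator. Buffers are tracked by identity because `ByteBuffer.equals()` compares contents.

```java
import java.nio.ByteBuffer;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.function.IntFunction;

/** Hypothetical leak tracker for illustration; not Parquet's implementation. */
final class LeakTracker {
    // ByteBuffer.equals() compares contents, so track buffers by identity.
    private final Set<ByteBuffer> outstanding =
        Collections.newSetFromMap(new IdentityHashMap<>());

    synchronized ByteBuffer allocate(int size) {
        ByteBuffer buf = ByteBuffer.allocate(size);
        outstanding.add(buf);
        return buf;
    }

    synchronized void release(ByteBuffer buf) {
        if (!outstanding.remove(buf)) {
            throw new IllegalStateException(
                "buffer was not allocated here, or was released twice");
        }
    }

    synchronized int outstandingCount() {
        return outstanding.size();
    }

    /** Stand-in for a read that takes a hidden temporary from the caller's allocator. */
    static ByteBuffer leakyRead(IntFunction<ByteBuffer> allocate,
                                int dataLen, int sumLen) {
        ByteBuffer sums = allocate.apply(sumLen);  // internal temporary, never handed back
        ByteBuffer data = allocate.apply(dataLen); // the only buffer the caller receives
        return data;
    }
}
```

If a caller runs `leakyRead(tracker::allocate, 512, 4)` and releases the returned data buffer, `outstandingCount()` still reports one buffer: the temporary that the caller has no reference to.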
> h3. Workaround in Parquet
> We implemented a "capturing allocator" pattern that wraps the allocator to 
> track all buffers allocated during {{readVectored()}}, then registers them 
> all for release:
> {code:java}
> List<ByteBuffer> allocatedBuffers = new ArrayList<>();
> ByteBufferAllocator capturingAllocator = new ByteBufferAllocator() {
>     @Override
>     public ByteBuffer allocate(int size) {
>         ByteBuffer buf = options.getAllocator().allocate(size);
>         allocatedBuffers.add(buf);
>         return buf;
>     }
>     // ...
> };
> try {
>     f.readVectored(ranges, capturingAllocator);
>     // ... process futures ...
> } finally {
>     builder.addBuffersToRelease(allocatedBuffers);
> }
> {code}
> This ensures all buffers allocated through the caller's allocator are 
> eventually released, regardless of whether they are returned in a future or 
> used internally by ChecksumFileSystem. See [parquet-java commit 
> fc0586d68|https://github.com/apache/parquet-java/commit/fc0586d68].
> h3. Suggested fixes
> *Option A (minimal): Release checksum buffers after verification.*
> In {{ChecksumFSInputChecker.readVectored()}}, after {{checkBytes()}} 
> completes, call {{release}} on the checksum buffer:
> {code:java}
> CompletableFuture<ByteBuffer> result =
>     checksumRange.getData().thenCombineAsync(dataRange.getData(),
>         (sumBuffer, dataBuffer) -> {
>             ByteBuffer verified = checkBytes(sumBuffer,
>                 checksumRange.getOffset(),
>                 dataBuffer, dataRange.getOffset(), bytesPerSum, file);
>             // release checksum buffer after verification
>             release.accept(sumBuffer);
>             return verified;
>         });
> {code}
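
A self-contained sketch of the Option A idea, with no Hadoop dependency, might look like the following. The class name, the `verify` helper, and its toy "checksum" (a plain byte sum) are assumptions made for illustration; they are not `checkBytes()`. It uses `thenCombine` rather than `thenCombineAsync` to stay deterministic, and wraps the release in `finally` so the checksum buffer is also handed back when verification throws.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

/** Illustrative only: releases the checksum buffer once verification finishes. */
final class ReleaseAfterVerify {

    /** Hypothetical stand-in for checkBytes(): the "checksum" is the byte sum of the data. */
    static ByteBuffer verify(ByteBuffer sum, ByteBuffer data) {
        int expected = sum.getInt(0);
        int actual = 0;
        for (int i = data.position(); i < data.limit(); i++) {
            actual += data.get(i);
        }
        if (expected != actual) {
            throw new IllegalStateException("checksum mismatch");
        }
        return data;
    }

    static CompletableFuture<ByteBuffer> verified(
            CompletableFuture<ByteBuffer> sumFuture,
            CompletableFuture<ByteBuffer> dataFuture,
            Consumer<ByteBuffer> release) {
        return sumFuture.thenCombine(dataFuture, (sum, data) -> {
            try {
                return verify(sum, data);
            } finally {
                // Runs on success and on checksum mismatch.
                release.accept(sum);
            }
        });
    }
}
```

One caveat of this shape: the combining function only runs when both futures complete normally, so if the data read fails after the checksum read succeeded, the checksum buffer would still need to be released by some other path.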
> *Option B (comprehensive): Don't use the caller's allocator for internal 
> temporaries.*
> ChecksumFileSystem should allocate its own temporary buffers for checksum 
> data instead of using the caller-provided allocator. The caller's allocator 
> is intended for buffers that the caller will own and manage. Using it for 
> internal temporaries violates that expectation.
> {code:java}
> // Use internal allocation for checksums, not the caller's allocator
> sums.readVectored(checksumRanges, ByteBuffer::allocate, (b) -> { });
> // Only use caller's allocator for data ranges
> datas.readVectored(dataRanges, allocate, release);
> {code}
> *Option C (API improvement): Extend the API to support paired 
> allocate/release.*
> The current {{IntFunction<ByteBuffer>}} allocator is one-way -- there's no 
> way for Hadoop to release a buffer it allocated through the caller's 
> function. HADOOP-19303 added a {{Consumer<ByteBuffer> release}} parameter, 
> but it's separate from the allocate function and {{ChecksumFileSystem}} 
> doesn't use it for its own intermediate buffers. A paired allocator/releaser 
> interface (similar to Parquet's {{ByteBufferAllocator}} with both 
> {{allocate}} and {{release}} methods) would make the lifecycle explicit.
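
To make Option C concrete, here is one possible shape for a paired interface. The name `BufferLifecycle`, the `of` adapter, and the `withTemporary` helper are all hypothetical; nothing like this exists in Hadoop today, and the sketch only shows how pairing the two callbacks lets internal temporaries be scoped.

```java
import java.nio.ByteBuffer;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.IntFunction;

/** Hypothetical paired allocate/release interface; not an existing Hadoop API. */
interface BufferLifecycle {
    ByteBuffer allocate(int size);

    void release(ByteBuffer buffer);

    /** Adapts the two separate callbacks of the 3-arg readVectored into one object. */
    static BufferLifecycle of(IntFunction<ByteBuffer> allocator,
                              Consumer<ByteBuffer> releaser) {
        return new BufferLifecycle() {
            @Override
            public ByteBuffer allocate(int size) {
                return allocator.apply(size);
            }

            @Override
            public void release(ByteBuffer buffer) {
                releaser.accept(buffer);
            }
        };
    }

    /** Runs work against a temporary buffer and always hands the buffer back. */
    default <T> T withTemporary(int size, Function<ByteBuffer, T> work) {
        ByteBuffer tmp = allocate(size);
        try {
            return work.apply(tmp);
        } finally {
            release(tmp);
        }
    }
}
```

With such an interface, an internal temporary like a checksum buffer has an explicit scope, and the 2-arg overload's no-op release becomes a visible choice made by whoever constructs the pair.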
> h3. Related issues
> * *HADOOP-19303* (VectorIO API to support releasing buffers on failure) -- 
> Added the 3-arg {{readVectored}} with {{release}} Consumer, but 
> {{ChecksumFileSystem}} doesn't call {{release}} on checksum buffers.
> * *HADOOP-18296* (Memory fragmentation in ChecksumFileSystem Vectored IO) -- 
> Fixed range merging fragmentation, but did not address checksum buffer leaks.
> * *PARQUET-2171* (Implement vectored IO in parquet file format) -- The 
> Parquet side implementation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
