[jira] [Commented] (HADOOP-18296) Memory fragmentation in ChecksumFileSystem Vectored IO implementation.

ASF GitHub Bot (Jira) Thu, 17 Jul 2025 02:28:07 -0700


    [ 
https://issues.apache.org/jira/browse/HADOOP-18296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18007758#comment-18007758
 ]


ASF GitHub Bot commented on HADOOP-18296:
-----------------------------------------

steveloughran commented on code in PR #7732:
URL: https://github.com/apache/hadoop/pull/7732#discussion_r2212828881


##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/TrackingByteBufferPool.java:
##########
@@ -0,0 +1,288 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hadoop.fs.impl;
+
+import java.nio.ByteBuffer;
+import java.util.IdentityHashMap;
+import java.util.Map;
+import java.util.concurrent.atomic.AtomicInteger;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hadoop.io.ByteBufferPool;
+
+import static java.lang.System.identityHashCode;
+import static java.util.Objects.requireNonNull;
+
+/**
+ * A wrapper {@link ByteBufferPool} implementation that tracks whether all allocated buffers
+ * are released.
+ * <p>
+ * It throws the related exception at {@link #close()} if any buffer remains un-released.
+ * It also clears the buffers at release so if they continued being used it'll generate errors.
+ * <p>
+ * To be used for testing only.
+ * <p>
+ * The stacktraces of the allocation are not stored by default because
+ * it can significantly decrease the unit test performance.
+ * Configuring this class to log at DEBUG will trigger their collection.
+ * @see ByteBufferAllocationStacktraceException
+ * <p>
+ * Adapted from Parquet class {@code org.apache.parquet.bytes.TrackingByteBufferAllocator}.
+ */
+public final class TrackingByteBufferPool implements ByteBufferPool, AutoCloseable {

Review Comment:
   All of fs.impl is already marked
   ```
   @InterfaceAudience.LimitedPrivate("Filesystems")
   @InterfaceStability.Unstable
   ```
   I think that is enough, isn't it?
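
For readers without the PR open, the leak-tracking idea described in the class Javadoc quoted above can be sketched roughly as follows. This is a minimal illustration, not the code under review: the class name `TrackingPoolSketch` and the `outstanding()` helper are hypothetical, the sketch allocates buffers itself instead of wrapping another pool, and it omits both the buffer clearing on release and the DEBUG-level stack trace collection. An `IdentityHashMap` is used because `ByteBuffer.equals` compares contents, so two distinct buffers with the same bytes would collide in an ordinary `HashMap`.

```java
import java.nio.ByteBuffer;
import java.util.IdentityHashMap;
import java.util.Map;

/** Hypothetical sketch of a pool that fails at close() if buffers were leaked. */
public class TrackingPoolSketch implements AutoCloseable {

  // Keyed by object identity, not by buffer contents.
  private final Map<ByteBuffer, Integer> allocated = new IdentityHashMap<>();

  /** Allocate a buffer and remember it until it is returned. */
  public synchronized ByteBuffer getBuffer(boolean direct, int length) {
    ByteBuffer buffer = direct
        ? ByteBuffer.allocateDirect(length)
        : ByteBuffer.allocate(length);
    allocated.put(buffer, length);
    return buffer;
  }

  /** Return a buffer; reject buffers this pool did not hand out or already got back. */
  public synchronized void putBuffer(ByteBuffer buffer) {
    if (allocated.remove(buffer) == null) {
      throw new IllegalStateException("buffer is unknown or was already released");
    }
  }

  /** Number of buffers handed out and not yet returned. */
  public synchronized int outstanding() {
    return allocated.size();
  }

  /** Fail loudly if anything is still outstanding. */
  @Override
  public synchronized void close() {
    if (!allocated.isEmpty()) {
      throw new IllegalStateException(allocated.size() + " buffer(s) not released");
    }
  }

  public static void main(String[] args) {
    TrackingPoolSketch pool = new TrackingPoolSketch();
    ByteBuffer kept = pool.getBuffer(false, 16);
    ByteBuffer returned = pool.getBuffer(false, 16);
    pool.putBuffer(returned);
    try {
      pool.close();
    } catch (IllegalStateException e) {
      System.out.println("leak detected: " + e.getMessage());
    }
    pool.putBuffer(kept);
    pool.close();
    System.out.println("all buffers released");
  }
}
```

In a test, such a pool would typically be passed to the code under test and closed in teardown, so that any buffer which was never returned fails the test.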





> Memory fragmentation in ChecksumFileSystem Vectored IO implementation.
> ----------------------------------------------------------------------
>
>                 Key: HADOOP-18296
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18296
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: common
>    Affects Versions: 3.4.0
>            Reporter: Mukund Thakur
>            Assignee: Steve Loughran
>            Priority: Minor
>              Labels: fs, pull-request-available
>
> Because the ChecksumFSInputChecker implementation of the vectored IO API 
> merges ranges, it can lead to memory fragmentation. An example:
>  
> Suppose a client requests 3 ranges: 
> 0-500, 700-1000 and 1200-1500.
> Because of merging, all of the above ranges are merged into one: we 
> allocate one big byte buffer covering 0-1500 but return sliced byte buffers 
> for the requested ranges.
> Once the client is done reading all the ranges, it can only free the 
> memory for the requested ranges; the memory of the gaps (here 500-700 and 
> 1000-1200) is never released.
>  
> Note this only happens for direct byte buffers.
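
The example in the description can be reproduced with plain `java.nio` calls. The sketch below is illustrative only and is not the ChecksumFileSystem code: `sliceRange` and `gapBytes` are hypothetical helpers, and the merged buffer is assumed to start at file offset 0. It shows that each returned slice is a view onto the one merged direct buffer, and it counts how many bytes of that buffer belong to no requested range.

```java
import java.nio.ByteBuffer;

/** Hypothetical demo of slicing requested ranges out of one merged direct buffer. */
public class MergedRangeSliceDemo {

  /** Slice [start, end) out of a merged buffer whose first byte is file offset base. */
  static ByteBuffer sliceRange(ByteBuffer merged, int base, int start, int end) {
    ByteBuffer view = merged.duplicate(); // independent position/limit, same memory
    view.position(start - base);
    view.limit(end - base);
    return view.slice();
  }

  /** Bytes of the merged buffer that are not covered by any requested range. */
  static int gapBytes(int mergedLength, int[][] ranges) {
    int covered = 0;
    for (int[] range : ranges) {
      covered += range[1] - range[0];
    }
    return mergedLength - covered;
  }

  public static void main(String[] args) {
    int[][] ranges = {{0, 500}, {700, 1000}, {1200, 1500}};
    ByteBuffer merged = ByteBuffer.allocateDirect(1500);

    for (int[] range : ranges) {
      ByteBuffer slice = sliceRange(merged, 0, range[0], range[1]);
      System.out.println(range[0] + "-" + range[1]
          + " capacity=" + slice.capacity()
          + " direct=" + slice.isDirect());
    }
    System.out.println("bytes in gaps: " + gapBytes(merged.capacity(), ranges));
  }
}
```

With the three ranges above, the gaps 500-700 and 1000-1200 account for 400 of the 1500 allocated bytes, none of which is reachable through any slice handed to the client.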



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

