[
https://issues.apache.org/jira/browse/HADOOP-18296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18008067#comment-18008067
]
ASF GitHub Bot commented on HADOOP-18296:
-----------------------------------------
steveloughran commented on code in PR #7732:
URL: https://github.com/apache/hadoop/pull/7732#discussion_r2215986465
##########
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractContractVectoredReadTest.java:
##########
@@ -619,4 +625,61 @@ protected <T extends Throwable> void
verifyExceptionalVectoredRead(
});
}
}
+
+ @Test
+ public void testBufferSlicing() throws Throwable {
+ describe("Test buffer slicing behavior in vectored IO");
+
+ final int numBuffers = 8;
+ final int bufferSize = S_4K;
+ long offset = 0;
+ final List<FileRange> fileRanges = new ArrayList<>();
+ for (int i = 0; i < numBuffers; i++) {
+ fileRanges.add(FileRange.createFileRange(offset, bufferSize));
+ // increment and add a non-binary-aligned gap, so as to force
+ // offsets to be misaligned with possible page sizes.
+ offset += bufferSize + 4000;
+ }
+    TrackingByteBufferPool trackerPool =
+        TrackingByteBufferPool.wrap(getPool());
+ int unknownBuffers = 0;
+ boolean slicing;
+ try (FSDataInputStream in = openVectorFile()) {
+ slicing = in.hasCapability(VECTOREDIO_BUFFERS_SLICED);
+ LOG.info("Slicing is {} for vectored IO with stream {}", slicing, in);
+      in.readVectored(fileRanges, s -> trackerPool.getBuffer(isDirect, s),
+          trackerPool::putBuffer);
+
+      // check that all buffers are from the pool, unless they are sliced.
+ for (FileRange res : fileRanges) {
+ CompletableFuture<ByteBuffer> data = res.getData();
+ ByteBuffer buffer = awaitFuture(data);
+ Assertions.assertThat(buffer)
+ .describedAs("Buffer must not be null")
+ .isNotNull();
+ Assertions.assertThat(slicing || trackerPool.containsBuffer(buffer))
+ .describedAs("Buffer must be from the pool")
+ .isTrue();
+ try {
+ trackerPool.putBuffer(buffer);
+        } catch (TrackingByteBufferPool.ReleasingUnallocatedByteBufferException e) {
+          // this can happen if the buffer was sliced, as it is not in the pool.
+ if (!slicing) {
+ throw e;
+ }
+ LOG.info("Sliced buffer detected: {}", buffer);
+ unknownBuffers++;
+ }
+ }
+ }
+ try {
+ trackerPool.close();
+ } catch (TrackingByteBufferPool.LeakedByteBufferException e) {
+ if (!slicing) {
+ throw e;
+ }
+ LOG.info("Slicing is enabled; we saw leaked buffers: {} after {}"
Review Comment:
no, because it is logging when LeakedByteBufferException was caught. There's
nothing to reliably assert on; logging is all that we can do
> Memory fragmentation in ChecksumFileSystem Vectored IO implementation.
> ----------------------------------------------------------------------
>
> Key: HADOOP-18296
> URL: https://issues.apache.org/jira/browse/HADOOP-18296
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: common
> Affects Versions: 3.4.0
> Reporter: Mukund Thakur
> Assignee: Steve Loughran
> Priority: Minor
> Labels: fs, pull-request-available
>
> Because we merge ranges in the ChecksumFSInputChecker implementation of the
> vectored IO API, memory fragmentation can occur. Let me explain with an
> example.
>
> Suppose the client requests 3 ranges:
> 0-500, 700-1000 and 1200-1500.
> Because of merging, all of the above ranges will be merged into one: we
> allocate a single big byte buffer covering 0-1500, but return sliced byte
> buffers for the requested ranges.
> Once the client is done reading all the ranges, it can only release the
> memory for the requested ranges; the memory of the gaps (here 500-700 and
> 1000-1200) will never be released.
>
> Note this only happens for direct byte buffers.
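The fragmentation scenario quoted above can be sketched with plain NIO. This is an illustrative standalone example, not Hadoop code: it merges the three requested ranges into one direct buffer and hands out slices, showing that each slice shares the merged buffer's off-heap memory, so the gap bytes stay allocated as long as any slice is reachable.

```java
import java.nio.ByteBuffer;

public class SliceFragmentationDemo {
  /** Slice [start, end) out of a merged buffer without mutating it. */
  static ByteBuffer sliceRange(ByteBuffer merged, int start, int end) {
    ByteBuffer dup = merged.duplicate();
    dup.position(start).limit(end);
    // The slice shares the duplicate's (and hence the merged buffer's)
    // backing memory; it does not copy the bytes.
    return dup.slice();
  }

  public static void main(String[] args) {
    // One merged allocation covering 0-1500, as after range merging.
    ByteBuffer merged = ByteBuffer.allocateDirect(1500);

    // Slices for the three requested ranges.
    ByteBuffer r1 = sliceRange(merged, 0, 500);
    ByteBuffer r2 = sliceRange(merged, 700, 1000);
    ByteBuffer r3 = sliceRange(merged, 1200, 1500);

    // Each slice sees only its own range, but while any of r1/r2/r3 is
    // reachable the full 1500 bytes (including the 400 bytes of gaps at
    // 500-700 and 1000-1200) cannot be reclaimed.
    System.out.println(r1.capacity() + " " + r2.capacity() + " " + r3.capacity());
  }
}
```

Returning these slices to a buffer pool is exactly what trips `ReleasingUnallocatedByteBufferException` in the test above: the pool never allocated the slice, only the merged parent.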
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]