[
https://issues.apache.org/jira/browse/HADOOP-18107?focusedWorklogId=770888&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-770888
]
ASF GitHub Bot logged work on HADOOP-18107:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 16/May/22 15:39
Start Date: 16/May/22 15:39
Worklog Time Spent: 10m
Work Description: steveloughran commented on code in PR #4273:
URL: https://github.com/apache/hadoop/pull/4273#discussion_r873874607
##########
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/ContractTestUtils.java:
##########
@@ -1095,6 +1102,54 @@ public static void validateFileContent(byte[] concat,
byte[][] bytes) {
mismatch);
}
+ /**
+ * Utility to validate vectored read results.
+ * @param fileRanges input ranges.
+ * @param originalData original data.
+ * @throws IOException any ioe.
+ */
+ public static void validateVectoredReadResult(List<FileRange> fileRanges,
+ byte[] originalData)
+ throws IOException, TimeoutException {
+ CompletableFuture<?>[] completableFutures = new CompletableFuture<?>[fileRanges.size()];
+ int i = 0;
+ for (FileRange res : fileRanges) {
+ completableFutures[i++] = res.getData();
+ }
+ CompletableFuture<Void> combinedFuture = CompletableFuture.allOf(completableFutures);
+ FutureIO.awaitFuture(combinedFuture, 5, TimeUnit.MINUTES);
+
+ for (FileRange res : fileRanges) {
+ CompletableFuture<ByteBuffer> data = res.getData();
+ ByteBuffer buffer = FutureIO.awaitFuture(data, 5, TimeUnit.MINUTES);
+ assertDatasetEquals((int) res.getOffset(), "vecRead",
+ buffer, res.getLength(), originalData);
+ }
+ }
+
+
+ /**
+ * Assert that the data read matches the dataset at the given offset.
+ * This helps verify that the seek process is moving the read pointer
+ * to the correct location in the file.
+ * @param readOffset the offset in the file where the read began.
+ * @param operation operation name for the assertion.
+ * @param data data read in.
+ * @param length length of data to check.
+ * @param originalData original data.
+ */
+ public static void assertDatasetEquals(
+ final int readOffset, final String operation,
Review Comment:
nit, split line
##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/scale/AbstractSTestS3AHugeFiles.java:
##########
@@ -446,6 +455,35 @@ public void test_040_PositionedReadHugeFile() throws Throwable {
toHuman(timer.nanosPerOperation(ops)));
}
+ @Test
+ public void test_045_vectoredIOHugeFile() throws Throwable {
+ assumeHugeFileExists();
+ List<FileRange> rangeList = new ArrayList<>();
+ rangeList.add(new FileRangeImpl(5856368, 1167716));
+ rangeList.add(new FileRangeImpl(3520861, 1167700));
+ rangeList.add(new FileRangeImpl(8191913, 1167775));
+ rangeList.add(new FileRangeImpl(1520861, 1167700));
+ rangeList.add(new FileRangeImpl(2520861, 116770));
+ rangeList.add(new FileRangeImpl(9191913, 116770));
+ rangeList.add(new FileRangeImpl(2820861, 156770));
+ IntFunction<ByteBuffer> allocate = new IntFunction<ByteBuffer>() {
Review Comment:
can't you just use a lambda expression here? Indeed, passing in
`ByteBuffer::allocate` should work
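The reviewer's suggestion is that the anonymous `IntFunction<ByteBuffer>` class can be collapsed into a method reference. A minimal sketch of the equivalence (standalone, outside the Hadoop test harness):

```java
import java.nio.ByteBuffer;
import java.util.function.IntFunction;

public class AllocateSketch {
    public static void main(String[] args) {
        // Anonymous-class form, as in the PR under review.
        IntFunction<ByteBuffer> verbose = new IntFunction<ByteBuffer>() {
            @Override
            public ByteBuffer apply(int value) {
                return ByteBuffer.allocate(value);
            }
        };

        // Method-reference form suggested in the review: same behaviour,
        // one line instead of six.
        IntFunction<ByteBuffer> concise = ByteBuffer::allocate;

        System.out.println(verbose.apply(1024).capacity());  // 1024
        System.out.println(concise.apply(1024).capacity());  // 1024
    }
}
```

Either form can be passed wherever the vectored-read API expects an `IntFunction<ByteBuffer>` buffer allocator.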
##########
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/ContractTestUtils.java:
##########
@@ -1095,6 +1102,54 @@ public static void validateFileContent(byte[] concat,
byte[][] bytes) {
mismatch);
}
+ /**
+ * Utility to validate vectored read results.
+ * @param fileRanges input ranges.
+ * @param originalData original data.
+ * @throws IOException any ioe.
+ */
+ public static void validateVectoredReadResult(List<FileRange> fileRanges,
+ byte[] originalData)
+ throws IOException, TimeoutException {
+ CompletableFuture<?>[] completableFutures = new CompletableFuture<?>[fileRanges.size()];
+ int i = 0;
+ for (FileRange res : fileRanges) {
+ completableFutures[i++] = res.getData();
+ }
+ CompletableFuture<Void> combinedFuture = CompletableFuture.allOf(completableFutures);
+ FutureIO.awaitFuture(combinedFuture, 5, TimeUnit.MINUTES);
+
+ for (FileRange res : fileRanges) {
+ CompletableFuture<ByteBuffer> data = res.getData();
+ ByteBuffer buffer = FutureIO.awaitFuture(data, 5, TimeUnit.MINUTES);
Review Comment:
we shouldn't have to wait at all, if the allOf completed.
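The point here is that once `CompletableFuture.allOf(...)` has completed, every constituent future is guaranteed complete, so a second timed wait per future is redundant; a non-blocking `join()` suffices. A standalone sketch of that behaviour (using plain completed futures rather than Hadoop's `FileRange`):

```java
import java.nio.ByteBuffer;
import java.util.concurrent.CompletableFuture;

public class JoinAfterAllOf {
    public static void main(String[] args) {
        CompletableFuture<ByteBuffer> f1 =
            CompletableFuture.completedFuture(ByteBuffer.allocate(16));
        CompletableFuture<ByteBuffer> f2 =
            CompletableFuture.completedFuture(ByteBuffer.allocate(32));

        // One timed wait on the combined future is enough...
        CompletableFuture.allOf(f1, f2).join();

        // ...after which each individual future is already complete and
        // join() returns immediately without blocking.
        ByteBuffer b1 = f1.join();
        ByteBuffer b2 = f2.join();
        System.out.println(b1.capacity() + " " + b2.capacity());  // 16 32
    }
}
```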
##########
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/ContractTestUtils.java:
##########
@@ -1095,6 +1102,54 @@ public static void validateFileContent(byte[] concat,
byte[][] bytes) {
mismatch);
}
+ /**
+ * Utility to validate vectored read results.
+ * @param fileRanges input ranges.
+ * @param originalData original data.
+ * @throws IOException any ioe.
+ */
+ public static void validateVectoredReadResult(List<FileRange> fileRanges,
+ byte[] originalData)
+ throws IOException, TimeoutException {
+ CompletableFuture<?>[] completableFutures = new CompletableFuture<?>[fileRanges.size()];
+ int i = 0;
+ for (FileRange res : fileRanges) {
+ completableFutures[i++] = res.getData();
+ }
+ CompletableFuture<Void> combinedFuture = CompletableFuture.allOf(completableFutures);
+ FutureIO.awaitFuture(combinedFuture, 5, TimeUnit.MINUTES);
Review Comment:
the timeout should be a constant, ideally a unit in seconds in case we ever
want to make it much smaller
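Following the reviewer's suggestion, the magic `5, TimeUnit.MINUTES` pair would become a named constant expressed in seconds, so shrinking it later is a one-line change. A sketch of what that might look like (the constant name is hypothetical, and the `FutureIO.awaitFuture` call is quoted from the code under review):

```java
import java.util.concurrent.TimeUnit;

public class TimeoutConstantSketch {
    /**
     * Timeout for vectored-read futures, in seconds so that it can
     * easily be reduced later without switching units.
     */
    public static final int VECTORED_READ_OPERATION_TEST_TIMEOUT_SECONDS = 5 * 60;

    public static void main(String[] args) {
        // In the test utility this would replace the inline literals:
        // FutureIO.awaitFuture(combinedFuture,
        //     VECTORED_READ_OPERATION_TEST_TIMEOUT_SECONDS, TimeUnit.SECONDS);
        System.out.println(VECTORED_READ_OPERATION_TEST_TIMEOUT_SECONDS);  // 300
    }
}
```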
Issue Time Tracking
-------------------
Worklog Id: (was: 770888)
Time Spent: 2h (was: 1h 50m)
> Vectored IO support for large S3 files.
> ----------------------------------------
>
> Key: HADOOP-18107
> URL: https://issues.apache.org/jira/browse/HADOOP-18107
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Reporter: Mukund Thakur
> Assignee: Mukund Thakur
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2h
> Remaining Estimate: 0h
>
> This effort would mostly be adding more tests for large files under scale
> tests and seeing whether any new issues surface.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]