ajantha-bhat commented on code in PR #9953:
URL: https://github.com/apache/iceberg/pull/9953#discussion_r1524376106

##########
aws/src/test/java/org/apache/iceberg/aws/s3/TestS3FileIO.java:
##########
@@ -377,6 +384,50 @@ public void testResolvingFileIOLoad() {
     Assertions.assertThat(result).isInstanceOf(S3FileIO.class);
   }
 
+  @Test
+  public void testInputFileWithDataFile() throws IOException {
+    String location = "s3://bucket/path/to/data-file.parquet";
+    DataFile dataFile =
+        DataFiles.builder(PartitionSpec.unpartitioned())
+            .withPath(location)
+            .withFileSizeInBytes(123L)
+            .withFormat(FileFormat.PARQUET)
+            .withRecordCount(123L)
+            .build();
+    OutputStream outputStream = s3FileIO.newOutputFile(location).create();
+    byte[] data = "testing".getBytes();
+    outputStream.write(data);
+    outputStream.close();
+
+    InputFile inputFile = s3FileIO.newInputFile(dataFile);
+    Assertions.assertThat(inputFile.getLength())
+        .as("Data file length should be determined from the file size stats")
+        .isEqualTo(123L);
+  }
+
+  @Test
+  public void testInputFileWithManifest() throws IOException {
+    String dataFileLocation = "s3://bucket/path/to/data-file-2.parquet";
+    DataFile dataFile =
+        DataFiles.builder(PartitionSpec.unpartitioned())
+            .withPath(dataFileLocation)
+            .withFileSizeInBytes(123L)
+            .withFormat(FileFormat.PARQUET)
+            .withRecordCount(123L)
+            .build();
+    String manifestLocation = "s3://bucket/path/to/manifest.avro";
+    OutputFile outputFile = s3FileIO.newOutputFile(manifestLocation);
+    ManifestWriter<DataFile> writer =
+        ManifestFiles.write(PartitionSpec.unpartitioned(), outputFile);
+    writer.add(dataFile);
+    writer.close();
+    ManifestFile manifest = writer.toManifestFile();
+    InputFile inputFile = s3FileIO.newInputFile(manifest);
+    Assertions.assertThat(inputFile.getLength())
+        .as("Manifest file length should be determined from the file size stats")
+        .isEqualTo(manifest.length());

Review Comment:
   minor: This does not confirm whether `manifest.length()` was used or whether the file size was computed by reading the file, since both values can be the same. The data file test case above (`testInputFileWithDataFile`) validates this correctly, because its declared size (123) differs from the 7 bytes actually written.
   Maybe we should mock `manifest.length` to confirm that this value is used.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

For additional commands, e-mail: issues-h...@iceberg.apache.org
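
The point of the review comment above can be shown in isolation. The sketch below is not Iceberg code: `LengthSourceSketch`, `SketchInputFile`, `usingMetadataLength`, and `ignoringMetadataLength` are hypothetical names invented for illustration, and the in-memory map only stands in for object storage. It shows that an assertion against the true length cannot tell the two variants apart, while a sentinel length that differs from the stored size can.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names are Iceberg APIs.
class LengthSourceSketch {
  // Stand-in for object storage: path -> stored bytes.
  static final Map<String, byte[]> STORE = new HashMap<>();

  // Stand-in for an input file whose length is either supplied or looked up.
  static final class SketchInputFile {
    private final String path;
    private final Long suppliedLength; // null means "ask the store"

    SketchInputFile(String path, Long suppliedLength) {
      this.path = path;
      this.suppliedLength = suppliedLength;
    }

    long getLength() {
      if (suppliedLength != null) {
        return suppliedLength;
      }
      return STORE.get(path).length;
    }
  }

  // Variant that trusts the length recorded in metadata.
  static SketchInputFile usingMetadataLength(String path, long metadataLength) {
    return new SketchInputFile(path, metadataLength);
  }

  // Variant that ignores the metadata length and falls back to the store.
  static SketchInputFile ignoringMetadataLength(String path, long metadataLength) {
    return new SketchInputFile(path, null);
  }

  public static void main(String[] args) {
    String path = "manifest.avro";
    STORE.put(path, new byte[7]);

    long trueLength = 7L;      // equals the stored size
    long sentinelLength = 123L; // deliberately differs from the stored size

    // Recorded length equals the real size: both variants look identical.
    System.out.println(usingMetadataLength(path, trueLength).getLength());
    System.out.println(ignoringMetadataLength(path, trueLength).getLength());

    // Recorded length is a sentinel: only the metadata-trusting variant returns it.
    System.out.println(usingMetadataLength(path, sentinelLength).getLength());
    System.out.println(ignoringMetadataLength(path, sentinelLength).getLength());
  }
}
```

In the actual test, the same effect would come from making the manifest report a length that differs from the bytes written, for example through a mock as suggested, so that the assertion fails if the length is obtained by reading the file.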