RahmanQureshi commented on code in PR #4479:
URL: https://github.com/apache/iceberg/pull/4479#discussion_r1582397928
##########
spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/source/TestStructuredStreamingRead3.java:
##########
@@ -138,6 +143,57 @@ public void testReadStreamOnIcebergTableWithMultipleSnapshots() throws Exception
     Assertions.assertThat(actual).containsExactlyInAnyOrderElementsOf(Iterables.concat(expected));
   }
 
+  @Test
+  public void testReadStreamOnIcebergTableWithMultipleSnapshots_WithNumberOfFiles_1()
+      throws Exception {
+    appendDataAsMultipleSnapshots(TEST_DATA_MULTIPLE_SNAPSHOTS);
+
+    Assert.assertEquals(
+        6,
+        microBatchCount(
+            ImmutableMap.of(SparkReadOptions.STREAMING_MAX_FILES_PER_MICRO_BATCH, "1")));
+  }
+
+  @Test
+  public void testReadStreamOnIcebergTableWithMultipleSnapshots_WithNumberOfFiles_2()
+      throws Exception {
+    appendDataAsMultipleSnapshots(TEST_DATA_MULTIPLE_SNAPSHOTS);
+
+    Assert.assertEquals(
+        3,
+        microBatchCount(
+            ImmutableMap.of(SparkReadOptions.STREAMING_MAX_FILES_PER_MICRO_BATCH, "2")));
+  }
+
+  @Test
+  public void testReadStreamOnIcebergTableWithMultipleSnapshots_WithNumberOfRows_1()
+      throws Exception {
+    appendDataAsMultipleSnapshots(TEST_DATA_MULTIPLE_SNAPSHOTS);
+
+    // only 1 micro-batch will be formed and we will read data partially
+    Assert.assertEquals(
+        1,

Review Comment:
Ohh, I see! The next file has 2 records, and the rate limiting isn't sophisticated enough to break a file into multiple micro-batches (which makes sense).

But I am curious why the first list of records gets broken into two files deterministically. The first file has simple-record-1, and the second file has simple-record-2 and simple-record-3.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
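The rate-limiting behavior described in the comment can be modeled as follows. Files are admitted whole, and a micro-batch closes once the next file would exceed the file or row cap. This is a minimal sketch under that assumption. `MicroBatchPlanSketch` and `plan` are hypothetical names, and this is not Iceberg's actual micro-batch planning code.

The record counts `{1, 2}` come from the comment: simple-record-1 in the first file, then simple-record-2 and simple-record-3 in the second. With a one-row cap, the first file fits. The second file cannot be admitted whole, so no further micro-batch forms and the data is only partially read, matching the expected count of 1.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of file-granular micro-batch admission (NOT Iceberg's implementation).
 * Assumption: a file is never split across micro-batches.
 */
public final class MicroBatchPlanSketch {

  private MicroBatchPlanSketch() {}

  /**
   * Greedily groups files, given as per-file record counts, into micro-batches. A batch closes
   * when adding the next file would exceed maxFiles or maxRows. If the next pending file alone
   * exceeds maxRows, no batch can be formed and planning stops, i.e. the stream stalls.
   */
  public static List<List<Integer>> plan(int[] recordsPerFile, int maxFiles, long maxRows) {
    List<List<Integer>> batches = new ArrayList<>();
    int next = 0;
    while (next < recordsPerFile.length) {
      List<Integer> batch = new ArrayList<>();
      long rows = 0;
      // Admit whole files while both the file cap and the row cap still hold.
      while (next < recordsPerFile.length
          && batch.size() < maxFiles
          && rows + recordsPerFile[next] <= maxRows) {
        rows += recordsPerFile[next];
        batch.add(recordsPerFile[next]);
        next++;
      }
      if (batch.isEmpty()) {
        // The next file cannot be admitted whole, so no further micro-batch is formed.
        break;
      }
      batches.add(batch);
    }
    return batches;
  }

  public static void main(String[] args) {
    // First snapshot as described in the review comment: files with 1 and 2 records.
    int[] firstSnapshot = {1, 2};
    System.out.println(plan(firstSnapshot, Integer.MAX_VALUE, 1).size()); // prints 1
  }
}
```

The max-files assertions in the diff (6 micro-batches at one file per batch, 3 at two) suggest that the test data lands in six files. In this model only the file count matters for those cases. Calling `plan(new int[] {1, 1, 1, 1, 1, 1}, 2, Long.MAX_VALUE)` returns 3 batches; the per-file counts there are placeholders, not the real test data.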