RahmanQureshi commented on code in PR #4479:
URL: https://github.com/apache/iceberg/pull/4479#discussion_r1582397928


##########
spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/source/TestStructuredStreamingRead3.java:
##########
@@ -138,6 +143,57 @@ public void testReadStreamOnIcebergTableWithMultipleSnapshots() throws Exception
     Assertions.assertThat(actual).containsExactlyInAnyOrderElementsOf(Iterables.concat(expected));
   }
 
+  @Test
+  public void testReadStreamOnIcebergTableWithMultipleSnapshots_WithNumberOfFiles_1()
+      throws Exception {
+    appendDataAsMultipleSnapshots(TEST_DATA_MULTIPLE_SNAPSHOTS);
+
+    Assert.assertEquals(
+        6,
+        microBatchCount(
+            ImmutableMap.of(SparkReadOptions.STREAMING_MAX_FILES_PER_MICRO_BATCH, "1")));
+  }
+
+  @Test
+  public void testReadStreamOnIcebergTableWithMultipleSnapshots_WithNumberOfFiles_2()
+      throws Exception {
+    appendDataAsMultipleSnapshots(TEST_DATA_MULTIPLE_SNAPSHOTS);
+
+    Assert.assertEquals(
+        3,
+        microBatchCount(
+            ImmutableMap.of(SparkReadOptions.STREAMING_MAX_FILES_PER_MICRO_BATCH, "2")));
+  }
+
+  @Test
+  public void testReadStreamOnIcebergTableWithMultipleSnapshots_WithNumberOfRows_1()
+      throws Exception {
+    appendDataAsMultipleSnapshots(TEST_DATA_MULTIPLE_SNAPSHOTS);
+
+    // only 1 micro-batch will be formed and we will read data partially
+    Assert.assertEquals(
+        1,

Review Comment:
   Ohh, I see! The next file has 2 records, and the rate limiting isn't 
sophisticated enough to break a file into multiple microbatches (which makes 
sense).
   
   But I am curious why the first list of records gets broken into two files 
deterministically. The first file has simple-record-1, and the second file has 
simple-record-2 and simple-record-3.
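
   To make the first point concrete, here is a minimal sketch of whole-file
   packing under a row limit. It is a simplified model, not the actual Iceberg
   planning code: the class `MicroBatchSketch`, the method `planBatches`, and
   the representation of a file as a bare row count are all hypothetical. The
   assumption is that a file is admitted to a micro-batch only if it fits
   whole, and that planning stops when the next file alone exceeds the limit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of row-limited micro-batch planning; not Iceberg's API.
public class MicroBatchSketch {

  // Packs whole files (given as row counts) into batches of at most
  // maxRowsPerBatch rows. A file is never split across batches. If a single
  // file is larger than the limit, planning stops at that file (assumption).
  static List<List<Integer>> planBatches(List<Integer> fileRowCounts, int maxRowsPerBatch) {
    List<List<Integer>> batches = new ArrayList<>();
    List<Integer> current = new ArrayList<>();
    int rowsInCurrent = 0;

    for (int rows : fileRowCounts) {
      if (rowsInCurrent + rows > maxRowsPerBatch) {
        // The file does not fit: close the batch built so far, if any.
        if (!current.isEmpty()) {
          batches.add(current);
          current = new ArrayList<>();
          rowsInCurrent = 0;
        }
        // The file cannot fit even in an empty batch, so stop planning.
        if (rows > maxRowsPerBatch) {
          break;
        }
      }
      current.add(rows);
      rowsInCurrent += rows;
    }

    if (!current.isEmpty()) {
      batches.add(current);
    }
    return batches;
  }
}
```

   With the two files described above (1 record, then 2 records) and a limit
   of 1 row, `planBatches(Arrays.asList(1, 2), 1)` yields a single batch
   holding only the first file, which matches the test's expectation of
   exactly 1 micro-batch and a partial read.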



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

