stevenzwu commented on code in PR #10208:
URL: https://github.com/apache/iceberg/pull/10208#discussion_r1576552272
##########
flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/source/IcebergSource.java:
##########
@@ -201,8 +201,11 @@ private SplitEnumerator<IcebergSourceSplit,
IcebergEnumeratorState> createEnumer
return new ContinuousIcebergEnumerator(
enumContext, assigner, scanContext, splitPlanner, enumState);
} else {
- List<IcebergSourceSplit> splits =
planSplitsForBatch(planningThreadName());
- assigner.onDiscoveredSplits(splits);
+ if (enumState == null) {
Review Comment:
thanks for fixing the bug.
maybe add a comment like `Only do scan planning if nothing is restored from
checkpoint state`.
##########
flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/source/TestIcebergSourceFailover.java:
##########
@@ -149,6 +163,15 @@ private void testBoundedIcebergSource(FailoverType
failoverType) throws Exceptio
JobClient jobClient = env.executeAsync("Bounded Iceberg Source Failover
Test");
JobID jobId = jobClient.getJobID();
+ Path jobCheckpointDir = checkpointDir.resolve(jobId.toString());
+ Awaitility.await("Wait for some checkpoints to complete")
Review Comment:
check for directory exist might be flaky. what if some subtask checkpointed
its state and uploaded state file to the checkpoint dir but the whole
checkpoint hasn't completed yet.
I can think of a couple potential alternatives
1. manually trigger checkpoint via `MiniCluster`
```
public CompletableFuture<String> triggerCheckpoint(JobID jobID) {
return runDispatcherCommand(
dispatcherGateway ->
dispatcherGateway.triggerCheckpoint(jobID, rpcTimeout));
}
```
2. check metrics like
`TestIcebergSourceContinuous#assertThatIcebergSourceMetricExists()`
##########
flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/source/TestIcebergSourceFailover.java:
##########
@@ -40,24 +44,27 @@
import org.apache.iceberg.FileFormat;
import org.apache.iceberg.Schema;
import org.apache.iceberg.Table;
+import org.apache.iceberg.TableProperties;
import org.apache.iceberg.data.GenericAppenderHelper;
import org.apache.iceberg.data.RandomGenericData;
import org.apache.iceberg.data.Record;
import org.apache.iceberg.flink.FlinkConfigOptions;
+import org.apache.iceberg.flink.FlinkReadOptions;
import org.apache.iceberg.flink.HadoopTableResource;
import org.apache.iceberg.flink.SimpleDataUtil;
import org.apache.iceberg.flink.TestFixtures;
import org.apache.iceberg.flink.sink.FlinkSink;
import org.apache.iceberg.flink.source.assigner.SimpleSplitAssignerFactory;
import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.awaitility.Awaitility;
import org.junit.ClassRule;
import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.TemporaryFolder;
public class TestIcebergSourceFailover {
- private static final int PARALLELISM = 4;
+ private static final int PARALLELISM = 2;
Review Comment:
what's the reason of reducing parallelism?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]