walterddr commented on code in PR #10045:
URL: https://github.com/apache/pinot/pull/10045#discussion_r1063598717

##########
pinot-connectors/pinot-flink-connector/src/main/java/org/apache/pinot/connector/flink/sink/PinotSinkFunction.java:
##########

```java
@@ -151,4 +148,18 @@ private void flush()
       LOG.info("Pinot segment uploaded to {}", segmentURI);
     });
   }
+
+  @Override
+  public List<GenericRow> snapshotState(long checkpointId, long timestamp) {
```

Review Comment:
+1. I am not sure I fully follow the discussion here, so let me try to clarify the background. I think:
- The general idea of the Pinot-Flink sink is to backfill data directly into Pinot.
- Checkpointing serves the purpose of swiftly resuming a failed Flink job without having to restart from scratch.
- Restarting from scratch is what most users currently do: they create multiple Flink jobs, each responsible for a specific time range; if a job fails, the user manually clears all segments associated with that time range and restarts the backfill.
- This PR attempts to enable checkpointing so that restarting the backfill job entirely is not necessary.
- With this in mind, the choice of an in-memory vs. RocksDB state backend comes into play in terms of OOM pressure, but I think that should be a judgment call for the user who operates the Flink job.

Regardless, I think this PR serves its purpose well, so I also agree we can follow up in a separate issue/PR.
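The checkpointing pattern the diff introduces (a `ListCheckpointed`-style `snapshotState(checkpointId, timestamp)` that returns the rows buffered since the last flush, so a restarted job can restore them instead of re-running the whole backfill) can be sketched without the Flink and Pinot dependencies. Everything below is a hypothetical, self-contained stand-in for illustration only, not the actual `PinotSinkFunction` code:

```java
import java.util.ArrayList;
import java.util.List;

public class CheckpointSketch {

  /** Hypothetical stand-in for org.apache.pinot.spi.data.readers.GenericRow. */
  static class GenericRow {
    final String value;
    GenericRow(String value) { this.value = value; }
  }

  /** Stand-in for the sink: buffers rows until flush, snapshots the buffer. */
  static class BufferingSink {
    private final List<GenericRow> pendingRows = new ArrayList<>();

    void invoke(GenericRow row) {
      pendingRows.add(row);
      // A real sink would flush to a Pinot segment once a size threshold is hit;
      // note the buffer lives on the heap, which is the OOM concern raised above.
    }

    /**
     * Mirrors the ListCheckpointed-style snapshotState: hand Flink a copy of the
     * rows not yet flushed so they survive a job failure.
     */
    List<GenericRow> snapshotState(long checkpointId, long timestamp) {
      return new ArrayList<>(pendingRows);
    }

    /** On restore, re-buffer the rows captured in the last checkpoint. */
    void restoreState(List<GenericRow> state) {
      pendingRows.clear();
      pendingRows.addAll(state);
    }
  }

  public static void main(String[] args) {
    BufferingSink sink = new BufferingSink();
    sink.invoke(new GenericRow("a"));
    sink.invoke(new GenericRow("b"));
    List<GenericRow> snapshot = sink.snapshotState(1L, System.currentTimeMillis());

    // Simulate a failure followed by a restore into a fresh sink instance.
    BufferingSink restored = new BufferingSink();
    restored.restoreState(snapshot);
    System.out.println("restored rows: " + snapshot.size()); // prints "restored rows: 2"
  }
}
```

The key property is that `snapshotState` returns a copy of the buffer rather than the buffer itself, so a later flush cannot mutate state that Flink has already checkpointed.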
########## pinot-connectors/pinot-flink-connector/src/main/java/org/apache/pinot/connector/flink/sink/PinotSinkFunction.java: ########## @@ -151,4 +148,18 @@ private void flush() LOG.info("Pinot segment uploaded to {}", segmentURI); }); } + + @Override + public List<GenericRow> snapshotState(long checkpointId, long timestamp) { Review Comment: +1. I am not sure i fully follow the discussion here so let me try to clarify a background. I think - a general idea of the Pinot-Flink sink is designed to backfill data directly into Pinot; - checkpointing serves the purpose of swiftly resuming a failed Flink job without having to restart from fresh. - restart from fresh is what most users currently do: they create multiple Flink jobs one responding for a specific time range; if job fails then user manually cleared up all the segments associated with that time range and restart the backfill again. - this PR attempts to enable checkpointing so that restarting the backfill job entirely is not necessary - with this in mind in-memory vs. rocksdb state backend definition comes into play in terms of OOM pressure, but I think that should be a judgment call on the user who operates the Flink job. but regardless, i think this PR serves the purpose well and thus i also agree we can follow up in separate issue/PR -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org