walterddr commented on code in PR #10045:
URL: https://github.com/apache/pinot/pull/10045#discussion_r1063598717


##########
pinot-connectors/pinot-flink-connector/src/main/java/org/apache/pinot/connector/flink/sink/PinotSinkFunction.java:
##########
@@ -151,4 +148,18 @@ private void flush()
       LOG.info("Pinot segment uploaded to {}", segmentURI);
     });
   }
+
+  @Override
+  public List<GenericRow> snapshotState(long checkpointId, long timestamp) {

Review Comment:
   +1. I am not sure I fully follow the discussion here, so let me try to 
clarify the background.
   
   I think 
   - the general idea of the Pinot-Flink sink is to backfill data 
directly into Pinot; 
     - checkpointing serves the purpose of swiftly resuming a failed Flink job 
without having to restart from scratch. 
     - restarting from scratch is what most users currently do: they create multiple 
Flink jobs, each responsible for a specific time range; if a job fails, the user 
manually cleans up all the segments associated with that time range and 
restarts the backfill.
   - this PR attempts to enable checkpointing so that restarting the backfill 
job entirely is not necessary
     - with this in mind, the choice of in-memory vs. RocksDB state backend comes 
into play in terms of OOM pressure, but I think that should be a judgment call 
for the user who operates the Flink job.
   
   But regardless, I think this PR serves the purpose well, and I also 
agree we can follow up in a separate issue/PR.
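   To illustrate the checkpointing pattern discussed above, here is a minimal, 
hedged sketch of a buffering sink whose `snapshotState` hands back the 
unflushed rows so a restarted job can resume instead of redoing the whole 
backfill. `Row` and `BufferingSinkSketch` are hypothetical stand-ins (not the 
actual `PinotSinkFunction` or Pinot's `GenericRow`); the method shapes are 
modeled on Flink's deprecated `ListCheckpointed` contract but carry no Flink 
dependency:

   ```java
   import java.util.ArrayList;
   import java.util.List;

   // Hypothetical stand-in for Pinot's GenericRow, for illustration only.
   class Row {
       final String value;
       Row(String value) { this.value = value; }
   }

   // Sketch of the pattern: buffer rows between flushes; snapshotState
   // returns the pending buffer so the state backend can restore it on
   // recovery; flush() represents a completed segment build-and-upload.
   class BufferingSinkSketch {
       private final List<Row> buffer = new ArrayList<>();

       void invoke(Row row) {
           buffer.add(row);
       }

       // Called on checkpoint: return a copy of the unflushed rows.
       List<Row> snapshotState(long checkpointId, long timestamp) {
           return new ArrayList<>(buffer);
       }

       // Called on recovery: repopulate the buffer from the checkpoint.
       void restoreState(List<Row> state) {
           buffer.clear();
           buffer.addAll(state);
       }

       // Called after the segment is uploaded; pending rows are durable now.
       void flush() {
           buffer.clear();
       }

       int pendingRows() {
           return buffer.size();
       }
   }
   ```

   Note that the size of this buffer between flushes is exactly the OOM 
pressure mentioned above: an in-memory state backend holds it on the heap, 
while RocksDB spills it to disk, which is why that choice is best left to the 
job operator.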



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

