walterddr commented on code in PR #10045:
URL: https://github.com/apache/pinot/pull/10045#discussion_r1063598717


##########
pinot-connectors/pinot-flink-connector/src/main/java/org/apache/pinot/connector/flink/sink/PinotSinkFunction.java:
##########
@@ -151,4 +148,18 @@ private void flush()
       LOG.info("Pinot segment uploaded to {}", segmentURI);
     });
   }
+
+  @Override
+  public List<GenericRow> snapshotState(long checkpointId, long timestamp) {

Review Comment:
   +1. I am not sure I fully follow the discussion here, so let me try to 
clarify the background.
   
   I think:
   - The general idea of the Pinot-Flink sink is to backfill data 
directly into Pinot.
     - Checkpointing serves the purpose of swiftly resuming a failed Flink job 
without having to restart from scratch. Today most users create multiple Flink 
jobs, each responsible for a specific time range; if a job fails, the user 
manually cleans up all the segments associated with that time range and 
restarts the backfill from the beginning.
   - This PR attempts to enable checkpointing so that restarting the backfill 
job entirely is not necessary.
     - With this in mind, the choice of in-memory vs. RocksDB state backend 
comes into play in terms of OOM pressure, but I think that should be a 
judgment call for the user who operates the Flink job.
   
   Regardless, I think this PR serves its purpose well, so I also 
agree we can follow up in a separate issue/PR.
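
   To make the checkpointing idea above concrete, here is a rough plain-Java 
sketch of the buffer/snapshot/restore pattern (this is an illustration, not 
the actual `PinotSinkFunction` code; the class name, the `String` row type, 
and the method bodies are all stand-ins):

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch of the checkpointing pattern discussed above:
 * rows buffered since the last segment flush are captured on snapshot
 * and re-buffered on restore, so a restarted job does not have to redo
 * the entire backfill. Not the actual Pinot connector implementation.
 */
class BufferedSinkSketch {
  // Rows accumulated since the last flush (stand-in for GenericRow).
  private final List<String> buffer = new ArrayList<>();

  // Called per record; a real sink would flush a Pinot segment once a
  // size/count threshold is reached, then clear the buffer.
  void invoke(String row) {
    buffer.add(row);
  }

  // Called on checkpoint: return a copy of the un-flushed rows. This copy
  // is what lives in the state backend, which is why the in-memory vs.
  // RocksDB choice matters for OOM pressure on large buffers.
  List<String> snapshotState(long checkpointId, long timestamp) {
    return new ArrayList<>(buffer);
  }

  // Called on recovery: re-buffer the rows from the last checkpoint.
  void restoreState(List<String> state) {
    buffer.clear();
    buffer.addAll(state);
  }

  int bufferedCount() {
    return buffer.size();
  }
}
```

   With this pattern, a job that fails mid-backfill resumes with the rows it 
had buffered at the last checkpoint instead of starting the time range over.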



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

