snleee commented on code in PR #10045:
URL: https://github.com/apache/pinot/pull/10045#discussion_r1062938976


##########
pinot-connectors/pinot-flink-connector/src/main/java/org/apache/pinot/connector/flink/sink/PinotSinkFunction.java:
##########
@@ -151,4 +148,18 @@ private void flush()
       LOG.info("Pinot segment uploaded to {}", segmentURI);
     });
   }
+
+  @Override
+  public List<GenericRow> snapshotState(long checkpointId, long timestamp) {

Review Comment:
   @bobby-richard Very interesting use case 👍 In that case, an in-memory solution may satisfy your requirements, given that the volume of rows flowing to this Flink job would be very small.
   
   A RocksDB solution may require extra complexity because we need to take care of state clean-up, and we should also handle failure cases (e.g. the node gets killed while we are cleaning up RocksDB).
   
   We can also take an incremental approach (support the in-memory mode for now, and add the RocksDB mode for checkpoint storage later).
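   The in-memory mode could look roughly like the sketch below, which follows the `snapshotState(long checkpointId, long timestamp)` signature shown in the diff (the `ListCheckpointed`-style contract). Class and field names here are illustrative, not taken from the PR:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of in-memory checkpoint state: rows buffered since
// the last flush are handed to the state backend on every checkpoint and
// restored into the buffer on recovery. Names are illustrative only.
public class InMemoryBufferingSink {
  // Rows accumulated since the last flush(); small by assumption.
  private final List<String> pendingRows = new ArrayList<>();

  public void invoke(String row) {
    pendingRows.add(row);
  }

  // On checkpoint, return a copy of the in-memory buffer as the state.
  public List<String> snapshotState(long checkpointId, long timestamp) {
    return new ArrayList<>(pendingRows);
  }

  // On restore, re-populate the buffer from the checkpointed state.
  public void restoreState(List<String> state) {
    pendingRows.clear();
    pendingRows.addAll(state);
  }
}
```

   Since the buffer is copied on every checkpoint, this only stays cheap as long as the buffered row count between flushes remains small, which matches the assumption above.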
   
   I also recommend adding a time-based trigger for `flush()` if it's not already there. Otherwise, the data may be kept on the Flink side for a very long time.
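   A minimal sketch of such a time-based trigger, using a plain `ScheduledExecutorService` (in a real Flink sink one would typically hook into the runtime's processing-time service instead; the class name is illustrative):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hedged sketch, not from the PR: periodically invokes a flush action so
// buffered rows are not held indefinitely when traffic is low.
public class TimedFlusher implements AutoCloseable {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();
  private final Runnable flushAction;

  public TimedFlusher(Runnable flushAction, long intervalSeconds) {
    this.flushAction = flushAction;
    // Fire the flush periodically even if a size threshold is never hit.
    scheduler.scheduleAtFixedRate(
        flushAction, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
  }

  @Override
  public void close() {
    scheduler.shutdownNow();
    // Final flush on shutdown so no buffered rows are left behind.
    flushAction.run();
  }
}
```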



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
