snleee commented on code in PR #10045:
URL: https://github.com/apache/pinot/pull/10045#discussion_r1062938976
##########
pinot-connectors/pinot-flink-connector/src/main/java/org/apache/pinot/connector/flink/sink/PinotSinkFunction.java:
##########
@@ -151,4 +148,18 @@ private void flush()
       LOG.info("Pinot segment uploaded to {}", segmentURI);
     });
   }
+
+  @Override
+  public List<GenericRow> snapshotState(long checkpointId, long timestamp) {

Review Comment:
   @bobby-richard Very interesting use case 👍

   In that case, an in-memory solution may satisfy your requirements, given that the volume of rows flowing through this Flink job would be very small. A RocksDB solution may require extra complexity because we would need to take care of state clean-up, and we would also have to handle failure cases (e.g. the node getting killed while we are cleaning up RocksDB). If the in-memory solution is good enough for you, we can also take an incremental approach (supporting the in-memory mode for now and adding a RocksDB mode for checkpoint storage later).

   I also recommend adding a time-based trigger for `flush()` if it's not already there; otherwise, the data may be kept on the Flink side for a very long time.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
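The recommended time-based trigger for `flush()` can be sketched in plain Java without the Flink APIs: a buffer that flushes when it hits a size threshold *or* when a periodic timer fires, so rows never sit unflushed indefinitely. Note that `BufferedSink`, the `String` row type, and the threshold values below are all illustrative assumptions, not the actual `PinotSinkFunction` API (which would build and upload a Pinot segment on flush).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a sink that flushes on a size threshold OR a timer.
public class BufferedSink {
    private final List<String> buffer = new ArrayList<>();
    private final int maxRows;
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor();
    private int flushCount = 0;

    public BufferedSink(int maxRows, long flushIntervalMs) {
        this.maxRows = maxRows;
        // Time-based trigger: flush periodically even if the buffer is small,
        // so data is never held on the Flink side for a very long time.
        timer.scheduleAtFixedRate(this::flushIfNonEmpty,
            flushIntervalMs, flushIntervalMs, TimeUnit.MILLISECONDS);
    }

    public synchronized void write(String row) {
        buffer.add(row);
        if (buffer.size() >= maxRows) {
            flush();  // size-based trigger
        }
    }

    private synchronized void flushIfNonEmpty() {
        if (!buffer.isEmpty()) {
            flush();
        }
    }

    private synchronized void flush() {
        // In the real connector this is where a segment would be built
        // and uploaded; here we just drop the rows and count the flush.
        buffer.clear();
        flushCount++;
    }

    public synchronized int getFlushCount() { return flushCount; }

    public synchronized int bufferedRows() { return buffer.size(); }

    public void close() { timer.shutdownNow(); }

    public static void main(String[] args) throws Exception {
        BufferedSink sink = new BufferedSink(1000, 50);
        sink.write("row-1");  // far below the size threshold of 1000
        Thread.sleep(300);    // give the periodic timer time to fire
        System.out.println("flushes=" + sink.getFlushCount()
            + " buffered=" + sink.bufferedRows());
        sink.close();
    }
}
```

With only the size-based trigger, that single row would stay buffered until 999 more rows arrived; the timer guarantees it is flushed within one interval instead.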