Re: [DISCUSS] FIP-17: Streaming KV Scan RPC

Giannis Polyzos Wed, 29 Oct 2025 05:45:35 -0700

Yang, thank you for your thoughtful comments.

Indeed, we are streaming the results to the client; however, it's still a
batch operation. We could use "KV store (or PK table) Snapshot Query" or
something similar, since we are querying a RocksDB snapshot. WDYT?
The newly introduced KvBatchScanner should be reusable both from the client
itself (assume a scenario where I want to periodically query the full
RocksDB KV store to power real-time dashboards) and from Flink (with more
engines to follow later).
It issues requests to fetch the results per bucket and transmit them back
to the client.
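
As a rough illustration of the per-bucket fetch loop, here is a minimal Java
sketch. It is not the FIP's API: PerBucketScan, BucketFetcher, and scanAll
are hypothetical names, and offset-based paging is only my stand-in for
whatever scan state the server keeps per bucket.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one fetch loop per bucket, rows collected on the client.
final class PerBucketScan {

    /** Stand-in for the per-bucket scan RPC (assumed shape, not the real one). */
    interface BucketFetcher {
        // Returns at most maxRows rows of the bucket, starting at offset;
        // an empty list means the bucket is exhausted.
        List<String> fetch(int bucketId, int offset, int maxRows);
    }

    /** Drains every bucket in turn, batchSize rows per request. */
    static List<String> scanAll(List<Integer> bucketIds, BucketFetcher fetcher, int batchSize) {
        List<String> out = new ArrayList<>();
        for (int bucketId : bucketIds) {
            int offset = 0;
            while (true) {
                List<String> batch = fetcher.fetch(bucketId, offset, batchSize);
                if (batch.isEmpty()) {
                    break; // this bucket is done, move on to the next one
                }
                out.addAll(batch);
                offset += batch.size();
            }
        }
        return out;
    }
}
```

A dashboard job could call scanAll on a timer; buckets are drained
sequentially here only to keep the sketch short.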


> Could you elaborate on why the new KvBatchScanner isn't reusable?
I think the reasoning here is that each request creates a new
KvBatchScanner, which polls the records and then closes automatically. Is
there a reason you see this as a limitation and think we should make it
reusable?
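
To make the single-use lifecycle concrete, here is a minimal Java sketch. It
is not the actual KvBatchScanner: OneShotScanner, its constructor, and the
in-memory list standing in for a bucket's snapshot are all hypothetical; only
the pollBatch name is taken from this thread.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; not the Fluss KvBatchScanner.
final class OneShotScanner implements AutoCloseable {
    private final Iterator<String> source; // stands in for a snapshot iterator
    private final int batchSize;
    private boolean closed;

    OneShotScanner(List<String> snapshotRows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        // Copy the rows so later changes to the caller's list are not visible.
        this.source = new ArrayList<>(snapshotRows).iterator();
        this.batchSize = batchSize;
    }

    /** Returns the next batch, or null once the scan is exhausted (and closes itself). */
    List<String> pollBatch() {
        if (closed) {
            throw new IllegalStateException("scanner is closed; create a new one to rescan");
        }
        List<String> batch = new ArrayList<>();
        while (batch.size() < batchSize && source.hasNext()) {
            batch.add(source.next());
        }
        if (batch.isEmpty()) {
            close(); // auto-close at end of scan
            return null;
        }
        return batch;
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        closed = true;
    }
}
```

In this model a rescan (for example after a failover) means constructing a
new scanner rather than rewinding the old one.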

The design mainly targets the Fluss client API. Should we add a design for
the Flink integration? Wang Cheng, WDYT?

Best,
Giannis



On Tue, Oct 28, 2025 at 4:44 AM Yang Wang <[email protected]> wrote:

> Hi Cheng,
>
> Thank you for driving this excellent work! Your FIP document shows great
> thought and initiative. I've gone through it and have some questions and
> suggestions that I hope can further enhance this valuable contribution.
>
> 1. Regarding the Title, I believe we could consider changing it to "Support
> full scan in batch mode for PrimaryKey Table". The term "Streaming" might
> cause confusion with Flink's streaming/batch modes, and this revised title
> would provide better clarity.
>
> 2. In the Motivation section, I think there are two particularly important
> benefits worth highlighting: (1) OLAP engines will be able to perform full
> snapshot reads on Fluss primary-key tables. (2) This approach can replace
> the current KvSnapshotBatchScanner, allowing the Fluss client to eliminate
> its RocksDB dependency entirely.
>
> 3. Concerning the Proposed Changes, could you clarify when exactly the
> client creates a KV snapshot on the server side, and when we send the
> bucket_scan_req?
>
> Let me share my thinking on this: When Flink attempts to read from a
> PrimaryKey table, the FlinkSourceEnumerator in the JobMaster generates
> HybridSnapshotLogSplit and dispatches them to SplitReaders running on the
> TaskManager. The JobMaster doesn't actually read data—it merely defines and
> manages the splits. Therefore, we need to ensure the JM has sufficient
> information to determine the boundary of the KV snapshot and the
> startOffset of the LogSplit.
>
> I suggest we explicitly create a snapshot (or as you've termed it, a
> new_scan_request) on the server side. This way, the FlinkSourceEnumerator
> can use it to define a HybridSnapshotLogSplit, and the SplitReaders can
> perform pollBatch operations on this snapshot (which would be bound to the
> specified scanner_id).
>
> 4. Could you elaborate on why the new KvBatchScanner isn't reusable? What's
> the reasoning behind this limitation? (I believe RocksDB iterators do
> support the seekToFirst operation.) If a TaskManager fails over before a
> checkpoint, rescanning an existing snapshot seems like a natural
> requirement.
>
> 5. I think it would be beneficial to include some detailed design aspects
> regarding Flink's integration with the new BatchScanner.
>
> Overall, this is a solid foundation for an important enhancement. Looking
> forward to discussing these points further!
>
> Best regards, Yang
>
> On Wed, Oct 22, 2025 at 17:09, Wang Cheng <[email protected]> wrote:
>
> > Hi all,
> >
> >
> > As of v0.8, Fluss only supports KV snapshot batch scan and limit KV batch
> > scan. The former approach is constrained by snapshot availability and
> > remote storage performance, while the latter is only applicable to
> > queries with a LIMIT clause and risks high memory pressure.
> >
> >
> > To address those limitations, Giannis Polyzos and I are writing to propose
> > FIP-17: a general-purpose streaming KV scan for Fluss [1].
> >
> >
> > Any feedback and suggestions on this proposal are welcome!
> >
> >
> > [1]:
> > https://cwiki.apache.org/confluence/display/FLUSS/FIP-17+Streaming+KV+Scan+RPC
> >
> > Regards,
> > Cheng
