Very nice proposal! +1 from me (non-binding)
Regards, Yang Mehul Batra <[email protected]> 于2025年12月8日周一 13:59写道: > Hi all, > > Change Data Capture (CDC) is essential for real-time data processing, > widely used in scenarios like incremental ETL, data synchronization, and > audit logging. However, Fluss currently lacks a standardized mechanism to > expose changelog and binlog data through Flink SQL, forcing users to > implement custom change tracking solutions. > > Without native support, users face several challenges: > > - No SQL interface to query operation types (INSERT, UPDATE, DELETE) > - No efficient access to change data for incremental processing > - No consistent audit trails across applications > - No visibility into before/after states for UPDATE operations > > To address this, I'd like to propose FIP-20: Introduce $changelog and > $binlog Virtual Tables [1]. > > [1] > > https://cwiki.apache.org/confluence/display/FLUSS/FIP-20+Introduce+%24changelog+and+%24binlog+Virtual+Tables+in+Flink+Engine > > This proposal introduces: > > - *$changelog virtual table*: Flat schema with metadata columns ( > _change_type, _log_offset, _commit_timestamp) exposing change operations > (+I, -U, +U, -D for PK tables; +A for Log tables) > - *$binlog virtual table*: Nested schema with before/after ROW columns > providing Debezium-style CDC format with event-level semantics (I, U, D) > - *Schema introspection*: Support for DESCRIBE and SHOW CREATE TABLE on > virtual tables > - *Broad table support*: $changelog works for both Primary Key tables > and Log tables; $binlog for Primary Key tables only > > This follows the existing $lake virtual table pattern, ensuring consistency > across Fluss's virtual table design. > > Any feedback and suggestions are welcome! > > Best regards, > Mehul >
