+1

Dr Mich Talebzadeh,
Data Scientist | Distributed Systems (Spark) | Financial Forensics &
Metadata Analytics | Transaction Reconstruction | Audit & Evidence-Based
Analytics

LinkedIn profile:
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>





On Fri, 3 Apr 2026 at 23:00, Andreas Neumann <[email protected]> wrote:

> Hi Spark devs,
>
> I'd like to call a vote on the SPIP: *Auto CDC Support for Apache Spark*
>
> Motivation
>
> With the upcoming introduction of standardized CDC support
> <https://issues.apache.org/jira/browse/SPARK-55668>, Spark will soon have
> a unified way to produce change data feeds. However, consuming these
> feeds and applying them to a target table remains a significant challenge.
>
> Common patterns like SCD Type 1 (maintaining a 1:1 replica) and SCD Type 2
> (tracking full change history) often require hand-crafted, complex MERGE
> logic. In distributed systems, these implementations are frequently
> error-prone when handling deletions or out-of-order data.
>
> Proposal
>
> This SPIP proposes a new "Auto CDC" flow type for Spark. It encapsulates
> the complex logic for SCD types and out-of-order data, allowing data
> engineers to configure a declarative flow instead of writing manual MERGE
> statements. This feature will be available in both Python and SQL.
>
> Example SQL:
>
> -- Produce a change feed
> CREATE STREAMING TABLE cdc.users AS
> SELECT * FROM STREAM my_table CHANGES FROM VERSION 10;
>
> -- Consume the change feed
> CREATE FLOW flow
> AS AUTO CDC INTO
>   target
> FROM stream(cdc_data.users)
>   KEYS (userId)
>   APPLY AS DELETE WHEN operation = "DELETE"
>   SEQUENCE BY sequenceNum
>   COLUMNS * EXCEPT (operation, sequenceNum)
>   STORED AS SCD TYPE 2
>   TRACK HISTORY ON * EXCEPT (city);
>
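
For readers less familiar with the semantics, here is a minimal plain-Python
sketch of what keyed, sequenced change application could look like for the
simpler SCD Type 1 case. It is an illustration only, not the SPIP's design or
any Spark API: ChangeEvent and apply_scd1 are hypothetical names, and the
field names merely mirror KEYS, SEQUENCE BY and APPLY AS DELETE WHEN from the
example above.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change record; not a Spark type."""
    key: str                 # plays the role of KEYS (userId)
    sequence_num: int        # plays the role of SEQUENCE BY sequenceNum
    operation: str           # "INSERT", "UPDATE" or "DELETE"
    row: Dict[str, Any]      # remaining columns


def apply_scd1(events: Iterable[ChangeEvent]) -> Dict[str, Dict[str, Any]]:
    """Return the latest surviving row per key, tolerating out-of-order input."""
    # key -> (highest sequence seen, row or None); None is a delete tombstone
    latest: Dict[str, Tuple[int, Optional[Dict[str, Any]]]] = {}
    for ev in events:
        seen = latest.get(ev.key)
        if seen is not None and seen[0] >= ev.sequence_num:
            continue  # stale event: a newer change was already applied
        row = None if ev.operation == "DELETE" else dict(ev.row)
        latest[ev.key] = (ev.sequence_num, row)
    # Drop tombstones only at the end, so a late, older update cannot
    # resurrect a key that a newer DELETE already removed.
    return {k: row for k, (_, row) in latest.items() if row is not None}


events = [
    ChangeEvent("u1", 2, "UPDATE", {"name": "Ann", "city": "Oslo"}),
    ChangeEvent("u1", 1, "INSERT", {"name": "Ann", "city": "Bergen"}),  # arrives late
    ChangeEvent("u2", 1, "INSERT", {"name": "Bo", "city": "Lund"}),
    ChangeEvent("u2", 3, "DELETE", {}),
    ChangeEvent("u2", 2, "UPDATE", {"name": "Bo", "city": "Malmo"}),    # arrives late
]
target = apply_scd1(events)
```

Keeping the tombstone until the end is the part that hand-written MERGE logic
tends to get wrong. SCD Type 2 would additionally need validity ranges per
row version, which this sketch does not attempt.
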
> *Relevant Links:*
>
>    - SPIP Document:
>    https://docs.google.com/document/d/1Hp5BGEYJRHbk6J7XUph3bAPZKRQXKOuV1PEaqZMMRoQ/
>    - Discussion Thread:
>    https://lists.apache.org/thread/j6sj9wo9odgdpgzlxtvhoy7szs0jplf7
>    - JIRA:
>    https://issues.apache.org/jira/browse/SPARK-55668
>    https://issues.apache.org/jira/browse/SPARK-56249
>
> *The vote will be open for at least 72 hours.* Please vote:
>
> [ ] +1: Accept the proposal as an official SPIP
> [ ] +0
> [ ] -1: I don't think this is a good idea because ...
> Cheers -Andreas
>
>
>
