[ 
https://issues.apache.org/jira/browse/SPARK-56249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Neumann updated SPARK-56249:
------------------------------------
    Labels: SPIP  (was: )

> Auto CDC support
> ----------------
>
>                 Key: SPARK-56249
>                 URL: https://issues.apache.org/jira/browse/SPARK-56249
>             Project: Spark
>          Issue Type: Umbrella
>          Components: Declarative Pipelines, PySpark, SQL
>    Affects Versions: 4.2.0
>            Reporter: Andreas Neumann
>            Priority: Major
>              Labels: SPIP
>
> *SPIP Document:* [SPIP: Auto 
> CDC|https://docs.google.com/document/d/1Hp5BGEYJRHbk6J7XUph3bAPZKRQXKOuV1PEaqZMMRoQ/]
> *Motivation*
> With the upcoming introduction of standardized CDC support, Spark will soon 
> have a unified way to _produce_ change data feeds. However, _consuming_ these 
> feeds and applying them to a target table remains a significant challenge.
> Common patterns like *SCD Type 1* (maintaining a 1:1 replica) and *SCD Type 
> 2* (tracking full change history) often require hand-crafted, complex MERGE 
> logic. In distributed systems, these implementations are frequently 
> error-prone when handling deletions or out-of-order data.
> *Proposal*
> This SPIP proposes a new *"Auto CDC"* flow type for Spark. It encapsulates 
> the complex logic for SCD types and out-of-order data, allowing data 
> engineers to configure a declarative flow instead of writing manual MERGE 
> statements. This feature will be available in both {*}Python and SQL{*}.
> Example SQL:
> {code:java}
> -- Produce a change feed
> CREATE STREAMING TABLE cdc.users AS
> SELECT * FROM STREAM my_table CHANGES FROM VERSION 10;
> -- Consume the change feed
> CREATE FLOW flow
> AS AUTO CDC INTO
>   target
> FROM stream(cdc_data.users)
>   KEYS (userId)
>   APPLY AS DELETE WHEN operation = "DELETE"
>   SEQUENCE BY sequenceNum
>   COLUMNS * EXCEPT (operation, sequenceNum)
>   STORED AS SCD TYPE 2
>   TRACK HISTORY ON * EXCEPT (city);
>   {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to