[
https://issues.apache.org/jira/browse/SPARK-56249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Neumann updated SPARK-56249:
------------------------------------
Labels: SPIP (was: )
> Auto CDC support
> ----------------
>
> Key: SPARK-56249
> URL: https://issues.apache.org/jira/browse/SPARK-56249
> Project: Spark
> Issue Type: Umbrella
> Components: Declarative Pipelines, PySpark, SQL
> Affects Versions: 4.2.0
> Reporter: Andreas Neumann
> Priority: Major
> Labels: SPIP
>
> *SPIP Document:* [SPIP: Auto CDC|https://docs.google.com/document/d/1Hp5BGEYJRHbk6J7XUph3bAPZKRQXKOuV1PEaqZMMRoQ/]
> *Motivation*
> With the upcoming introduction of standardized CDC support, Spark will soon
> have a unified way to _produce_ change data feeds. However, _consuming_ these
> feeds and applying them to a target table remains a significant challenge.
> Common patterns like *SCD Type 1* (maintaining a 1:1 replica) and *SCD Type
> 2* (tracking full change history) often require hand-crafted, complex MERGE
> logic. In distributed systems, such implementations are error-prone,
> particularly when handling deletions or out-of-order data.
> *Proposal*
> This SPIP proposes a new *"Auto CDC"* flow type for Spark. It encapsulates
> the complex logic for SCD types and out-of-order data, allowing data
> engineers to configure a declarative flow instead of writing manual MERGE
> statements. This feature will be available in both *Python and SQL*.
> Example SQL:
> {code:sql}
> -- Produce a change feed
> CREATE STREAMING TABLE cdc.users AS
> SELECT * FROM STREAM my_table CHANGES FROM VERSION 10;
> -- Consume the change feed
> CREATE FLOW flow
> AS AUTO CDC INTO
> target
> FROM stream(cdc.users)
> KEYS (userId)
> APPLY AS DELETE WHEN operation = "DELETE"
> SEQUENCE BY sequenceNum
> COLUMNS * EXCEPT (operation, sequenceNum)
> STORED AS SCD TYPE 2
> TRACK HISTORY ON * EXCEPT (city);
> {code}
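
As a rough illustration of what the KEYS, SEQUENCE BY and APPLY AS DELETE WHEN clauses above are meant to take off the user's hands, here is a minimal plain-Python sketch of the simpler SCD Type 1 case. It is not Spark's implementation. The function name apply_scd1, the "UPSERT" operation value and the tombstone-based state layout are assumptions made for this example only; the column names are taken from the SQL above.

```python
from typing import Any, Dict, Iterable, Optional, Tuple

Row = Dict[str, Any]
# Per-key state: (highest sequence number applied, current row or None if deleted).
State = Dict[Any, Tuple[int, Optional[Row]]]


def apply_scd1(
    state: State,
    events: Iterable[Row],
    key: str = "userId",
    sequence: str = "sequenceNum",
    op: str = "operation",
) -> Dict[Any, Row]:
    """Hypothetical helper: fold change events into a 1:1 replica (SCD Type 1)."""
    for event in events:
        k, seq = event[key], event[sequence]
        last = state.get(k)
        # Skip events that are older than (or equal to) what was already applied.
        if last is not None and last[0] >= seq:
            continue
        if event[op] == "DELETE":
            # Keep a tombstone so a late, older upsert cannot resurrect the row.
            state[k] = (seq, None)
        else:
            # Mirrors COLUMNS * EXCEPT (operation, sequenceNum).
            row = {c: v for c, v in event.items() if c not in (op, sequence)}
            state[k] = (seq, row)
    # The visible target table excludes tombstoned keys.
    return {k: row for k, (_, row) in state.items() if row is not None}


events = [
    {"userId": 1, "city": "Berlin", "operation": "UPSERT", "sequenceNum": 2},
    {"userId": 1, "city": "Paris", "operation": "UPSERT", "sequenceNum": 1},  # arrives late
    {"userId": 2, "city": "Oslo", "operation": "DELETE", "sequenceNum": 5},
    {"userId": 2, "city": "Oslo", "operation": "UPSERT", "sequenceNum": 4},  # arrives after its delete
]
target = apply_scd1({}, events)
print(target)
```

The tombstone is the part that hand-written MERGE logic tends to miss: without it, the late upsert for userId 2 would reinsert a row that had already been deleted. SCD Type 2 would additionally need validity ranges per row version, which this sketch does not attempt.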