Re: [VOTE] SPIP: Auto CDC Support for Apache Spark

Kousuke Saruta Tue, 07 Apr 2026 00:18:56 -0700

+1


On Tue, Apr 7, 2026 at 4:17 PM Peter Toth <[email protected]> wrote:

> +1 (non-binding)
>
> On Mon, Apr 6, 2026 at 10:13 PM Jungtaek Lim <[email protected]>
> wrote:
>
>> +1 (non-binding)
>>
>> On Tue, Apr 7, 2026 at 3:11 AM Liu Cao <[email protected]> wrote:
>>
>>> +1 (non-binding)
>>>
>>> On Mon, Apr 6, 2026 at 11:10 AM John Zhuge <[email protected]> wrote:
>>>
>>>> +1 (non-binding)
>>>>
>>>> On Mon, Apr 6, 2026 at 11:07 AM Holden Karau <[email protected]>
>>>> wrote:
>>>>
>>>>> +1 (binding)
>>>>>
>>>>> On Mon, Apr 6, 2026 at 10:58 AM Anish Shrigondekar via dev <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> Thanks,
>>>>>> Anish
>>>>>>
>>>>>> On Mon, Apr 6, 2026 at 10:57 AM DB Tsai <[email protected]> wrote:
>>>>>>
>>>>>>> +1 (binding)
>>>>>>>
>>>>>>> DB Tsai  |  https://www.dbtsai.com/  |  PGP 42E5B25A8F7A82C1
>>>>>>>
>>>>>>> On Apr 3, 2026, at 2:59 PM, Andreas Neumann <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi Spark devs,
>>>>>>>
>>>>>>> I'd like to call a vote on the SPIP: *Auto CDC Support for Apache
>>>>>>> Spark*
>>>>>>>
>>>>>>> Motivation
>>>>>>>
>>>>>>> With the upcoming introduction of standardized CDC support
>>>>>>> <https://issues.apache.org/jira/browse/SPARK-55668>, Spark will
>>>>>>> soon have a unified way to produce change data feeds. However,
>>>>>>> consuming these feeds and applying them to a target table remains a
>>>>>>> significant challenge.
>>>>>>>
>>>>>>> Common patterns like SCD Type 1 (maintaining a 1:1 replica) and SCD
>>>>>>> Type 2 (tracking full change history) often require hand-crafted,
>>>>>>> complex MERGE logic. In distributed systems, these implementations
>>>>>>> are frequently error-prone when handling deletions or out-of-order data.
>>>>>>>
>>>>>>> Proposal
>>>>>>>
>>>>>>> This SPIP proposes a new "Auto CDC" flow type for Spark. It
>>>>>>> encapsulates the complex logic for SCD types and out-of-order data,
>>>>>>> allowing data engineers to configure a declarative flow instead of
>>>>>>> writing manual MERGE statements. This feature will be available in both
>>>>>>> Python and SQL.
>>>>>>>
>>>>>>> Example SQL:
>>>>>>> -- Produce a change feed
>>>>>>> CREATE STREAMING TABLE cdc.users AS
>>>>>>> SELECT * FROM STREAM my_table CHANGES FROM VERSION 10;
>>>>>>>
>>>>>>> -- Consume the change feed
>>>>>>> CREATE FLOW flow
>>>>>>> AS AUTO CDC INTO
>>>>>>>   target
>>>>>>> FROM stream(cdc.users)
>>>>>>>   KEYS (userId)
>>>>>>>   APPLY AS DELETE WHEN operation = "DELETE"
>>>>>>>   SEQUENCE BY sequenceNum
>>>>>>>   COLUMNS * EXCEPT (operation, sequenceNum)
>>>>>>>   STORED AS SCD TYPE 2
>>>>>>>   TRACK HISTORY ON * EXCEPT (city);
>>>>>>>
>>>>>>> *Relevant Links:*
>>>>>>>
>>>>>>>    - SPIP Document:
>>>>>>>    https://docs.google.com/document/d/1Hp5BGEYJRHbk6J7XUph3bAPZKRQXKOuV1PEaqZMMRoQ/
>>>>>>>    - Discussion Thread:
>>>>>>>    https://lists.apache.org/thread/j6sj9wo9odgdpgzlxtvhoy7szs0jplf7
>>>>>>>    - JIRA:
>>>>>>>    https://issues.apache.org/jira/browse/SPARK-55668
>>>>>>>    https://issues.apache.org/jira/browse/SPARK-56249
>>>>>>>
>>>>>>> *The vote will be open for at least 72 hours.* Please vote:
>>>>>>>
>>>>>>> [ ] +1: Accept the proposal as an official SPIP
>>>>>>> [ ] +0
>>>>>>> [ ] -1: I don't think this is a good idea because ...
>>>>>>>
>>>>>>> Cheers -Andreas
>>>>>>>
>>>>>
>>>>> --
>>>>> Twitter: https://twitter.com/holdenkarau
>>>>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>>>>> <https://www.fighthealthinsurance.com/?q=hk_email>
>>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>>> https://amzn.to/2MaRAG9
>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>> Pronouns: she/her
>>>>>
>>>>
>>>>
>>>> --
>>>> John Zhuge
>>>>
>>>
>>>
>>> --
>>>
>>> Liu Cao
>>>
>>>
>>>
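
As a rough illustration of the logic the quoted proposal says Auto CDC would encapsulate, here is a minimal plain-Python sketch of SCD Type 1 semantics (a 1:1 replica) with out-of-order events and deletes. It is not the proposed Spark API and not how the SPIP implements anything. The Change class, the apply_scd1 function, and the "UPSERT" label are hypothetical names invented for this sketch; only the ideas of a key, a sequencing column, and a DELETE operation come from the SQL example in the quoted mail.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass
class Change:
    """One change event (hypothetical shape, for illustration only)."""
    key: Any                    # plays the role of KEYS (userId)
    sequence: int               # plays the role of SEQUENCE BY sequenceNum
    operation: str              # "UPSERT" or "DELETE" (assumed labels)
    row: Optional[Dict[str, Any]] = None


def apply_scd1(target: Dict[Any, Dict[str, Any]],
               last_seq: Dict[Any, int],
               changes: Iterable[Change]) -> None:
    """Apply changes in arrival order, ignoring events older than what was applied."""
    for change in changes:
        # An event is stale if a newer (or equal) sequence was already applied for its key.
        if change.key in last_seq and change.sequence <= last_seq[change.key]:
            continue
        last_seq[change.key] = change.sequence
        if change.operation == "DELETE":
            # Remove the row but remember the sequence, so a late upsert
            # cannot resurrect a deleted key.
            target.pop(change.key, None)
        else:
            target[change.key] = dict(change.row or {})


target: Dict[Any, Dict[str, Any]] = {}
last_seq: Dict[Any, int] = {}
apply_scd1(target, last_seq, [
    Change(1, 2, "UPSERT", {"name": "Ann", "city": "Oslo"}),
    Change(1, 1, "UPSERT", {"name": "Ann", "city": "Rome"}),  # arrives late, ignored
    Change(2, 1, "UPSERT", {"name": "Bob", "city": "Lima"}),
    Change(2, 3, "DELETE"),
    Change(2, 2, "UPSERT", {"name": "Bob", "city": "Kyiv"}),  # arrives late, ignored
])
print(target)  # {1: {'name': 'Ann', 'city': 'Oslo'}}
```

Even this toy version has to track the last applied sequence per key separately from the table so that deletes stay deleted, which is the kind of bookkeeping a hand-written MERGE tends to get wrong.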
