+1
On Tue, Apr 7, 2026 at 16:17 Peter Toth <[email protected]> wrote:

> +1 (non-binding)
>
> On Mon, Apr 6, 2026 at 10:13 PM Jungtaek Lim <[email protected]> wrote:
>
>> +1 (non-binding)
>>
>> On Tue, Apr 7, 2026 at 3:11 AM Liu Cao <[email protected]> wrote:
>>
>>> +1 (non-binding)
>>>
>>> On Mon, Apr 6, 2026 at 11:10 AM John Zhuge <[email protected]> wrote:
>>>
>>>> +1 (non-binding)
>>>>
>>>> On Mon, Apr 6, 2026 at 11:07 AM Holden Karau <[email protected]> wrote:
>>>>
>>>>> +1 (binding)
>>>>>
>>>>> On Mon, Apr 6, 2026 at 10:58 AM Anish Shrigondekar via dev <[email protected]> wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> Thanks,
>>>>>> Anish
>>>>>>
>>>>>> On Mon, Apr 6, 2026 at 10:57 AM DB Tsai <[email protected]> wrote:
>>>>>>
>>>>>>> +1 (binding)
>>>>>>>
>>>>>>> DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1
>>>>>>>
>>>>>>> On Apr 3, 2026, at 2:59 PM, Andreas Neumann <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi Spark devs,
>>>>>>>
>>>>>>> I'd like to call a vote on the SPIP: *Auto CDC Support for Apache Spark*
>>>>>>>
>>>>>>> Motivation
>>>>>>>
>>>>>>> With the upcoming introduction of standardized CDC support
>>>>>>> <https://issues.apache.org/jira/browse/SPARK-55668>, Spark will
>>>>>>> soon have a unified way to produce change data feeds. However,
>>>>>>> consuming these feeds and applying them to a target table remains a
>>>>>>> significant challenge.
>>>>>>>
>>>>>>> Common patterns like SCD Type 1 (maintaining a 1:1 replica) and SCD
>>>>>>> Type 2 (tracking full change history) often require hand-crafted,
>>>>>>> complex MERGE logic. In distributed systems, these implementations
>>>>>>> are frequently error-prone when handling deletions or out-of-order data.
>>>>>>>
>>>>>>> Proposal
>>>>>>>
>>>>>>> This SPIP proposes a new "Auto CDC" flow type for Spark. It
>>>>>>> encapsulates the complex logic for SCD types and out-of-order data,
>>>>>>> allowing data engineers to configure a declarative flow instead of
>>>>>>> writing manual MERGE statements. This feature will be available in
>>>>>>> both Python and SQL.
>>>>>>>
>>>>>>> Example SQL:
>>>>>>>
>>>>>>> -- Produce a change feed
>>>>>>> CREATE STREAMING TABLE cdc.users AS
>>>>>>> SELECT * FROM STREAM my_table CHANGES FROM VERSION 10;
>>>>>>>
>>>>>>> -- Consume the change feed
>>>>>>> CREATE FLOW flow
>>>>>>> AS AUTO CDC INTO
>>>>>>> target
>>>>>>> FROM stream(cdc_data.users)
>>>>>>> KEYS (userId)
>>>>>>> APPLY AS DELETE WHEN operation = "DELETE"
>>>>>>> SEQUENCE BY sequenceNum
>>>>>>> COLUMNS * EXCEPT (operation, sequenceNum)
>>>>>>> STORED AS SCD TYPE 2
>>>>>>> TRACK HISTORY ON * EXCEPT (city);
>>>>>>>
>>>>>>> *Relevant Links:*
>>>>>>>
>>>>>>> - SPIP Document:
>>>>>>>   https://docs.google.com/document/d/1Hp5BGEYJRHbk6J7XUph3bAPZKRQXKOuV1PEaqZMMRoQ/
>>>>>>> - Discussion Thread:
>>>>>>>   https://lists.apache.org/thread/j6sj9wo9odgdpgzlxtvhoy7szs0jplf7
>>>>>>> - JIRA:
>>>>>>>   https://issues.apache.org/jira/browse/SPARK-56249
>>>>>>>   <https://issues.apache.org/jira/browse/SPARK-55668>
>>>>>>>
>>>>>>> *The vote will be open for at least 72 hours.* Please vote:
>>>>>>>
>>>>>>> [ ] +1: Accept the proposal as an official SPIP
>>>>>>> [ ] +0
>>>>>>> [ ] -1: I don't think this is a good idea because ...
>>>>>>>
>>>>>>> Cheers -Andreas
>>>>>>>
>>>>>
>>>>> --
>>>>> Twitter: https://twitter.com/holdenkarau
>>>>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>>>>> <https://www.fighthealthinsurance.com/?q=hk_email>
>>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>>> https://amzn.to/2MaRAG9
>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>> Pronouns: she/her
>>>>>
>>>>
>>>> --
>>>> John Zhuge
>>>>
>>>
>>> --
>>> Liu Cao
>>>
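
For readers skimming the thread, here is a rough illustration of the kind of logic the quoted proposal says Auto CDC would encapsulate for the SCD Type 1 case: keep only the latest row per key, honor deletes, and ignore events that arrive out of order. This is a minimal in-memory sketch in plain Python, not the SPIP's design or any Spark API. The names `ChangeEvent`, `apply_scd1`, `target`, and `last_seq` are hypothetical, and the field names only loosely mirror the `KEYS`, `SEQUENCE BY`, and `APPLY AS DELETE WHEN` clauses in the example SQL.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One record of a change feed (hypothetical shape, for illustration)."""
    key: Any                              # value of the key column, e.g. userId
    operation: str                        # "INSERT", "UPDATE" or "DELETE"
    sequence_num: int                     # ordering column, e.g. sequenceNum
    row: Optional[Dict[str, Any]] = None  # remaining columns; None for deletes


def apply_scd1(
    target: Dict[Any, Dict[str, Any]],
    last_seq: Dict[Any, int],
    events: Iterable[ChangeEvent],
) -> None:
    """Apply events to `target` in place, keeping one current row per key.

    `last_seq` remembers the highest sequence number applied per key,
    including deletes, so that a late event older than a delete cannot
    bring the row back.
    """
    for ev in events:
        seen = last_seq.get(ev.key)
        if seen is not None and ev.sequence_num <= seen:
            continue  # stale or duplicate event: a newer change already won
        last_seq[ev.key] = ev.sequence_num
        if ev.operation == "DELETE":
            target.pop(ev.key, None)
        else:
            target[ev.key] = dict(ev.row or {})


# Events deliberately arrive out of order.
events = [
    ChangeEvent(1, "UPDATE", 3, {"city": "Berlin"}),
    ChangeEvent(1, "INSERT", 1, {"city": "Paris"}),   # older than seq 3
    ChangeEvent(2, "INSERT", 1, {"city": "Oslo"}),
    ChangeEvent(2, "DELETE", 5),
    ChangeEvent(2, "UPDATE", 4, {"city": "Rome"}),    # older than the delete
]
target: Dict[Any, Dict[str, Any]] = {}
last_seq: Dict[Any, int] = {}
apply_scd1(target, last_seq, events)
```

Even this toy version needs per-key sequence tracking and delete tombstones to stay correct, which is the sort of detail the proposal argues is error-prone when hand-written as MERGE statements. SCD Type 2 would additionally have to keep closed-out historical rows, which this sketch does not attempt.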
