This is an interesting project, +1 on the proposal and good luck to Datark!
Best Regards,
- He Xiaoqiao

On Fri, Sep 23, 2022 at 7:55 PM Willem Jiang <willem.ji...@gmail.com> wrote:
> Hi Yu,
>
> Thanks for the explanation. Please add a rename plan to the project proposal.
> I'd be happy to be the mentor of this project.
>
> BTW, could you update the Core Developers information with their GitHub IDs? That would make it easier for us to track the contributions.
>
>
> Willem Jiang
>
> Twitter: willemjiang
> Weibo: 姜宁willem
>
> On Fri, Sep 23, 2022 at 5:41 PM Yu Li <car...@gmail.com> wrote:
> >
> > Hi Willem,
> >
> > Referring to the recent incubation processes of StreamPark [1] and Uniffle [2], it seems they did not rename their original projects before entering the Apache Incubator. So we did not plan to change the original GitHub project name, but would redirect it to the new project after entering incubation. OTOH, if such a rename is necessary before incubation, we will need some internal approval to proceed. Thanks.
> >
> > Best Regards,
> > Yu
> >
> > [1] https://lists.apache.org/thread/ns5n6ozl1mdvdbhmkfol67lt163m74v3
> > [2] https://lists.apache.org/thread/fyyhkjvhzl4hpzr52hd64csh5lt2wm6h
> >
> >
> > On Fri, 23 Sept 2022 at 09:07, Willem Jiang <ningji...@apache.org> wrote:
> > >
> > > I just checked the source repo; it is still using the name RemoteShuffleService.
> > > Is there any plan for when we will change the project name?
> > >
> > > On Thu, Sep 22, 2022 at 11:45 AM Yu Li <car...@gmail.com> wrote:
> > > >
> > > > Hi All,
> > > >
> > > > I would like to propose Datark [1] as a new Apache Incubator project. You can find more details in the Datark proposal [2].
> > > >
> > > > Datark is an intermediate (shuffle and spilled) data service for big data compute engines (Apache Spark, Apache Flink, Apache Hive, etc.) that boosts performance, stability, and flexibility. It aims to enable compute engines to fully embrace the disaggregated architecture. In many cases, intermediate data relies on large local disks and is often a major cause of inefficiency, instability, and inflexibility in the lifecycle of a distributed job. Datark addresses these problems through the following core designs:
> > > >
> > > > 1. Push-based shuffle plus partition data aggregation to turn random IO access into sequential access.
> > > > 2. FileSystem-like API to support writing spilled data.
> > > > 3. Hierarchical storage from memory to DFS/object store to enable fast cache and massive storage space.
> > > > 4. Engine-agnostic APIs for easy integration with various engines.
> > > > 5. Extended fault tolerance and data replication to increase reliability.
> > > >
> > > > Datark is currently used in production at Alibaba and many other companies, serving petabytes of data per day. It also has other open source users, including Shopee, NetEase, Bilibili, BOSS, and Synnex. Most of these users have contributed to the project, forming an active community with dozens of developers.
> > > >
> > > > The proposed initial committers are interested in joining the ASF to strengthen collaboration and build a more vibrant community. We believe the Datark project will provide tremendous value to the community if it is accepted into the Apache Incubator.
> > > >
> > > > I will support this project as the champion, and many thanks to our four other mentors:
> > > >
> > > > * Becket Qin (j...@apache.org)
> > > > * Duo Zhang (zhang...@apache.org)
> > > > * Lidong Dai (lidong...@apache.org)
> > > > * Willem Jiang (ningji...@apache.org)
> > > >
> > > > FWIW, although the solutions differ, the issues Datark aims to resolve overlap somewhat with those of Apache Uniffle (incubating) [3]. We noticed this during the discussion phase of Uniffle's incubation (when we were also preparing for incubation). We had an open and friendly discussion about whether to join forces [4], and finally decided to develop independently for the time being [5].
> > > >
> > > > Looking forward to your feedback. Thanks.
> > > >
> > > > Best Regards,
> > > > Yu
> > > >
> > > > [1] https://github.com/alibaba/RemoteShuffleService
> > > > [2] https://cwiki.apache.org/confluence/display/INCUBATOR/DatarkProposal
> > > > [3] https://uniffle.apache.org/
> > > > [4] https://lists.apache.org/thread/1w74z5f0pb7bhslhzcl5x7rdj9s9objz
> > > > [5] https://lists.apache.org/thread/pg8lzhzc1794x3yloqp169j0mdzqs3yw
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > > For additional commands, e-mail: general-h...@incubator.apache.org