This is an interesting project, +1 on the proposal and good luck to Datark!

Best Regards,
- He Xiaoqiao

On Fri, Sep 23, 2022 at 7:55 PM Willem Jiang <willem.ji...@gmail.com> wrote:

> Hi Yu,
>
> Thanks for the explanation. Please add a rename plan to the project
> proposal.
> I'd be happy to be the mentor of this project.
>
> BTW, could you update the Core Developers information with their
> GitHub IDs? That would make it easier for us to track the contributions.
>
>
> Willem Jiang
>
> Twitter: willemjiang
> Weibo: 姜宁willem
>
> On Fri, Sep 23, 2022 at 5:41 PM Yu Li <car...@gmail.com> wrote:
> >
> > Hi Willem,
> >
> > Referring to the recent incubation process of StreamPark [1] and Uniffle
> > [2], it seems they didn't rename their original projects before
> > entering the Apache Incubator, so we didn't plan to change the original
> > GitHub project name but would redirect it to the new project after
> > entering incubation. OTOH, if such a rename is necessary before
> > incubation, we will need to obtain some internal approval first. Thanks.
> >
> > Best Regards,
> > Yu
> >
> > [1] https://lists.apache.org/thread/ns5n6ozl1mdvdbhmkfol67lt163m74v3
> > [2] https://lists.apache.org/thread/fyyhkjvhzl4hpzr52hd64csh5lt2wm6h
> >
> >
> > On Fri, 23 Sept 2022 at 09:07, Willem Jiang <ningji...@apache.org> wrote:
> >
> > > I just checked the source repo, it is still using the name of
> > > RemoteShuffleService.
> > > Is there any plan for when we will change the project name?
> > >
> > > On Thu, Sep 22, 2022 at 11:45 AM Yu Li <car...@gmail.com> wrote:
> > > >
> > > > Hi All,
> > > >
> > > > I would like to propose Datark [1] as a new Apache Incubator project;
> > > > you can find more details in the Datark proposal [2].
> > > >
> > > > Datark is an intermediate (shuffle and spilled) data service for big
> > > > data compute engines (Apache Spark, Apache Flink, Apache Hive, etc.)
> > > > to boost performance, stability, and flexibility. It aims at enabling
> > > > computing engines to fully embrace the disaggregated architecture. In
> > > > a lot of cases, intermediate data depends on large local disks, and is
> > > > often a major cause of inefficiency, instability, and inflexibility in
> > > > the lifecycle of a distributed job. Datark solves the problems through
> > > > the following core designs:
> > > >
> > > > 1. Push-based shuffle plus partition data aggregation to turn random
> > > >    IO access into sequential access.
> > > > 2. FileSystem-like API to support writing spilled data.
> > > > 3. Hierarchical storage from memory to DFS/object store to enable fast
> > > >    cache and massive storage space.
> > > > 4. Engine-independent APIs for easy integration with various engines.
> > > > 5. Extended fault tolerance and data replication to increase
> > > >    reliability.
> > > >
> > > > Datark is currently adopted in the production environment at Alibaba
> > > > and many other companies, serving petabytes of data per day. Beyond
> > > > that, it has more open source users including Shopee, NetEase,
> > > > Bilibili, BOSS, and Synnex. Most of these users have made
> > > > contributions to the project, forming an active community with dozens
> > > > of developers.
> > > >
> > > > The proposed initial committers are interested in joining the ASF to
> > > > reinforce extensive collaboration and build a more vibrant community.
> > > > We believe the Datark project will provide tremendous value for the
> > > > community if it is introduced into the Apache Incubator.
> > > >
> > > > I will help this project as the champion, and many thanks to our four
> > > > other mentors:
> > > >
> > > > * Becket Qin (j...@apache.org)
> > > > * Duo Zhang (zhang...@apache.org)
> > > > * Lidong Dai (lidong...@apache.org)
> > > > * Willem Jiang (ningji...@apache.org)
> > > >
> > > > FWIW, although with different solutions, the issues Datark aims to
> > > > resolve have some overlap with Apache Uniffle (incubating) [3].
> > > > Actually we noticed this during the discussion phase of Uniffle
> > > > incubation (when we were also preparing for the incubation) and had
> > > > some open and friendly discussion to see whether there could be a
> > > > joint force [4], and finally decided to develop independently for the
> > > > time being [5].
> > > >
> > > > Look forward to your feedback. Thanks.
> > > >
> > > > Best Regards,
> > > > Yu
> > > >
> > > > [1] https://github.com/alibaba/RemoteShuffleService
> > > > [2] https://cwiki.apache.org/confluence/display/INCUBATOR/DatarkProposal
> > > > [3] https://uniffle.apache.org/
> > > > [4] https://lists.apache.org/thread/1w74z5f0pb7bhslhzcl5x7rdj9s9objz
> > > > [5] https://lists.apache.org/thread/pg8lzhzc1794x3yloqp169j0mdzqs3yw
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > > For additional commands, e-mail: general-h...@incubator.apache.org
> > >
> > >
