OK, but the first association that came to my mind was SNAFU - perhaps because it shares the last 3 letters with DataFu.
I just thought you ought to be aware that the name could have negative connotations for some people.

On 19 December 2013 00:16, Matthew Hayes <mha...@linkedin.com> wrote:
> When we came up with the name a couple years ago, it was inspired by "kung
> fu", in a playful way as Roman mentioned. Sort of like saying your Java Fu
> or Python Fu is excellent.
>
> -Matt
> ________________________________________
> From: sebb [seb...@gmail.com]
> Sent: Wednesday, December 18, 2013 3:57 PM
> To: general@incubator.apache.org
> Subject: Re: [PROPOSAL] DataFu for Incubation
>
> On 18 December 2013 22:49, Matthew Hayes <mha...@linkedin.com> wrote:
>> Hi all,
>>
>> I would like to share our draft ASF incubation proposal for DataFu, a
>> library that makes it easier to solve data problems in Hadoop and high level
>> languages based on it.
>
> Am I the only person to think that the last part of the name has
> unfortunate connotations?
> cf. SNAFU which has the same last two characters.
>
>> The proposal can be found here:
>>
>> https://wiki.apache.org/incubator/DataFuProposal
>>
>> The source code is available on GitHub:
>>
>> https://github.com/linkedin/datafu.
>>
>> The text of the proposal is copied below. Feedback is appreciated!
>>
>> Thanks,
>> Matt
>>
>> == Abstract ==
>>
>> DataFu makes it easier to solve data problems using Hadoop and higher
>> level languages based on it.
>>
>> == Proposal ==
>>
>> DataFu provides a collection of Hadoop MapReduce jobs and functions in
>> higher level languages based on it to perform data analysis. It provides
>> functions for common statistics tasks (e.g. quantiles, sampling),
>> PageRank, stream sessionization, and set and bag operations. DataFu
>> also provides Hadoop jobs for incremental data processing in MapReduce.
>>
>> == Background ==
>>
>> DataFu began two years ago as a set of UDFs developed internally at
>> LinkedIn, coming from our desire to solve common problems with reusable
>> components.
>> Recognizing that the community could benefit from such a
>> library, we added documentation, an extensive suite of unit tests, and open
>> sourced the code. Since then there have been steady contributions to
>> DataFu as we encountered common problems not yet solved by it. Others
>> outside LinkedIn have contributed as well. More recently we recognized
>> the challenges with efficient incremental processing of data in Hadoop and
>> have contributed a set of Hadoop MapReduce jobs as a solution.
>>
>> DataFu began as a project at LinkedIn, but it has shown itself to be
>> useful to other organizations and developers as well, as they have faced
>> similar problems. We would like to share DataFu with the ASF and begin
>> developing a community of developers and users within Apache.
>>
>> == Rationale ==
>>
>> There is a strong need for well-tested libraries that help developers solve
>> common data problems in Hadoop and higher level languages such as Pig, Hive,
>> Crunch, Scalding, etc.
>>
>> == Current Status ==
>>
>> === Meritocracy ===
>>
>> Our intent with this incubator proposal is to start building a diverse
>> developer community around DataFu following the Apache meritocracy model.
>> Since DataFu was initially open sourced in 2011, it has received
>> contributions from both within and outside LinkedIn. We plan to continue
>> support for new contributors and work with those who contribute
>> significantly to the project to make them committers.
>>
>> === Community ===
>>
>> DataFu has been building a community of developers for two years. It
>> began with contributors from LinkedIn and has received contributions from
>> developers at Cloudera since very early on. It has been included
>> in Cloudera’s Hadoop Distribution and Apache Bigtop. We hope to extend our
>> contributor base significantly and invite all those who are interested in
>> solving large-scale data processing problems to participate.
>>
>> === Core Developers ===
>>
>> DataFu has a strong base of developers at LinkedIn. Matthew Hayes
>> initiated the project in 2011, and, aside from continued contributions to
>> DataFu, has also contributed the sub-project Hourglass for incremental
>> MapReduce processing. Separately from DataFu, he has also open sourced the
>> White Elephant project. Sam Shah contributed a significant portion of the
>> original code and continues to contribute to the project. William Vaughan
>> has been contributing regularly to DataFu for the past two years. Evion
>> Kim has been contributing to DataFu for the past year. Xiangrui Meng
>> recently contributed implementations of scalable sampling algorithms based
>> on research from a paper he published. Chris Lloyd has provided some
>> important bug fixes and unit tests. Mitul Tiwari has also contributed to
>> DataFu. Mathieu Bastian has been developing MapReduce jobs that we hope
>> to include in DataFu. In addition, he leads the open source Gephi
>> project.
>>
>> === Alignment ===
>>
>> The ASF is the natural choice to host the DataFu project, as its goal of
>> encouraging community-driven open-source projects fits with our vision for
>> DataFu. Additionally, other projects DataFu integrates with, such as
>> Apache Pig and Apache Hadoop, and in the future Apache Hive and Apache
>> Crunch, are hosted by the ASF, and we will benefit and provide benefit by
>> close proximity to them.
>>
>> == Known Risks ==
>>
>> === Orphaned Products ===
>>
>> The core developers have been contributing to DataFu for the past two
>> years. There is very little risk of DataFu being abandoned given its
>> widespread use within LinkedIn.
>>
>> === Inexperience with Open Source ===
>>
>> DataFu was started as an open source project in 2011 and has remained so
>> for two years. Matt initiated the project, and additionally is the creator
>> of the open source White Elephant project.
>> He has also contributed patches
>> to Apache Pig. Most recently, he has released Hourglass as a sub-project of
>> DataFu. Sam contributed much of the original code and continues to
>> contribute to the project. Will has been contributing to DataFu since it
>> was first open sourced. Evion has been contributing for the past year.
>> Mathieu leads the open source Gephi project. Jakob has been actively
>> involved with the ASF as a full-time Hadoop committer and PMC member.
>>
>> === Homogeneous Developers ===
>>
>> The current core developers are all from LinkedIn. DataFu has also
>> received contributions from other corporations such as Cloudera. Two of
>> these developers are among the Initial Committers listed below. We hope to
>> establish a developer community that includes contributors from several
>> other corporations, and we are actively encouraging new contributors via
>> presentations and blog posts.
>>
>> === Reliance on Salaried Developers ===
>>
>> The current core developers are salaried employees of LinkedIn; however,
>> they are not paid specifically to work on DataFu. Contributions to
>> DataFu arise from the developers solving problems they encounter in their
>> various projects. The purpose of DataFu is to share these solutions so
>> that others may benefit and build a community of developers striving to
>> solve common problems together. Furthermore, once the project has a
>> community built around it, we expect to get committers, developers and
>> contributions from outside the current core developers.
>>
>> === Relationships with Other Apache Products ===
>>
>> DataFu is deeply integrated with Apache products. It began as a library
>> of user-defined functions for Apache Pig. It has grown to also include
>> Hadoop jobs for incremental data processing and in the future will include
>> code for other higher level languages built on top of Apache Hadoop.
>>
>> === An Excessive Obsession with the Apache Brand ===
>>
>> While we respect the reputation of the Apache brand and have no doubts that
>> it will attract contributors and users, our interest is primarily to give
>> DataFu a solid home as an open source project following an established
>> development model.
>>
>> == Documentation ==
>>
>> Information on DataFu can be found at:
>>
>> https://github.com/LinkedIn/DataFu/blob/master/README.md
>>
>> == Initial Source ==
>>
>> The initial source is available at:
>>
>> https://github.com/LinkedIn/DataFu
>>
>> == Source and Intellectual Property Submission Plan ==
>>
>> * The DataFu library source code, available on GitHub.
>>
>> == External Dependencies ==
>>
>> The initial source has the following external dependencies that are either
>> included in the final DataFu library or required in order to use it:
>>
>> * fastutil (Apache 2.0)
>> * joda-time (Apache 2.0)
>> * commons-math (Apache 2.0)
>> * guava (Apache 2.0)
>> * stream (Apache 2.0)
>> * jsr-305 (BSD)
>> * log4j (Apache 2.0)
>> * json (The JSON License)
>> * avro (Apache 2.0)
>>
>> In addition, the following external libraries are used either in building,
>> developing, or testing the project:
>>
>> * pig (Apache 2.0)
>> * hadoop (Apache 2.0)
>> * jline (BSD)
>> * antlr (BSD)
>> * commons-io (Apache 2.0)
>> * testng (Apache 2.0)
>> * maven (Apache 2.0)
>> * jsr-311 (CDDL-1.0)
>> * slf4j (MIT)
>> * eclipse (Eclipse Public License 1.0)
>> * autojar (GPLv2)
>> * jarjar (Apache 2.0)
>>
>> == Cryptography ==
>>
>> DataFu has user-defined functions that use MD5 and SHA provided by Java’s
>> java.security.MessageDigest.
>>
>> == Required Resources ==
>>
>> === Mailing Lists ===
>>
>> DataFu-private for private PMC discussions (with moderated subscriptions)
>> DataFu-dev
>> DataFu-commits
>>
>> === Subversion Directory ===
>>
>> Git is the preferred source control system: git://git.apache.org/DataFu
>>
>> === Issue Tracking ===
>>
>> JIRA DataFu (DataFu)
>>
>> === Other Resources ===
>>
>> The existing code already has unit tests, so we would like a Hudson instance
>> to run them whenever a new patch is submitted. This can be added after
>> project creation.
>>
>> == Initial Committers ==
>>
>> * Matthew Hayes
>> * William Vaughan
>> * Evion Kim
>> * Sam Shah
>> * Xiangrui Meng
>> * Christopher Lloyd
>> * Mathieu Bastian
>> * Mitul Tiwari
>> * Josh Wills
>> * Jarek Jarcec Cecho
>>
>> == Affiliations ==
>>
>> * Matthew Hayes (LinkedIn)
>> * William Vaughan (LinkedIn)
>> * Evion Kim (LinkedIn)
>> * Sam Shah (LinkedIn)
>> * Xiangrui Meng (LinkedIn)
>> * Christopher Lloyd (LinkedIn)
>> * Mathieu Bastian (LinkedIn)
>> * Mitul Tiwari (LinkedIn)
>> * Josh Wills (Cloudera)
>> * Jarek Jarcec Cecho (Cloudera)
>>
>> == Sponsors ==
>>
>> === Champion ===
>>
>> Jakob Homan (Apache Member)
>>
>> === Nominated Mentors ===
>>
>> * Ashutosh Chauhan <hashutosh at apache dot org>
>> * Roman Shaposhnik <rvs at apache dot org>
>> * Ted Dunning <tdunning at apache dot org>
>>
>> === Sponsoring Entity ===
>>
>> We are requesting the Incubator to sponsor this project.
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org