Done. Sorry for that oversight.

Kenn

On Mon, Feb 25, 2019 at 10:01 PM lee...@gmail.com <lee...@gmail.com> wrote:

> As primary author, can I be given the ability to directly edit?
>
> On 2019/02/26 05:37:22, Kenneth Knowles <k...@apache.org> wrote:
> > It isn't too much work, so I've done it:
> > https://s.apache.org/datasketches-proposal-draft
> >
> > Kenn
> >
> > On Mon, Feb 25, 2019 at 9:31 PM leerho <lee...@gmail.com> wrote:
> >
> > > Yes, I thought of that. But it’s not like I’m being overwhelmed with requests to comment ... so far it has been only 3 or 4, and the requested changes have been minor. I’m assuming that, if there are no more substantive changes after this week, the document will be moved to the wiki archive, where, I presume, changes could still be made.
> > >
> > > I want to do the right thing here, so if you feel that the document would get much better feedback on an unrestricted gDoc site, I will set it up.
> > >
> > > On Mon, Feb 25, 2019 at 8:32 PM Jim Apple <jbap...@cloudera.com.invalid> wrote:
> > >
> > > > You could use a Google account that is not under Yahoo’s control, then let anyone in the world add a comment, maybe.
> > > >
> > > > On Mon, Feb 25, 2019 at 3:26 PM leerho <lee...@gmail.com> wrote:
> > > >
> > > > > Kenn,
> > > > > Yahoo does not allow me to create a shared link outside our company, except to individual email addresses. So attempting to share it to the email general@incubator.apache.org may not work. Nonetheless, several individuals were able to request access using their individual email accounts and I was able to add them. I will try to add you using k...@apache.org, but if that doesn't work, I may need a gmail or equivalent account for you.
> > > > >
> > > > > Lee.
> > > > >
> > > > > On Mon, Feb 25, 2019 at 2:59 PM Kenneth Knowles <k...@apache.org> wrote:
> > > > >
> > > > > > I could not access that document. I suggest you turn on link sharing.
> > > > > >
> > > > > > Kenn
> > > > > >
> > > > > > On Mon, Feb 25, 2019 at 12:00 PM lee...@gmail.com <lee...@gmail.com> wrote:
> > > > > >
> > > > > > > Try this link:
> > > > > > > https://docs.google.com/document/d/19JKevzFQNcaLA51LFLUlP1hzdFDW7oDJrJO8N6weDv8/edit?usp=sharing
> > > > > > >
> > > > > > > On 2019/02/25 05:55:50, leerho <lee...@gmail.com> wrote:
> > > > > > > > Yes, I will try that tomorrow.
> > > > > > > >
> > > > > > > > On Sun, Feb 24, 2019 at 7:34 PM Kenneth Knowles <k...@apache.org> wrote:
> > > > > > > >
> > > > > > > > > Can you share the Google doc with the proposal? Per Ted's advice, we can iterate quickly there and move it to the wiki when it becomes a bit more stable.
> > > > > > > > >
> > > > > > > > > Kenn
> > > > > > > > >
> > > > > > > > > On Fri, Feb 22, 2019 at 10:21 PM lee...@gmail.com <lee...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Thanks for the offer. I am a neophyte at this process and email app! I could use a lot of help getting this off the ground! Also, I'm not sure that Mr. Chen and Mr. Onofré have fully accepted taking this on :)
> > > > > > > > > >
> > > > > > > > > > Lee.
> > > > > > > > > >
> > > > > > > > > > On 2019/02/23 06:03:58, Kenneth Knowles <k...@apache.org> wrote:
> > > > > > > > > > > Nice.
> > > > > > > > > > >
> > > > > > > > > > > I would very much like to help mentor this project, though you already have a couple of good ones.
> > > > > > > > > > >
> > > > > > > > > > > I concur with incubator as sponsoring entity.
> > > > > > > > > > >
> > > > > > > > > > > Kenn (VP Apache Beam)
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Feb 22, 2019 at 9:45 PM leerho <lee...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > I didn't realize that this mail list does not accept PDF files, apparently only text. So let me try one more time ... :) Please let me know if this works!
> > > > > > > > > > > >
> > > > > > > > > > > > = Apache DataSketches Proposal[1] =
> > > > > > > > > > > >
> > > > > > > > > > > > == Abstract ==
> > > > > > > > > > > >
> > > > > > > > > > > > DataSketches.GitHub.io is an open source, high-performance library of stochastic streaming algorithms commonly called "sketches" in the data sciences. Sketches are small, stateful programs that process massive data as a stream and can provide approximate answers, with mathematical guarantees, to computationally difficult queries orders-of-magnitude faster than traditional, exact methods.
> > > > > > > > > > > >
> > > > > > > > > > > > This proposal is to move DataSketches to the Apache Software Foundation (ASF), transferring ownership of its copyright intellectual property to the ASF.
> > > > > > > > > > > > Thereafter, DataSketches would be officially known as Apache DataSketches and its evolution and governance would come under the rules and guidance of the ASF.
> > > > > > > > > > > >
> > > > > > > > > > > > == Introduction ==
> > > > > > > > > > > >
> > > > > > > > > > > > The DataSketches library contains carefully crafted implementations of sketch algorithms that meet rigorous standards of quality and performance and provide capabilities required for large-scale production systems that must process and analyze massive data. The DataSketches core repository is written in Java with a parallel core repository written in C++ that includes Python wrappers. The DataSketches library also includes special repositories for extending the core library for Apache Hive and Apache Pig. The sketches developed in the different languages share a common binary storage format so that sketches created and stored in Java, for example, can be fully used in C++, and vice versa. Because the stored sketch "images" are just a "blob" of bytes (similar to picture images), they can be shared across many different systems, languages and platforms.
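
To make the "blob of bytes" idea concrete, the following Python sketch writes a sketch's state to a byte string with a fixed, explicitly little-endian layout and reads it back. The layout (a 4-byte magic value, two unsigned 32-bit fields, then unsigned 64-bit values) and the names `to_bytes` and `from_bytes` are hypothetical and chosen only for illustration; this is not the actual DataSketches storage format.

```python
import struct

# Hypothetical layout (NOT the DataSketches format):
#   4-byte magic, uint32 k, uint32 n, then n little-endian uint64 values.
MAGIC = b"SKCH"
HEADER = struct.Struct("<4sII")  # "<" fixes little-endian order and disables padding


def to_bytes(k, values):
    """Serialize the sketch state into a platform-independent byte blob."""
    body = struct.pack(f"<{len(values)}Q", *values)
    return HEADER.pack(MAGIC, k, len(values)) + body


def from_bytes(blob):
    """Rebuild (k, values) from a blob produced by to_bytes."""
    magic, k, n = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("not a sketch image")
    values = list(struct.unpack_from(f"<{n}Q", blob, HEADER.size))
    return k, values


blob = to_bytes(4, [3, 17, 2**63, 99])
restored = from_bytes(blob)
```

Because the byte order and field widths are fixed by the format rather than by the host platform, a reader written in another language could decode the same blob, which is the property the proposal describes.
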
> > > > > > > > > > > >
> > > > > > > > > > > > The DataSketches documentation website, https://datasketches.github.io, includes general tutorials, a comprehensive research section with references to relevant academic papers, and extensive examples for using the core library directly as well as examples for accessing the library in Hive, Pig, and Apache Spark.
> > > > > > > > > > > >
> > > > > > > > > > > > The DataSketches library also includes a characterization repository for long-running test programs that are used for studying accuracy and performance of these sketches over wide ranges of input variables. The data produced by these programs is used for generating the many performance plots contained in the documentation website and for academic publications.
> > > > > > > > > > > >
> > > > > > > > > > > > The code repositories used for production are versioned and published to Maven Central at periodic intervals as the library evolves.
> > > > > > > > > > > >
> > > > > > > > > > > > The DataSketches library also includes several experimental repositories for use-cases outside the large-scale systems environments, such as sketches for mobile and IoT devices (Android), command-line access of the sketch library, and an experimental repository for vector-based sketches that performs approximate Singular Value Decomposition (SVD) analysis that could potentially be used in Machine Learning (ML) applications.
> > > > > > > > > > > >
> > > > > > > > > > > > == Background ==
> > > > > > > > > > > >
> > > > > > > > > > > > The DataSketches library was started in 2012 as an internal Yahoo project to dramatically reduce time and resources required for distinct (unique) counting. An extensive search on the Internet at the time yielded a number of theoretical papers on stochastic streaming algorithms with pseudocode examples, but we did not find any usable open-source code of the quality we felt we needed for our internal production systems. So we started a small project (one person) to develop our own sketches working directly from published theoretical papers.
> > > > > > > > > > > >
> > > > > > > > > > > > The DataSketches library was designed from the start with the objective of making these algorithms, usually only described in theoretical papers, easily accessible to systems developers for use in our internal production systems. By necessity, the code had to be of the highest quality and thoroughly tested. The wide variety of our internal production systems drove the requirement that the sketch implementations had to have an absolute minimum of external, run-time dependencies in order to simplify integration and troubleshooting.
> > > > > > > > > > > >
> > > > > > > > > > > > Our internal experiments demonstrated a dramatic positive impact on the performance of our systems. As a result, the DataSketches library quickly evolved to include different types of sketches for different types of queries, such as frequent-items (a.k.a. heavy-hitters) algorithms, quantile/histogram algorithms, and weighted and unweighted sampling algorithms.
> > > > > > > > > > > >
> > > > > > > > > > > > We quickly discovered that developing these sketch algorithms to be truly robust in production environments is quite difficult and requires a deep understanding of the underlying mathematics and statistics as well as extensive experience in developing high-quality code for 24/7 production systems. This is a difficult combination of skills for any one organization to collect and maintain over time. It became clear that this technology needed a community larger than Yahoo to evolve. In November 2015, this factor, along with Yahoo’s strong experience and support of open source, led to the decision to open source this technology under an Apache 2.0 license on GitHub. Since that time our community has expanded considerably and the key contributors to this effort include leading research scientists from a number of universities as well as practitioners and researchers from a number of major corporations. The core of this group is very active as we meet weekly to discuss research directions and engineering priorities.
> > > > > > > > > > > >
> > > > > > > > > > > > It is important to note that our internal systems at Yahoo use the current public GitHub open source DataSketches library and not an internal version of the code.
> > > > > > > > > > > >
> > > > > > > > > > > > The close collaboration of scientific research and engineering development experience with actual massive-data processing systems has also produced new research publications in the field of stochastic streaming algorithms, for example:
> > > > > > > > > > > >
> > > > > > > > > > > > * Daniel Anderson, Pryce Bevan, Kevin J. Lang, Edo Liberty, Lee Rhodes, and Justin Thaler. A high-performance algorithm for identifying frequent items in data streams. In ACM IMC 2017.
> > > > > > > > > > > >
> > > > > > > > > > > > * Anirban Dasgupta, Kevin J. Lang, Lee Rhodes, and Justin Thaler. A framework for estimating stream expression cardinalities. In EDBT/ICDT Proceedings ’16, pages 6:1–6:17, 2016.
> > > > > > > > > > > >
> > > > > > > > > > > > * Mina Ghashami, Edo Liberty, Jeff M. Phillips. Efficient Frequent Directions Algorithm for Sparse Matrices. In ACM SIGKDD Proceedings ’16, pages 845–854, 2016.
> > > > > > > > > > > >
> > > > > > > > > > > > * Zohar S. Karnin, Kevin J. Lang, and Edo Liberty. Optimal quantile approximation in streams.
> > > > > > > > > > > > In IEEE FOCS Proceedings ’16, pages 71–78, 2016.
> > > > > > > > > > > >
> > > > > > > > > > > > * Kevin J Lang. Back to the future: an even more nearly optimal cardinality estimation algorithm. arXiv preprint https://arxiv.org/abs/1708.06839, 2017.
> > > > > > > > > > > >
> > > > > > > > > > > > * Edo Liberty. Simple and deterministic matrix sketching. In ACM KDD Proceedings ’13, pages 581–588, 2013.
> > > > > > > > > > > >
> > > > > > > > > > > > * Edo Liberty, Michael Mitzenmacher, Justin Thaler, and Jonathan Ullman. Space lower bounds for itemset frequency sketches. In ACM PODS Proceedings ’16, pages 441–454, 2016.
> > > > > > > > > > > >
> > > > > > > > > > > > * Michael Mitzenmacher, Thomas Steinke, and Justin Thaler. Hierarchical heavy hitters with the space saving algorithm. In SIAM ALENEX Proceedings ’12, pages 160–174, 2012.
> > > > > > > > > > > >
> > > > > > > > > > > > == The Rationale for Sketches ==
> > > > > > > > > > > >
> > > > > > > > > > > > In the analysis of big data there are often problem queries that don’t scale because they require huge compute resources and time to generate exact results. Examples include count distinct, quantiles, most frequent items, joins, matrix computations, and graph analysis.
> > > > > > > > > > > >
> > > > > > > > > > > > If we can loosen the requirement of “exact” results from our queries and be satisfied with approximate results, within some well understood bounds of error, there is an entire branch of mathematics and data science that has evolved around developing algorithms that can produce approximate results with mathematically well-defined error properties.
> > > > > > > > > > > >
> > > > > > > > > > > > Adding the requirements that these algorithms must be small (compared to the size of the input data), sublinear (the size of the sketch must grow at a slower rate than the size of the input stream), streaming (they can only touch each data item once), and mergeable (suitable for distributed processing) defines a class of algorithms that can be described as small, stochastic, streaming, sublinear, mergeable algorithms, commonly called sketches (they also have other names, but we will use the term sketches from here on).
> > > > > > > > > > > >
> > > > > > > > > > > > To be truly streaming and be able to process data in a single pass, sketches must make absolute minimum assumptions about the input stream.
> > > > > > > > > > > > This is critically important, as there is no “second chance” to process the data.
> > > > > > > > > > > >
> > > > > > > > > > > > For example, sketches should not make assumptions about the order of stream items, the stream length, the dynamic range of values, or the distribution of item occurrence frequencies. Sketches should be tolerant of NaNs, Nulls and empty objects. About the only thing that the sketch needs to know about the stream is how to extract items from it and what type the item is, e.g., whether it is a numeric value or a string.
> > > > > > > > > > > >
> > > > > > > > > > > > As far as the sketch is concerned, the input stream is a sequence of items in some unknown random order with unknown random values.
> > > > > > > > > > > >
> > > > > > > > > > > > The sketch is essentially a complex state machine which, combined with the random input stream, defines a stochastic process. We then apply probabilistic methods to interpret the states of the stochastic process in order to extract useful information about the input stream itself.
> > > > > > > > > > > > The resulting information will be approximate, but we also use additional probabilistic methods to extract an estimate of the likely probability distribution of error.
> > > > > > > > > > > >
> > > > > > > > > > > > There is a significant scientific contribution here in defining the state machine, understanding the resulting stochastic process, developing the probabilistic methods, and proving mathematically that it all works! This is why the scientific contributors to this project are a critical and strategic component of our success. The development engineers translate the concepts of the proposed state machine and probabilistic methods into production-quality code. Even more important, they work closely with the scientists, feeding back system and user requirements, which leads not only to superior product design, but to new science as well. A number of scientific papers our members have published (see above) are a direct result of this close collaboration.
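
The properties listed above (small, single-pass, mergeable, approximate with a known error behavior) can be illustrated with a toy K-Minimum-Values distinct-count sketch in Python. This is a minimal sketch for illustration under stated assumptions, not the DataSketches implementation: the class `KmvSketch`, the use of SHA-256 as the hash, and the default `k` are all assumptions made here.

```python
import hashlib
import heapq


class KmvSketch:
    """Toy K-Minimum-Values distinct-count sketch (illustration only)."""

    def __init__(self, k=256):
        self.k = k
        self._heap = []         # negated hashes: a max-heap of the k smallest
        self._members = set()   # the same hashes, for duplicate detection

    @staticmethod
    def _hash(item):
        # Map an item to a pseudo-uniform float in [0, 1) using 53 hash bits.
        digest = hashlib.sha256(repr(item).encode("utf-8")).digest()
        return (int.from_bytes(digest[:8], "big") >> 11) / 2.0**53

    def _offer(self, h):
        if h in self._members:
            return                              # already retained
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:                # below the current k-th minimum
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def update(self, item):
        """Process one stream item; the item is never looked at again."""
        self._offer(self._hash(item))

    def merge(self, other):
        """Fold another sketch (built with the same k) into this one."""
        if other.k != self.k:
            raise ValueError("sketches must share the same k")
        for negated in other._heap:
            self._offer(-negated)

    def estimate(self):
        if len(self._heap) < self.k:
            return float(len(self._heap))       # exact while under capacity
        return (self.k - 1) / -self._heap[0]


# Two overlapping streams, sketched separately and then merged.
a, b = KmvSketch(), KmvSketch()
for i in range(60_000):
    a.update(i)
for i in range(40_000, 100_000):
    b.update(i)
a.merge(b)
print(f"estimated distinct items: {a.estimate():.0f} (true value: 100000)")
```

The state never holds more than `k` hash values regardless of stream length, each item is examined once, and merging two sketches built with the same `k` gives the same state as a single sketch that saw both streams. The relative standard error of this estimator is roughly 1/sqrt(k).
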
> > > > > > > > > > > >
> > > > > > > > > > > > Because sketches are small they can be processed extremely fast, often many orders-of-magnitude faster than traditional exact computations. For interactive queries there may not be other viable alternatives, and in the case of real-time analysis, sketches are the only known solution.
> > > > > > > > > > > >
> > > > > > > > > > > > For any system that needs to extract useful information from massive data, sketches are essential tools that should be tightly integrated into the system’s analysis capabilities. This technology has helped Yahoo successfully reduce data processing times from days to hours or minutes on a number of its internal platforms and has enabled subsecond queries on real-time platforms that would have been infeasible without sketches.
> > > > > > > > > > > >
> > > > > > > > > > > > == The Rationale for Apache DataSketches ==
> > > > > > > > > > > >
> > > > > > > > > > > > Other open source implementations of sketch algorithms can be found on the Internet. However, we have not yet found any open source implementations that are as comprehensive, engineered with the quality required for production systems, and with usable and guaranteed error properties.
> > > > > > > > > > > > Large Internet companies, such as Google and Facebook, have published papers on sketching; however, their implementations of their published algorithms are proprietary and not available as open source.
> > > > > > > > > > > >
> > > > > > > > > > > > The DataSketches library already provides integrations with a number of major Apache data processing platforms such as Apache Hive, Apache Pig, Apache Spark and Apache Druid, and is also integrated with a number of other open source data processing platforms such as Splice Machine, GCHQ Gaffer and PostgreSQL.
> > > > > > > > > > > >
> > > > > > > > > > > > We believe that DataSketches, as an Apache project, will provide an immediate, worthwhile, and substantial contribution to the open source community, will have a better opportunity to provide a meaningful contribution to both the science and engineering of sketching algorithms, and will integrate with other Apache projects. In addition, this is a significant opportunity for Apache to be the "go-to" destination for users that want to leverage this exciting technology.
> > > > > > > > > > > >
> > > > > > > > > > > > == Initial Goals ==
> > > > > > > > > > > >
> > > > > > > > > > > > We are breaking our initial goals into short-term (2-6 months) and intermediate to long-term (6 months to 2 years):
> > > > > > > > > > > >
> > > > > > > > > > > > Our short-term goals include:
> > > > > > > > > > > >
> > > > > > > > > > > > * Understanding and adapting to the Apache development process and structures.
> > > > > > > > > > > >
> > > > > > > > > > > > * Starting to refactor the codebase and moving the code of the various DataSketches repositories to an Apache Git repository.
> > > > > > > > > > > >
> > > > > > > > > > > > * Continuing development of new features, functions, and fixes.
> > > > > > > > > > > >
> > > > > > > > > > > > * Continuing to develop and expand specific sub-projects (e.g., C++ and Python).
> > > > > > > > > > > >
> > > > > > > > > > > > The intermediate to long-term goals include:
> > > > > > > > > > > >
> > > > > > > > > > > > * Completing the design and implementation of the C++ sketches to complement what is already available in Java, and the Python wrappers of those C++ sketches.
> > > > > > > > > > > >
> > > > > > > > > > > > * Expanding the C++ build framework to include Windows and the popular Linux variants.
> > > > > > > > > > > >
> > > > > > > > > > > > * Continued engagement with the scientific research community on the development of new algorithms for computationally difficult problems that heretofore have not had a sketching solution.
> > > > > > > > > > > >
> > > > > > > > > > > > == Current Status ==
> > > > > > > > > > > >
> > > > > > > > > > > > The DataSketches GitHub project has been quite successful. As of this writing (February 2019) the number of downloads measured by the Nexus Repository Manager at https://oss.sonatype.org has grown by nearly a factor of 10 over the past year to about 55 thousand per month. The DataSketches/sketches-core repository has about 560 stars and 141 forks, which is pretty good for a highly specialized library.
> > > > > > > > > > > >
> > > > > > > > > > > > === Development Practices ===
> > > > > > > > > > > >
> > > > > > > > > > > > ==== Source Control ====
> > > > > > > > > > > >
> > > > > > > > > > > > All of our developers have extensive experience with Git version control and follow accepted practices for use of Pull Requests (PRs), code reviews and commits to master, for example.
> > > > > > > > > > > >
> > > > > > > > > > > > ==== Testing ====
> > > > > > > > > > > >
> > > > > > > > > > > > Sketches, by their nature, are probabilistic programs and don’t necessarily behave deterministically.
> > > > > > > > > > > > For some of the sketches we intentionally insert random noise into the code as this gives us the mathematical properties that we need to guarantee accuracy. This can make the behavior of these algorithms quite unintuitive and provides significant challenges to the developer who wishes to test these algorithms for correctness. As a result, our testing strategy includes two major components: unit tests and characterization tests.
> > > > > > > > > > > >
> > > > > > > > > > > > ===== Unit Testing =====
> > > > > > > > > > > >
> > > > > > > > > > > > Our unit tests are primarily quick tests to make sure that we exercise all critical paths in the code and that key branches are executed correctly. It is important that they execute relatively fast as they are generally run on every code build. The sketches-core repository alone has about 22 thousand statements, over 1300 unit tests and code coverage of about 98.2% as measured by Atlassian/Clover. Our goal is for all code repositories used in production to have code coverage greater than 90%.
> > > > > > > > > > > >
> > > > > > > > > > > > ===== Characterization Testing =====
> > > > > > > > > > > >
> > > > > > > > > > > > In order to test the probabilistic methods that are used to interpret the stochastic behaviors of our sketches we have a separate characterization repository that is dedicated to this. Measuring accuracy, for example, requires running thousands of trials at each of many different points along the domain axis. Each trial compares its estimated results against a known exact result, producing an error for that trial. These error measurements are then fed into our Quantiles sketch to capture the actual distribution of error at that point along the axis. We then select quantile contours across all the distributions at points along the axis. These contours can then be plotted to reveal the shape of the actual error distribution. These distributions are not at all Gaussian; in fact, they can be quite complex. Nonetheless, these distributions are then checked against the statistical guarantees inherent to the specific sketch algorithm and its parameters.
> > > > > > > > > > > > There are many examples of these characterization error distributions on our website. The runtimes of these tests can be very long and can range from many minutes to hours, and some can run for days. Currently, we have separate characterization repositories for Java and C++ / Python.
> > > > > > > > > > > >
> > > > > > > > > > > > It is our goal that we perform this characterization analysis for all of our sketches. By definition, the code that runs these characterization tests is open-source so others can run these tests as well. We do not have formal releases of this code (because it is not production code) and it is not published to Maven Central.
> > > > > > > > > > > >
> > > > > > > > > > > > === Meritocracy ===
> > > > > > > > > > > >
> > > > > > > > > > > > DataSketches was initially developed based on requirements within Yahoo. As a project on GitHub, DataSketches has received contributions from numerous individual developers from around the world, dedicated research work from senior scientists at Amazon and Visa, and academic researchers from Georgetown University, Princeton, and MIT.
> > > > > > > > > > > >
> > > > > > > > > > > > As a project under incubation, we are committed to expanding our effort to build an environment which supports a meritocracy. We are focused on engaging the community and other related projects for support and contributions. Moreover, we are committed to ensuring that contributors and committers to DataSketches come from a broad mix of organizations through a merit-based decision process during incubation. We believe strongly in the DataSketches premise of a well-engineered and scientifically rigorous library that implements these powerful algorithms, and we are committed to growing an inclusive community of DataSketches contributors and users.
> > > > > > > > > > > >
> > > > > > > > > > > > === Community ===
> > > > > > > > > > > >
> > > > > > > > > > > > Yahoo has a long history of active engagement in the Open Source community.
> > > > > > > > > > > > Major projects include: Vespa.ai, Bullet, Moloch, Panoptes, Screwdriver.cd, Athenz, HaloDB, Maha, Mendel, TensorFlowOnSpark, gifshot, fluxible, as well as the creation, contribution and incubation of many Apache projects such as Apache Hadoop, Pig, Bookkeeper, Oozie, Zookeeper, Omid, Pulsar, Traffic Server, Storm, Druid, and many more.
> > > > > > > > > > > >
> > > > > > > > > > > > Every day, DataSketches is actively used by organizations and institutions around the world for batch and stream processing of data. We believe acceptance will allow us to consolidate existing DataSketches-related work, grow the DataSketches community, and deepen connections between DataSketches and other open source projects.
> > > > > > > > > > > >
> > > > > > > > > > > > === Introduction to the Core Developers & Contributors ===
> > > > > > > > > > > >
> > > > > > > > > > > > The core developers and contributors for DataSketches are from diverse backgrounds, but primarily are scientists that love engineering and engineers that love science. A large part of the value we bring comes from this synthesis. These individuals have already contributed substantially to the code, algorithms, and/or mathematical proofs that form the basis of the library.
> > > > > > > > > > > >
> > > > > > > > > > > > This core group also forms the Initial Committers with write permissions to the repository. Those marked with (*) meet weekly to plan the research and engineering direction of the project.
> > > > > > > > > > > >
> > > > > > > > > > > > ==== Scientists That Love Engineering ====
> > > > > > > > > > > >
> > > > > > > > > > > > * Eshcar Hillel: Senior Research Scientist, Yahoo Labs, Israel. Interests: distributed systems, scalable systems and platforms for big data processing, concurrent algorithms and data structures.
> > > > > > > > > > > >
> > > > > > > > > > > > * Kevin Lang: (*) Distinguished Research Scientist, Yahoo Labs, Sunnyvale, California. Interests: algorithms, theoretical and applied mathematics, encoding and compression theory, theoretical and applied performance optimization.
> > > > > > > > > > > >
> > > > > > > > > > > > * Edo Liberty: (*) Director of Research, Head of Amazon AI Labs, Palo Alto, California. Manages the algorithms group at Amazon AI. The group builds scalable machine learning systems and algorithms which are used both internally and externally by customers of SageMaker, AWS's flagship machine learning platform.
> > > > > > > > > > > >
> > > > > > > > > > > > * Jon Malkin: (*) Senior Scientist, Yahoo Labs, Sunnyvale.
> > > > > > > > > > > > Interests: computational advertising, machine learning, speech recognition, data-driven analysis, large-scale experimentation, big data, stream/complex event processing.
> > > > > > > > > > > >
> > > > > > > > > > > > * Justin Thaler: (*) Assistant Professor, Department of Computer Science, Georgetown University, Washington D.C. Interests: algorithms and computational complexity, complexity theory, quantum algorithms, private data analysis, learning theory, and developing efficient streaming and sketching algorithms.
> > > > > > > > > > > >
> > > > > > > > > > > > ==== Engineers That Love Science ====
> > > > > > > > > > > >
> > > > > > > > > > > > * Roman Leventov: Senior Software Engineer, Metamarkets / Snap. Interests: design and implementation of data storing and data processing (distributed) systems, performance optimization, CPU performance, mechanical sympathy, JVM performance, API design, databases, (concurrent) data structures, memory management, garbage collection algorithms, language design and runtimes (their tradeoffs), distributed systems (cloud) efficiency, Linux, code quality, code transformation, pure functional programming models, Haskell.
> > > > > > > > > > > >
> > > > > > > > > > > > * Lee Rhodes: (*) Distinguished Architect, lead developer and founder of the DataSketches project, Yahoo, Sunnyvale, California. Interests: streaming algorithms, mathematics, computer science, high quality and high performance code for the analysis of massive data, bridging the divide between theory and practice.
> > > > > > > > > > > >
> > > > > > > > > > > > * Alexander Saydakov: (*) Senior Software Engineer, Yahoo, Sunnyvale, California. Interests: applied mathematics, computer science, big data, distributed systems.
> > > > > > > > > > > >
> > > > > > > > > > > > === Introduction to Additional Interested Contributors ===
> > > > > > > > > > > >
> > > > > > > > > > > > These folks have contributed intermittently but are strong supporters of this project.
> > > > > > > > > > > >
> > > > > > > > > > > > * Frank Grimes: GitHub ID: frankgrimes97
> > > > > > > > > > > >
> > > > > > > > > > > > * Mina Ghashami: [mina.ghashami at gmail dot com] Ph.D. Computer Science, Univ of Utah. Interests: Machine Learning, Data Mining, matrix approximation, streaming algorithms, randomized linear algebra.
> > > > > > > > > > > >
> > > > > > > > > > > > * Christopher Musco: [christopher.musco at gmail dot com] Ph.D. Computer Science, Research Instructor, Princeton University.
> > > > > > > > > > > > Interests: algorithmic foundations of data science and machine learning, efficient methods for processing and understanding large datasets, often working at the intersection of theoretical computer science, numerical linear algebra, and optimization.
> > > > > > > > > > > >
> > > > > > > > > > > > * Graham Cormode: [g.cormode at warwick.ac dot uk] Ph.D. Computer Science, Professor, Warwick University, Warwick, England. Interests: all aspects of the "data lifecycle", from data collection and cleaning, through mining and analytics. (Professor Cormode is one of the world’s leading scientists in sketching algorithms.)
> > > > > > > > > > > >
> > > > > > > > > > > > === Alignment ===
> > > > > > > > > > > >
> > > > > > > > > > > > The DataSketches library already provides integrations and example code for Apache Hive, Apache Pig, and Apache Spark, and is deeply integrated into Apache Druid.
> > > > > > > > > > > >
> > > > > > > > > > > > == Known Risks ==
> > > > > > > > > > > >
> > > > > > > > > > > > The following subsections address specific risks that have been identified by the ASF.
> > > > > > > > > > > >
> > > > > > > > > > > > === Risk: Orphaned Products ===
> > > > > > > > > > > >
> > > > > > > > > > > > The DataSketches library is presently used by a number of organizations, from small startups to Fortune 100 companies, to construct production pipelines that must process and analyze massive data. Yahoo has a long-term commitment to continue to advance the DataSketches library; moreover, DataSketches is seeing increasing interest, development, and adoption from many diverse organizations from around the world. Due to its growing adoption, we feel it is quite unlikely that this project would become orphaned.
> > > > > > > > > > > >
> > > > > > > > > > > > === Risk: Inexperience with Open Source ===
> > > > > > > > > > > >
> > > > > > > > > > > > Yahoo believes strongly in open source and the exchange of information to advance new ideas and work. Examples of this commitment are active open source projects such as those mentioned above. With DataSketches, we have been increasingly open and forward-looking; we have published a number of papers about breakthrough developments in the science of streaming algorithms (mentioned above) that also reference the DataSketches library.
> > > > > > > > > > > > Our submission to the Apache Software Foundation is a logical extension of our commitment to open source software.
> > > > > > > > > > > >
> > > > > > > > > > > > Key committers at Yahoo with strong open source backgrounds include Aaron Gresch, Alan Carroll, Alessandro Bellina, Anastasia Braginsky, Andrews Sahaya Albert, Arun S A G, Atul Mohan, Brad McMillen, Bryan Call, Daryn Sharp, Dav Glass, David Carlin, Derek Dagit, Eric Payne, Eshcar Hillel, Ethan Li, Fei Deng, Francis Christopher Liu, Francisco Perez-Sorrosal, Gil Yehuda, Govind Menon, Hang Yang, Jacob Estelle, Jai Asher, James Penick, Jason Kenny, Jay Pipes, Jim Rollenhagen, Joe Francis, Jon Eagles, Kihwal Lee, Kishorkumar Patil, Koji Noguchi, Kuhu Shukla, Michael Trelinski, Mithun Radhakrishnan, Nathan Roberts, Ohad Shacham, Olga L. Natkovich, Parth Kamlesh Gandhi, Rajan Dhabalia, Rohini Palaniswamy, Ruby Loo, Ryan Bridges, Sanket Chintapalli, Satish Subhashrao Saley, Shu Kit Chan, Sri Harsha Mekala, Susan Hinrichs, Yonatan Gottesman, and many more.
> > > > > > > > > > > >
> > > > > > > > > > > > All of our core developers are committed to learning about the Apache process and to giving back to the community.
> > > > > > > > > > > >
> > > > > > > > > > > > === Risk: Homogeneous Developers ===
> > > > > > > > > > > >
> > > > > > > > > > > > The majority of committers in this proposal belong to Yahoo because DataSketches emerged from an internal Yahoo project. This proposal also includes developers and contributors from other companies who are actively involved with other Apache projects, such as Druid. We expect our entry into incubation will allow us to expand the number of individuals and organizations participating in DataSketches development.
> > > > > > > > > > > >
> > > > > > > > > > > > === Risk: Reliance on Salaried Developers ===
> > > > > > > > > > > >
> > > > > > > > > > > > Because the DataSketches library originated within Yahoo, it has been developed primarily by salaried Yahoo developers and we expect that to continue to be the case in the near term. However, since we placed this library into open source we have had a number of significant contributions from engineers and scientists from outside of Yahoo. We expect our reliance on Yahoo salaried developers will decrease over time. Nonetheless, Yahoo is committed to continuing its strong support of this important project.
> > > > > > > > > > > >
> > > > > > > > > > > > === Risk: Lack of Relationship to other Apache Products ===
> > > > > > > > > > > >
> > > > > > > > > > > > DataSketches already directly interoperates with or utilizes several existing Apache projects.
> > > > > > > > > > > >
> > > > > > > > > > > > * Build
> > > > > > > > > > > >   * Apache Maven
> > > > > > > > > > > >
> > > > > > > > > > > > * Integrations and adaptors for the following projects naturally have them as dependencies
> > > > > > > > > > > >   * Apache Hive
> > > > > > > > > > > >   * Apache Pig
> > > > > > > > > > > >   * Apache Druid
> > > > > > > > > > > >   * Apache Spark
> > > > > > > > > > > >
> > > > > > > > > > > > * Additional dependencies for the above integrations and adaptors include
> > > > > > > > > > > >   * Apache Hadoop
> > > > > > > > > > > >   * Apache Commons (Math)
> > > > > > > > > > > >
> > > > > > > > > > > > There is no other Apache project that we are aware of that duplicates the functionality of the DataSketches library.
> > > > > > > > > > > >
> > > > > > > > > > > > === Risk: An Excessive Fascination with the Apache Brand ===
> > > > > > > > > > > >
> > > > > > > > > > > > With this proposal we are not seeking attention or publicity. Rather, we firmly believe in the DataSketches library and concept and the ability to make the DataSketches library a powerful, yet simple-to-use toolkit for data processing.
> > > > > > > > > > > > While the DataSketches library has been open source, we believe putting code on GitHub can only go so far. We see the Apache community, processes, and mission as critical for ensuring the DataSketches library is truly community-driven, positively impactful, and innovative open source software. While Yahoo has taken a number of steps to advance its various open source projects, we believe the DataSketches library project is a great fit for the Apache Software Foundation due to its focus on data processing and its relationships to existing ASF projects.
> > > > > > > > > > > >
> > > > > > > > > > > > === Risk: Cryptography ===
> > > > > > > > > > > >
> > > > > > > > > > > > DataSketches does not contain any cryptographic code and is not a cryptographic product.
> > > > > > > > > > > >
> > > > > > > > > > > > == Documentation ==
> > > > > > > > > > > >
> > > > > > > > > > > > The following documentation is relevant to this proposal. Relevant portions of the documentation will be contributed to the Apache DataSketches project.
> > > > > > > > > > > >
> > > > > > > > > > > > * DataSketches website: https://datasketches.github.io
> > > > > > > > > > > >
> > > > > > > > > > > > * DataSketches website repository: https://github.com/DataSketches/DataSketches.github.io
> > > > > > > > > > > >
> > > > > > > > > > > > We will need an Apache website for this documentation, similar to:
> > > > > > > > > > > >
> > > > > > > > > > > > * https://datasketches.apache.org
> > > > > > > > > > > >
> > > > > > > > > > > > == Initial Source ==
> > > > > > > > > > > >
> > > > > > > > > > > > The initial source for DataSketches which we will submit to the Apache Software Foundation will include a number of repositories which are currently hosted under the GitHub.com/datasketches organization:
> > > > > > > > > > > >
> > > > > > > > > > > > All github.com/datasketches repositories including:
> > > > > > > > > > > >
> > > > > > > > > > > > * Java
> > > > > > > > > > > >   * sketches-core: This repository has the core sketching classes, which are leveraged by some of the other repositories. This repository has no external dependencies outside of the DataSketches/memory repository, Java and TestNG for unit tests. This code is versioned and the latest release can be obtained from Maven Central.
> > > > > > > > > > > >   * memory: Low level, high-performance memory data-structure management primarily for off-heap.
> > > > > > > > > > > >   * sketches-android: This is a new repository dedicated to sketches designed to be run in a mobile client, such as a cell phone.
> > > > > > > > > > > >     It is still in development and should be considered experimental.
> > > > > > > > > > > >   * sketches-hive: This repository contains Hive UDFs and UDAFs for use within Hadoop grid environments. This code has dependencies on sketches-core as well as Hadoop and Hive. Users of this code are advised to use Maven to bring in all the required dependencies. This code is versioned and the latest release can be obtained from Maven Central.
> > > > > > > > > > > >   * sketches-pig: This repository contains Pig User Defined Functions (UDF) for use within Hadoop grid environments. This code has dependencies on sketches-core as well as Hadoop and Pig. Users of this code are advised to use Maven to bring in all the required dependencies. This code is versioned and the latest release can be obtained from Maven Central.
> > > > > > > > > > > >   * sketches-vector: This is a new repository dedicated to sketches for vector and matrix operations. It is still somewhat experimental.
> > > > > > > > > > > >   * characterization: This relatively new repository is for code that we use to characterize the accuracy and speed performance of the sketches in the library and is constantly being updated.
> > > > > > > > > > > >     Examples of the job command files used for various tests can be found in the src/main/resources directory. Some of these tests can run for hours depending on their configuration.
> > > > > > > > > > > >   * experimental: This repository is an experimental staging area for code that will eventually end up in another repository. This code is not versioned and not registered with Maven Central.
> > > > > > > > > > > >   * sketches-misc: Demos and other code not related to production deployment.
> > > > > > > > > > > >
> > > > > > > > > > > > * C++ and Python
> > > > > > > > > > > >   * sketches-core-cpp: This is the C++/Python companion to the Java sketches-core. These implementations are binary compatible with their counterparts in Java. In other words, a sketch created and stored in C++ can be opened and read in Java and vice versa. This repository also has our Python adaptors that wrap the C++ implementations, making the high performance C++ implementations available from Python.
> > > > > > > > > > > >   * sketches-postgres: This repository provides the Postgres-specific adaptors that wrap the C++ implementations, making them available to Postgres database users.
> > > > > > > > > > > >   * characterization-cpp: This is the C++/Python companion to the Java characterization repository.
> > > > > > > > > > > >   * experimental-cpp: This repository is an experimental staging area for C++ code that will eventually end up in another repository.
> > > > > > > > > > > >
> > > > > > > > > > > > * Command-Line Tools
> > > > > > > > > > > >   * sketches-cmd
> > > > > > > > > > > >   * homebrew-sketches
> > > > > > > > > > > >   * homebrew-sketches-cmd
> > > > > > > > > > > >
> > > > > > > > > > > > These projects have always been Apache 2.0 licensed. We intend to bundle all of these repositories since they are all complementary and should be maintained in one project. Prior to our submission, we will combine all of these projects into a new Git repository.
> > > > > > > > > > > >
> > > > > > > > > > > > == Source and Intellectual Property Submission Plan ==
> > > > > > > > > > > >
> > > > > > > > > > > > Contributors to the DataSketches project have also signed the Yahoo Individual Contributor License Agreement (https://yahoocla.herokuapp.com/) in order to contribute to the project.
> > > > > > > > > > > >
> > > > > > > > > > > > With respect to trademark rights, Yahoo does not hold a trademark on the phrase “DataSketches.” Based on feedback and guidance we receive during the incubation process, we are open to renaming the project if necessary for trademark or other concerns, but we would prefer not to have to do that.
> > > > > > > > > > > >
> > > > > > > > > > > > == External Dependencies ==
> > > > > > > > > > > >
> > > > > > > > > > > > All external dependencies are licensed under an Apache 2.0 or Apache-compatible license. As we grow the DataSketches community we will configure our build process to require and validate that all contributions and dependencies are licensed under the Apache 2.0 license or an Apache-compatible license.
> > > > > > > > > > > >
> > > > > > > > > > > > == Required Resources ==
> > > > > > > > > > > >
> > > > > > > > > > > > === Mailing Lists ===
> > > > > > > > > > > >
> > > > > > > > > > > > We currently use a mix of mailing lists.
> > > > > > > > > > > > We will migrate our existing mailing lists to the following:
> > > > > > > > > > > >
> > > > > > > > > > > > * d...@datasketches.incubator.apache.org
> > > > > > > > > > > >
> > > > > > > > > > > > * u...@datasketches.incubator.apache.org
> > > > > > > > > > > >
> > > > > > > > > > > > * priv...@datasketches.incubator.apache.org
> > > > > > > > > > > >
> > > > > > > > > > > > * comm...@datasketches.incubator.apache.org
> > > > > > > > > > > >
> > > > > > > > > > > > === Source Control ===
> > > > > > > > > > > >
> > > > > > > > > > > > The DataSketches team currently uses Git and would like to continue to
> > > > > > > > > > > > do so. We request a Git repository for DataSketches with mirroring to
> > > > > > > > > > > > GitHub enabled, similar to the following:
> > > > > > > > > > > >
> > > > > > > > > > > > * https://github.com/apache/incubator-datasketches.git
> > > > > > > > > > > >
> > > > > > > > > > > > === Issue Tracking ===
> > > > > > > > > > > >
> > > > > > > > > > > > We request the creation of an Apache-hosted JIRA. The DataSketches
> > > > > > > > > > > > project is currently using the public GitHub issue tracker and the
> > > > > > > > > > > > public Google Groups forum/sketches-user for issue tracking and
> > > > > > > > > > > > discussions. We will migrate and combine the content of these two
> > > > > > > > > > > > sources into the Apache JIRA.
> > > > > > > > > > > >
> > > > > > > > > > > > Proposed Jira ID: DATASKETCHES
> > > > > > > > > > > >
> > > > > > > > > > > > == Initial Committers ==
> > > > > > > > > > > >
> > > > > > > > > > > > The following individuals have been extremely active in our community
> > > > > > > > > > > > and should have write (commit) permissions to the repository.
> > > > > > > > > > > >
> > > > > > > > > > > > * Eshcar Hillel [eshcar at verizonmedia dot com]
> > > > > > > > > > > >
> > > > > > > > > > > > * Kevin Lang [langk at verizonmedia dot com]
> > > > > > > > > > > >
> > > > > > > > > > > > * Roman Leventov [roman.leventov at c.metamarkets dot com]
> > > > > > > > > > > >
> > > > > > > > > > > > * Edo Liberty [libertye at amazon dot com]
> > > > > > > > > > > >
> > > > > > > > > > > > * Jon Malkin [jmalkin at verizonmedia dot com]
> > > > > > > > > > > >
> > > > > > > > > > > > * Lee Rhodes [lrhodes at verizonmedia dot com] & [leerho at gmail dot com]
> > > > > > > > > > > >
> > > > > > > > > > > > * Alexander Saydakov [saydakov at verizonmedia dot com]
> > > > > > > > > > > >
> > > > > > > > > > > > * Justin Thaler [justin.thaler at georgetown dot edu]
> > > > > > > > > > > >
> > > > > > > > > > > > == Affiliations ==
> > > > > > > > > > > >
> > > > > > > > > > > > The initial committers are from four organizations: Yahoo, Amazon,
> > > > > > > > > > > > Georgetown University, and Metamarkets/Snap.
> > > > > > > > > > > >
> > > > > > > > > > > > === Champion ===
> > > > > > > > > > > > (Recommended to me: )
> > > > > > > > > > > >
> > > > > > > > > > > > Liang Chen, Vice President of Apache CarbonData, [chenliang613 at apache
> > > > > > > > > > > > dot org]
> > > > > > > > > > > > Jean-Baptiste Onofré, [jb at nanthrax dot net]
> > > > > > > > > > > >
> > > > > > > > > > > > === Nominated Mentors ===
> > > > > > > > > > > > (Recommended to me: )
> > > > > > > > > > > >
> > > > > > > > > > > > Liang Chen, Vice President of Apache CarbonData, [chenliang613 at apache
> > > > > > > > > > > > dot org]
> > > > > > > > > > > > Jean-Baptiste Onofré, [jb at nanthrax dot net]
> > > > > > > > > > > > Gil Yehuda, [gyehuda at verizonmedia dot com]
> > > > > > > > > > > >
> > > > > > > > > > > > === Sponsoring Entity ===
> > > > > > > > > > > >
> > > > > > > > > > > > * The Apache Incubator **** This is our 1st choice ****
> > > > > > > > > > > >
> > > > > > > > > > > > * Apache Druid. The incubating Apache Druid project might also be a
> > > > > > > > > > > > logical sponsor. However, DataSketches has applications in many areas of
> > > > > > > > > > > > computing outside of Druid, so our preference and recommendation is that
> > > > > > > > > > > > DataSketches would ultimately be a top-level Apache project.
> > > > > > > > > > > >
> > > > > > > > > > > > ________________
> > > > > > > > > > > > [1] In 2017 Verizon acquired Yahoo and merged it with previously
> > > > > > > > > > > > acquired AOL.
> > > > > > > > > > > > The merged entity was originally called Oath, Inc., but has recently
> > > > > > > > > > > > been renamed Verizon Media, Inc., a wholly-owned subsidiary of Verizon,
> > > > > > > > > > > > Inc. Since Yahoo is the more recognized name, references in this
> > > > > > > > > > > > document to Yahoo are also references to Verizon Media, Inc.
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Feb 22, 2019 at 9:35 PM Kenneth Knowles <k...@apache.org> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > The subject line has me interested already. Follow examples like this
> > > > > > > > > > > > > maybe?
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1.
> > > > > > > > > > > > > https://lists.apache.org/thread.html/a5db74cc9e5ae89b3bfa5f4b07bfcc18dae84b7098232fb897cd47b7@%3Cgeneral.incubator.apache.org%3E
> > > > > > > > > > > > > 2.
> > > > > > > > > > > > > https://lists.apache.org/thread.html/5a7f6a218b11a1cac61fbd53f4c995fd7716f8ad3751cf9f171ebd57@%3Cgeneral.incubator.apache.org%3E
> > > > > > > > > > > > >
> > > > > > > > > > > > > Kenn
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Feb 22, 2019 at 8:05 PM leerho <lee...@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > I'll try again ...
> > > > > > > > > > > > > > :)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Fri, Feb 22, 2019 at 8:00 PM Ted Dunning <ted.dunn...@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >> It didn't make it again
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> On Fri, Feb 22, 2019, 8:35 PM leerho <lee...@gmail.com> wrote:
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> > I'm not sure the attached document made it through.
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > On Fri, Feb 22, 2019 at 7:28 PM leerho <lee...@gmail.com> wrote:
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > ---------------------------------------------------------------------
> > > > > > > > > > > > > > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > > > > > > > > > > > > > For additional commands, e-mail: general-h...@incubator.apache.org
> > > > > > > >
> > > > > > > > --
> > > > > > > > From my cell phone.