+1 Happy to help out if the vote passes.
On Tue, Oct 17, 2017 at 10:07 PM, Madhawa Kasun Gunasekara <madhaw...@gmail.com> wrote:
> Here is my +1
>
> Thanks,
> Madhawa
>
> On Tue, Oct 17, 2017 at 4:04 PM, lewis john mcgibbney <lewi...@apache.org> wrote:
>
> > Hi Folks,
> > Having secured a mentorship team consisting of the following IPMC Members, I am happy to open a formal VOTE thread on accepting the Science Data Analytics Platform (SDAP) into the Apache Incubator.
> >
> > - Lewis John McGibbney (lewi...@apache.org)
> > - Raphael Bircher (bircher at apache dot org)
> > - Suneel Marthi (smarthi at apache dot org)
> >
> > Thank you to both Raphael and Suneel for coming forward. :)
> > The VOTE will be open for at least 72 hours.
> >
> > [ ] +1 Accept Science Data Analytics Platform (SDAP) into Apache Incubator
> > [ ] +/-0 ... just because
> > [ ] -1 Do NOT Accept Science Data Analytics Platform (SDAP) into Apache Incubator... because
> >
> > Thanks in advance to all participants.
> > Lewis
> >
> > P.S. Here is a binding +1 from me
> >
> > On Wed, Oct 11, 2017 at 11:22 AM, lewis john mcgibbney <lewi...@apache.org> wrote:
> >
> > > Hi Folks,
> > > I would like to open a DISCUSS thread on the topic of accepting the Science Data Analytics Platform (SDAP) <https://wiki.apache.org/incubator/SDAPProposal> Project into the Incubator.
> > > I am CC'ing Thomas Huang from NASA JPL, with whom I have been working to build community around a kick-ass set of software projects under the SDAP umbrella.
> > > At this stage we would very much appreciate critical feedback from the general@ community. We are also open to mentors who may have an interest in the project proposal.
> > > The proposal is pasted below.
> > > Thanks in advance,
> > > Lewis
> > >
> > > = Abstract =
> > > The Science Data Analytics Platform (SDAP) establishes an integrated data analytic center for Big Science problems.
> > > It focuses on technology integration, advancement and maturity.
> > >
> > > = Proposal =
> > > SDAP currently represents a collaboration between NASA Jet Propulsion Laboratory (JPL), Florida State University (FSU), the National Center for Atmospheric Research (NCAR), and George Mason University (GMU). SDAP brings together a number of big data technologies under a single umbrella, including the NASA-funded OceanXtremes (anomaly detection and ocean science), NEXUS (deep data analytic platform), DOMS (distributed in situ to satellite matchup), MUDROD (search relevancy and discovery) and VQSS (Virtualized Quality Screening Service). VQSS will not be included in the initial Incubator proposal; however, it is anticipated that a future source code donation will cover VQSS.
> > >
> > > = Background and Rationale =
> > > SDAP is a software solution currently geared to better enable scientists involved in advancing the study of the Earth's physical oceanography. With increasing global temperatures, a warming ocean, and melting ice sheets and glaciers, the impacts can be observed in everything from anomalous ocean temperature and circulation patterns, to more extreme weather events and stronger, more frequent hurricanes, to sea level rise and storm surges affecting coastlines; they may also involve drastic changes and shifts in marine ecosystems. Ocean science communities rely on data distributed through data centers such as JPL's Physical Oceanographic Data Active Archive Center (PO.DAAC) to conduct their research. In typical investigations, oceanographers follow a traditional workflow for using datasets: search, evaluate, download, and apply tools and algorithms to look for trends.
> > > While this workflow has historically worked well for the oceanographic community, it cannot scale to research involving massive amounts of data. NASA's Surface Water and Ocean Topography (SWOT) mission, scheduled to launch in April 2021, is expected to generate over 20 PB of data during its nominal 3-year mission. This will challenge all existing NASA Earth Science data archival and distribution paradigms: it will no longer be feasible for Earth scientists to download and analyze such volumes of data. SDAP was therefore developed primarily as a Web-service platform for big ocean data science at PO.DAAC, using open source solutions to enable fast analysis of oceanographic data. SDAP has been developed collaboratively by JPL, FSU, NCAR, and GMU and is rapidly maturing into the generic platform for the next generation of big science data solutions. The platform is an orchestration of several previously funded NASA big ocean data solutions using cloud technology, which include data analysis (NEXUS), anomaly detection (OceanXtremes), matchup (DOMS), subsetting, discovery (MUDROD), and visualization (VQSS). SDAP will enable web-accessible, fast data analysis directly on huge scientific data archives, minimizing data movement and providing access (including subsetting) only to the relevant data.
> > >
> > > = Science Data Analytics Platform Project Overview =
> > > SDAP consists of several loosely coupled, independently functioning sub-projects. The graphic below gives an overview of how these sub-projects fit together. N.B.: although the graphic uses terminology relating to OceanWorks, the SDAP architecture is essentially identical.
> > >
> > > {{attachment:sdap.png}}
> > >
> > > == OceanXtremes ==
> > > Oceanographic Data-Intensive Anomaly Detection and Analysis Portal.
> > > An application that allows you to view imagery and perform analysis on sea level rise data.
> > >
> > > '''Objective'''
> > > Develop an anomaly detection system that identifies items, events or observations that do not conform to an expected pattern.
> > > * Mature and test domain-specific, multi-scale anomaly and feature detection algorithms.
> > > * Identify unexpected correlations between key measured variables.
> > >
> > > Demonstrate the value of the following technologies in this service:
> > > * Adapted Map-Reduce data mining.
> > > * Algorithm profiling service.
> > > * Shared discovery and exploration search tools.
> > > * Automatic notification of events of interest.
> > >
> > > == NEXUS ==
> > > NEXUS is an emerging technology developed at JPL:
> > > * A Cloud-based/Cluster-based data platform for scalable analysis of observational parameters, designed to scale horizontally
> > > * Leverages a high-performance indexed, temporal, and geospatial search solution
> > > * Breaks data products into small chunks and stores them in a Cloud-based data store
> > >
> > > ''Data Volumes Exploding''
> > > * SWOT mission is coming
> > > * File I/O is slow
> > >
> > > ''Scalable Store & Compute is Available''
> > > * NoSQL cluster databases
> > > * Parallel compute, in-memory map-reduce
> > > * Bring Compute to Highly-Accessible Data (using Hybrid Cloud)
> > >
> > > ''Pre-Chunk and Summarize Key Variables''
> > > * Easy statistics instantly (milliseconds)
> > > * Harder statistics on-demand (in seconds)
> > > * Visualize original data (layers) on a map quickly
> > >
> > > == DOMS ==
> > > The Distributed Oceanographic Match-Up Service
> > > DOMS is designed to reconcile satellite and in situ datasets in support of NASA's Earth Science mission.
> > > The service will provide a mechanism for users to input a series of geospatial references for satellite observations and receive the in situ observations that are matched to the satellite data within a selectable temporal and spatial domain. DOMS includes several characteristic in situ and satellite observation datasets, with an initial focus on salinity, sea temperature, and winds. DOMS will be used by the marine and satellite research communities to support a range of activities, and several use cases will be described. The service is designed to provide a community-accessible tool that dynamically delivers matched data and allows scientists to work only with the subset of data where matches exist.
> > >
> > > == MUDROD ==
> > > Mining and Utilizing Dataset Relevancy from Oceanographic Datasets to Improve Data Discovery and Access
> > > Data discovery accuracy is a challenging topic for Earth science as well as other domains. This is especially true for scientific datasets, which are far less popular than Amazon or Google data. MUDROD focuses on mining oceanic knowledge from the PO.DAAC user log files to improve the end-user data discovery experience at PO.DAAC. The research has three steps: a) extracting oceanographic semantics from three resources: SWEET, the GCMD ontology, and the keywords end users use to search PO.DAAC datasets; b) mining the linkage among different vocabularies from user data discovery sessions; and c) building the linkage among vocabularies with a comprehensive approach that considers both domain de facto standards (e.g., SWEET and GCMD) and the knowledge mined from the log files. The resulting semantics are used to improve data discovery by ranking results, navigating among vocabularies, and recommending data based on user searches.
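DOMS, as described above, returns the in situ observations that fall within a user-selectable time window and distance of each satellite observation. The following is a minimal, hypothetical Python sketch of that matching rule only; it is not DOMS code. The names `Observation`, `haversine_km` and `match_up` are invented for illustration, observations are assumed to be points with latitude/longitude in degrees, and the brute-force scan stands in for whatever indexed lookup a production service would more likely use.

```python
# Hypothetical sketch of a spatio-temporal match-up rule (not DOMS source code).
import math
from dataclasses import dataclass
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass(frozen=True)
class Observation:
    time: datetime
    lat: float    # degrees north
    lon: float    # degrees east
    value: float  # e.g. sea surface temperature or salinity


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def match_up(satellite, in_situ, max_km, max_dt):
    """Pair each satellite observation with every in situ observation lying
    within max_km kilometres and max_dt (a timedelta) of it.
    Brute force: O(len(satellite) * len(in_situ))."""
    pairs = []
    for sat in satellite:
        hits = [
            obs for obs in in_situ
            if abs(obs.time - sat.time) <= max_dt
            and haversine_km(sat.lat, sat.lon, obs.lat, obs.lon) <= max_km
        ]
        pairs.append((sat, hits))
    return pairs


# Hypothetical example: one satellite pixel and three buoy records.
t0 = datetime(2017, 10, 1, 12, 0)
satellite = [Observation(t0, 30.0, -80.0, 25.1)]
buoys = [
    Observation(t0 + timedelta(minutes=30), 30.05, -80.02, 25.3),  # ~6 km, 30 min apart
    Observation(t0 + timedelta(hours=5), 30.0, -80.0, 24.9),       # outside time window
    Observation(t0, 32.0, -80.0, 23.0),                             # ~222 km away
]
pairs = match_up(satellite, buoys, max_km=25.0, max_dt=timedelta(hours=1))
```

With a 25 km / 1 hour window, only the first buoy record is paired with the satellite observation; the other two fall outside the time or distance tolerance.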
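Step b) of the MUDROD research mines linkages among vocabularies from user data discovery sessions. The sketch below is a hypothetical illustration of that idea, not MUDROD's actual method: it swaps in a simple Jaccard similarity over the sets of sessions in which two search terms appear. The function names `term_sessions` and `linked_terms`, the 0.5 threshold and the sample log are all invented for illustration.

```python
# Hypothetical sketch: link search terms that co-occur in user sessions
# (a simple Jaccard co-occurrence measure, not MUDROD's algorithm).
from collections import defaultdict
from itertools import combinations


def term_sessions(sessions):
    """Map each lower-cased search term to the set of session ids it appears in."""
    index = defaultdict(set)
    for session_id, queries in sessions.items():
        for query in queries:
            index[query.lower()].add(session_id)
    return index


def linked_terms(sessions, min_similarity=0.5):
    """Return {(term_a, term_b): similarity} for term pairs whose session sets
    have Jaccard similarity >= min_similarity."""
    index = term_sessions(sessions)
    links = {}
    for a, b in combinations(sorted(index), 2):
        shared = len(index[a] & index[b])
        union = len(index[a] | index[b])  # never 0: each term has >= 1 session
        similarity = shared / union
        if similarity >= min_similarity:
            links[(a, b)] = similarity
    return links


# Hypothetical log: session id -> search terms used in that session.
log = {
    "s1": ["sea surface temperature", "SST"],
    "s2": ["SST", "sea surface temperature", "ocean winds"],
    "s3": ["ocean winds", "ASCAT"],
    "s4": ["salinity"],
}
links = linked_terms(log)
```

Here "SST" and "sea surface temperature" always appear together and are linked with similarity 1.0, while "ocean winds" and "ASCAT" share one of their two sessions (0.5).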
> > >
> > > = Current Status =
> > > All components of SDAP were originally designed and developed under grants from the NASA-funded Advanced Information Systems and Technologies (AIST) program. The initiative to bring the components together under the SDAP umbrella is funded through an AIST follow-on grant, which will run for roughly another 18 months.
> > > None of the projects has made an official release yet, so, beyond community building, making releases will be our primary goal during incubation. All SDAP source code is currently publicly available and licensed under the ALv2.0.
> > >
> > > = Meritocracy =
> > > The current developers are familiar with meritocratic open source development at Apache. The SDAP team uses Apache products heavily, and its members participate in several Apache user communities. SDAP itself has critical dependencies on Apache products. Lewis McGibbney (JPL employee), a Member of the ASF, V.P. of Apache Any23 and a PMC member of Gora, Nutch, Tika, OODT, OCW, etc., is championing the effort to bring SDAP into and through the Apache Incubator and has been evangelizing the Apache Way to the current SDAP contributors so that the meritocratic process is well understood and followed. Apache was chosen specifically because we want to encourage this style of community development for the project, and for it to carry SDAP forward as the generic platform for the next generation of big science data solutions.
> > >
> > > = Community =
> > > The SDAP project is a fairly new effort and our community is not yet fully established. The initial committers comprising the SDAP roster have only recently come together as a unified team; however, there is a large degree of synergy between constituent members at JPL, FSU, NCAR, and GMU.
> > > Therefore, community building and publicity continue to be a major thrust. Given the activity and exposure regularly attained by several community members, we hope to grow the SDAP presence in and across several (scientific) forums. The SDAP technology is generating interest within communities such as the Earth Science Information Partners (ESIP), the American Geophysical Union (AGU) and a plethora of science meetings around the globe. We hope this will in turn make it possible for SDAP to be used across government agencies such as NASA, NOAA, USGS, EPA, DOI, etc., as well as by researchers and students in academic institutions around the globe.
> > > During incubation, we will explicitly seek to increase adoption; SDAP is already featured on the agenda of several high-profile, globally significant scientific conferences and meetings.
> > >
> > > = Core Developers =
> > > The current set of core developers is relatively small, including full-time staff and students from across JPL, FSU, NCAR, and GMU. Initial community management and participation will be distributed across the entire team, most of whom have been involved with the constituent projects for less than 2 years.
> > >
> > > = Alignment =
> > > All SDAP code is licensed under Apache v2.0.
> > >
> > > = Known Risks =
> > >
> > > == Orphaned products ==
> > > There are currently no orphaned products. Each component of SDAP has dedicated personnel leading and participating in its ongoing development. Additionally, there is substantial collaboration between projects, facilitated by regular project meetings that are specific to the initial member entities and focused on advancing physical oceanographic science.
> > >
> > > == Inexperience with Open Source ==
> > > JPL (in particular Lewis McGibbney) has been part of several efforts to transition projects to Apache and grow their communities, e.g. Apache OODT, Apache Open Climate Workbench, Apache Joshua (Incubating), Apache SensSoft (Incubating) and Apache DRAT (Incubating). Most of the code developed under the SDAP umbrella was already open source before the Incubator effort, so we are familiar with the nuances of open source software.
> > >
> > > = Relationships with Other Apache Products =
> > > SDAP depends strongly on a number of high-profile and smaller Apache products; examples can be seen in the External Dependencies breakdown. As we continue to grow SDAP within the Incubator, we will make an effort to share community stories, software advancements and possible improvements in our use of our Apache dependencies back to those project communities.
> > >
> > > = Developers =
> > > The SDAP project, and hence its developers, is currently funded through a NASA AIST follow-on grant, with funding secured for the next ~18 months. There are currently no developers dedicated to the project 100% of their time; however, the core team currently doing the work will continue to work on the project throughout the current funding period and after. There is currently no business strategy aligned with SDAP; however, it is anticipated that future, as yet unsecured, funding may be directed to further feature advancement and project evangelism.
> > >
> > > = Documentation =
> > > Documentation is currently available in a number of locations, e.g. GitHub wikis and GitHub Pages, with each repository under the oceanworks-aist GitHub organization maintaining documentation in the wiki attached to it.
> > > Additionally, most of the SDAP sub-projects have been extensively documented in a plethora of formal academic publications across several academic communities. At the very least, we intend to unify the GitHub wiki and GitHub Pages documentation, most likely to form the content of the sdap.apache.org website.
> > >
> > > = Initial Source =
> > > Current source resides in several repositories:
> > > * https://github.com/dataplumber/nexus (NEXUS, OceanXtremes, DOMS)
> > > * https://github.com/dataplumber/edge (EDGE)
> > > * https://github.com/aist-oceanworks/mudrod (MUDROD)
> > > * https://bitbucket.org/coaps_mdc/doms/src (DOMS)
> > >
> > > = External Dependencies =
> > > Each component of the Science Data Analytics Platform has its own dependencies. Documentation will be available for integrating them.
> > >
> > > == MUDROD ==
> > > '''Core'''
> > > (groupId artifactId version scope type optional)
> > > com.google.code.gson gson 2.5 compile jar false
> > > org.jdom jdom 2.0.2 compile jar false
> > > org.elasticsearch elasticsearch 5.2.0 compile jar false
> > > org.elasticsearch elasticsearch-spark-20_2.11 5.2.0 compile jar false
> > > joda-time joda-time 2.9.4 compile jar false
> > > com.carrotsearch hppc 0.7.1 compile jar false
> > > org.apache.spark spark-core_2.11 2.1.0 compile jar false
> > > org.apache.spark spark-sql_2.11 2.1.0 compile jar false
> > > org.apache.spark spark-mllib_2.11 2.1.0 compile jar false
> > > org.scala-lang scala-library 2.11.8 compile jar false
> > > org.codehaus.jettison jettison 1.3.8 compile jar false
> > > commons-cli commons-cli 1.2 compile jar false
> > > net.sf.opencsv opencsv 2.3 compile jar false
> > > org.apache.jena jena-core 3.3.0 compile jar false
> > > junit junit 4.12 test jar false
> > >
> > > '''Service'''
> > > gov.nasa.jpl.mudrod mudrod-core 0.0.1-SNAPSHOT compile jar false
> > > javax.servlet javax.servlet-api 3.1.0 provided jar false
> > > com.google.code.gson gson 2.5 compile jar false
> > >
> > > '''Web'''
> > > * AngularJS - MIT License
> > > * BootstrapJS - MIT License
> > > * jQueryJS - MIT License
> > > * Underscore JS - MIT License
> > >
> > > == DOMS ==
> > > * Apache Solr version 5.5.1 http://lucene.apache.org/solr/
> > > * EDGE https://github.com/dataplumber/edge
> > > * NetCDF4 http://unidata.github.io/netcdf4-python/
> > > * Python 3.5 (NOTE: only partial support for py2.7)
> > >
> > > Non-stdlib Python dependencies:
> > > * Jinja2==2.9.5
> > > * python-dateutil==2.6.0
> > > * cython==0.25.2
> > > * numpy==1.12.0
> > > * scipy==0.18.1
> > > * netCDF4==1.2.7
> > > * solrpy3
> > > * siphon==0.4.0
> > > * neo4j-driver==1.1.0
> > > * matplotlib==2.0.0
> > > * requests==2.13.0
> > > * shapely==1.5.17
> > > * flask==0.12
> > > * networkx==1.11
> > > * pyproj==1.9.5.1
> > > * blist==1.3.6
> > >
> > > == NEXUS ==
> > > '''Analysis'''
> > > * https://github.com/dataplumber/nexus/blob/master/analysis/package-list.txt
> > > * https://github.com/dataplumber/nexus/blob/master/analysis/requirements.txt
> > >
> > > '''Client'''
> > > * https://github.com/dataplumber/nexus/blob/master/client/requirements.txt
> > >
> > > '''Climatology'''
> > > * matplotlib
> > > * numpy
> > > * netCDF4
> > > * pathos (https://pypi.python.org/pypi/pathos)
> > >
> > > '''Data-access'''
> > > * https://github.com/dataplumber/nexus/blob/master/data-access/requirements.txt
> > >
> > > '''Nexus-ingest'''
> > > ''Dataset-tiler''
> > > * https://github.com/dataplumber/nexus/tree/master/nexus-ingest/dataset-tiler/build/reports
> > >
> > > ''developer-box''
> > > * A collection of scripts and a Vagrant file used to stand up a developer instance of NEXUS ingestion. No dependencies to report.
> > >
> > > ''Groovy-scripts''
> > > * A collection of Groovy scripts that can be used as part of data ingestion. They rely only on the standard Groovy library and the ‘nexus-messages’ project.
> > >
> > > ''Nexus-messages''
> > > * https://github.com/dataplumber/nexus/tree/master/nexus-ingest/nexus-messages/build/reports
> > >
> > > ''nexus-sink''
> > > * https://github.com/dataplumber/nexus/tree/master/nexus-ingest/nexus-sink/build/reports
> > >
> > > ''nexus-xd-python-modules''
> > > * https://github.com/dataplumber/nexus/blob/master/nexus-ingest/nexus-xd-python-modules/package-list.txt
> > > * https://github.com/dataplumber/nexus/blob/master/nexus-ingest/nexus-xd-python-modules/requirements.txt
> > >
> > > ''spring-xd-python''
> > > * Only Python standard libraries are used.
> > >
> > > ''tcp-shell''
> > > * https://github.com/dataplumber/nexus/tree/master/nexus-ingest/tcp-shell/build/reports
> > >
> > > '''tools/deletebyquery'''
> > > * https://github.com/dataplumber/nexus/blob/master/tools/deletebyquery/requirements.txt
> > >
> > > = Required Resources =
> > > Mailing Lists
> > > * priv...@sdap.incubator.apache.org
> > > * d...@sdap.incubator.apache.org
> > > * comm...@sdap.incubator.apache.org
> > >
> > > Git Repos
> > > * https://git-wip-us.apache.org/repos/asf/incubator-nexus.git
> > > * https://git-wip-us.apache.org/repos/asf/incubator-doms.git
> > > * https://git-wip-us.apache.org/repos/asf/incubator-mudrod.git
> > >
> > > Issue Tracking
> > > * JIRA Science Data Analytics Platform (SDAP)
> > >
> > > Continuous Integration
> > > * Jenkins builds on https://builds.apache.org/
> > >
> > > Web
> > > * http://sdap.incubator.apache.org/
> > > * wiki at http://cwiki.apache.org
> > >
> > > = Initial Committers =
> > > The following is a list of the planned initial Apache committers (the active subset of the committers for the current repositories on GitHub).
> > > * Lewis John McGibbney (lewi...@apache.org)
> > > * Vardis M. Tsontos (vardis.m.tson...@jpl.nasa.gov)
> > > * Joseph C. Jacob (joseph.c.ja...@jpl.nasa.gov)
> > > * Ed Armstrong (edward.m.armstr...@jpl.nasa.gov)
> > > * Frank Greguska (gregu...@jpl.nasa.gov)
> > > * Brian Wilson (brian.wil...@jpl.nasa.gov)
> > > * Chaowe Phil Yang (cya...@gmu.edu)
> > > * Yongyao Jiang (yjia...@gmu.edu)
> > > * Yun Li (yl...@gmu.edu)
> > > * Shawn R. Smith (sm...@coaps.fsu.edu)
> > > * Jocelyn Elya (je...@coaps.fsu.edu)
> > > * Mark Bourassa (boura...@coaps.fsu.edu)
> > > * Thomas Cram (tc...@ucar.edu)
> > > * Thomas Huang (thomas.hu...@jpl.nasa.gov)
> > > * Steven Worley (wor...@ucar.edu)
> > > * Zaihua Ji (z...@ucar.edu)
> > >
> > > = Affiliations =
> > > NASA JPL
> > > * Lewis John McGibbney (lewi...@apache.org)
> > > * Vardis M. Tsontos (vardis.m.tson...@jpl.nasa.gov)
> > > * Joseph C. Jacob (joseph.c.ja...@jpl.nasa.gov)
> > > * Ed Armstrong (edward.m.armstr...@jpl.nasa.gov)
> > > * Frank Greguska (gregu...@jpl.nasa.gov)
> > > * Thomas Huang (thomas.hu...@jpl.nasa.gov)
> > > * Brian Wilson (brian.wil...@jpl.nasa.gov)
> > >
> > > George Mason University
> > > * Chaowe Phil Yang (cya...@gmu.edu)
> > > * Yongyao Jiang (yjia...@gmu.edu)
> > > * Yun Li (yl...@gmu.edu)
> > >
> > > Center for Ocean-Atmospheric Prediction Studies, Florida State University
> > > * Shawn R. Smith (sm...@coaps.fsu.edu)
> > > * Jocelyn Elya (je...@coaps.fsu.edu)
> > > * Mark Bourassa (boura...@coaps.fsu.edu)
> > >
> > > Computational Information Systems Laboratory (CISL) / National Center for Atmospheric Research (NCAR)
> > > * Thomas Cram (tc...@ucar.edu)
> > > * Zaihua Ji (z...@ucar.edu)
> > > * Steven Worley (wor...@ucar.edu)
> > >
> > > = Sponsors =
> > >
> > > = Champion =
> > > * Lewis McGibbney (NASA/JPL)
> > >
> > > = Nominated Mentors =
> > > * TBD
> > > * TBD
> > > * TBD
> > >
> > > = Sponsoring Entity =
> > > The Apache Incubator
> > >
> > > --
> > > http://home.apache.org/~lewismc/
> > > @hectorMcSpector
> > > http://www.linkedin.com/in/lmcgibbney