Here is my +1 Thanks, Madhawa
Madhawa

On Tue, Oct 17, 2017 at 4:04 PM, lewis john mcgibbney <lewi...@apache.org> wrote:

> Hi Folks,
> Having secured a mentorship team consisting of the following IPMC Members, I am happy to open a formal VOTE thread on accepting the Science Data Analytics Platform (SDAP) into Apache Incubator.
>
> - Lewis John McGibbney (lewi...@apache.org)
> - Raphael Bircher (bircher at apache dot org)
> - Suneel Marthi (smarthi at apache dot org)
>
> Thank you to both Raphael and Suneel for coming forward. :)
> The VOTE will be open for at least 72 hours.
>
> [ ] +1 Accept Science Data Analytics Platform (SDAP) into Apache Incubator
> [ ] +/-0 ... just because
> [ ] -1 Do NOT Accept Science Data Analytics Platform (SDAP) into Apache Incubator... because
>
> Thanks in advance to all participants.
> Lewis
>
> P.S. Here is a binding +1 from me
>
> On Wed, Oct 11, 2017 at 11:22 AM, lewis john mcgibbney <lewi...@apache.org> wrote:
>
> > Hi Folks,
> > I would like to open a DISCUSS thread on the topic of accepting the Science Data Analytics Platform (SDAP) <https://wiki.apache.org/incubator/SDAPProposal> Project into the Incubator.
> > I am CC'ing Thomas Huang from NASA JPL who I have been working with to build community around a kick-ass set of software projects under the SDAP umbrella.
> > At this stage we would very much appreciate critical feedback from general@ community. We are also open to mentors who may have an interest in the project proposal.
> > The proposal is pasted below.
> > Thanks in advance,
> > Lewis
> >
> > = Abstract =
> > The Science Data Analytics Platform (SDAP) establishes an integrated data analytic center for Big Science problems. It focuses on technology integration, advancement and maturity.
> >
> > = Proposal =
> > SDAP currently represents a collaboration between NASA Jet Propulsion Laboratory (JPL), Florida State University (FSU), the National Center for Atmospheric Research (NCAR), and George Mason University (GMU). SDAP brings together a number of big data technologies including the NASA-funded OceanXtremes (Anomaly detection and ocean science), NEXUS (Deep data analytic platform), DOMS (Distributed in-situ to satellite matchup), MUDROD (Search relevancy and discovery) and VQSS (Virtualized Quality Screening Service) under a single umbrella. Within the original Incubator proposal, VQSS will not be included; however, it is anticipated that a future source code donation will cover VQSS.
> >
> > = Background and Rationale =
> > SDAP is a technology software solution currently geared to better enable scientists involved in advancing the study of the Earth's physical oceanography. With increasing global temperature, warming of the ocean, and melting ice sheets and glaciers, the impacts can be observed from changes in anomalous ocean temperature and circulation patterns, to increasing extreme weather events and stronger/more frequent hurricanes, sea level rise and storm surges affecting coastlines, and may involve drastic changes and shifts in marine ecosystems. Ocean science communities are relying on data distributed through data centers such as JPL's Physical Oceanography Distributed Active Archive Center (PO.DAAC) to conduct their research. In typical investigations, oceanographers follow a traditional workflow for using datasets: search, evaluate, download, and apply tools and algorithms to look for trends. While this workflow has been working very well historically for the oceanographic community, it cannot scale if the research involves massive amounts of data.
> > NASA's Surface Water and Ocean Topography (SWOT) mission, scheduled to launch in April of 2021, is expected to generate over 20 PB of data for a nominal 3-year mission. This will challenge all existing NASA Earth Science data archival/distribution paradigms. It will no longer be feasible for Earth scientists to download and analyze such volumes of data. SDAP was therefore developed primarily as a Web-service platform for big ocean data science at the PO.DAAC with open source solutions used to enable fast analysis of oceanographic data. SDAP has been developed collaboratively between JPL, FSU, NCAR, and GMU and is rapidly maturing to become the generic platform for the next generation of big science data solutions. The platform is an orchestration of several previously funded NASA big ocean data solutions using cloud technology, which include data analysis (NEXUS), anomaly detection (OceanXtremes), matchup (DOMS), subsetting, discovery (MUDROD), and visualization (VQSS). SDAP will enable web-accessible, fast data analysis directly on huge scientific data archives to minimize data movement and provide access, including subset, only to the relevant data.
> >
> > = Science Data Analytics Platform Project Overview =
> > SDAP consists of several loosely coupled, independently functioning sub-projects. The graphic below displays an overview of how these sub-projects fuse together. N.B., although the graphic uses terminology relating to OceanWorks, essentially the SDAP architecture is identical.
> >
> > {{attachment:sdap.png}}
> >
> > == OceanXtremes ==
> > Oceanographic Data-Intensive Anomaly Detection and Analysis Portal. An application that allows you to view imagery and perform analysis on sea level rise data.
> >
> > '''Objective'''
> > Develop an anomaly detection system which identifies items, events or observations which do not conform to an expected pattern.
> > * Mature and test domain-specific, multi-scale anomaly and feature detection algorithms.
> > * Identify unexpected correlations between key measured variables.
> >
> > Demonstrate value of technologies in this service:
> > * Adapted Map-Reduce data mining.
> > * Algorithm profiling service.
> > * Shared discovery and exploration search tools.
> > * Automatic notification of events of interest.
> >
> > == NEXUS ==
> > NEXUS is an emerging technology developed at JPL
> > * A Cloud-based/Cluster-based data platform that performs scalable handling of observational parameters analysis designed to scale horizontally
> > * Leveraging high-performance indexed, temporal, and geospatial search solution
> > * Breaks data products into small chunks and stores them in a Cloud-based data store
> >
> > ''Data Volumes Exploding''
> > * SWOT mission is coming
> > * File I/O is slow
> >
> > ''Scalable Store & Compute is Available''
> > * NoSQL cluster databases
> > * Parallel compute, in-memory map-reduce
> > * Bring Compute to Highly-Accessible Data (using Hybrid Cloud)
> >
> > ''Pre-Chunk and Summarize Key Variables''
> > * Easy statistics instantly (milliseconds)
> > * Harder statistics on-demand (in seconds)
> > * Visualize original data (layers) on a map quickly
> >
> > == DOMS ==
> > The Distributed Oceanographic Match-Up Service
> > DOMS is designed to reconcile satellite and in situ datasets in support of NASA's Earth Science mission. The service will provide a mechanism for users to input a series of geospatial references for satellite observations and receive the in situ observations that are matched to the satellite data within a selectable temporal and spatial domain. DOMS includes several characteristic in situ and satellite observation datasets - with an initial focus on salinity, sea temperature, and winds.
> > DOMS will be used by the marine and satellite research communities to support a range of activities and several use cases will be described. The service is designed to provide a community-accessible tool that dynamically delivers matched data and allows the scientist to only work with the subset of data where the matches exist.
> >
> > == MUDROD ==
> > Mining and Utilizing Dataset Relevancy from Oceanographic Datasets to Improve Data Discovery and Access
> > Data discovery accuracy is a challenging topic for both Earth science and other domains. It is especially true for scientific data sets that are not as popular as Amazon or Google data. MUDROD is focused on mining oceanic knowledge from the PO.DAAC user log files to improve the end user data discovery experience at PO.DAAC. There are three steps in the research: a) extracting oceanographic semantics from three resources: SWEET, the GCMD ontology, and the keywords used by end users for searching PO.DAAC datasets, b) mining the linkage among different vocabularies based on user data discovery sessions, and c) building the linkage among vocabularies based on a comprehensive approach by considering domain de facto standards, e.g., SWEET and GCMD, and the knowledge mined from the log files. The semantics are used to improve data discovery for ranking results, navigating among vocabularies, and recommending data based on user searches.
> >
> > = Current Status =
> > All components of SDAP were originally designed and developed under grants from the NASA-funded Advanced Information Systems and Technologies (AIST) program. The initiative to bring the components together under the SDAP umbrella was granted through an AIST-funded follow-on grant which will run for another ~18 months.
> > Currently no projects have made official releases so, outside of community building, this will be our primary Incubating goal. All SDAP source code is currently publicly available and licensed under the ALv2.0.
> >
> > = Meritocracy =
> > The current developers are familiar with meritocratic open source development at Apache. The SDAP team consumes Apache products heavily with members being part of several Apache user communities. SDAP itself has critical dependencies upon Apache products. Lewis McGibbney (JPL employee), a Member of the ASF and V.P. of Apache Any23, Gora PMC Nutch, Tika, OODT, OCW, etc., is championing the effort to bring SDAP into and through the Apache Incubator and has been evangelizing the Apache Way to the current SDAP contributors such that the meritocratic process is well understood and followed. Apache was chosen specifically because we want to encourage this style of community development for the project and for it to sustain SDAP forward to become the generic platform for the next generation of big science data solutions.
> >
> > = Community =
> > The SDAP project is a fairly new effort and our community is not yet fully/firmly established. Initial committers comprising the SDAP roster have only recently fully come together as a unified team; however, there is a large degree of synergy between constituent members at JPL, FSU, NCAR, and GMU. Therefore, community building and publicity continue to be a major thrust. With the activity and exposure regularly attained by several community members, we hope to grow the SDAP presence in and across several (scientific) forums. The SDAP technology is generating interest within communities such as the Earth Science Information Partnership (ESIP), American Geophysical Union (AGU) and a plethora of science meetings around the globe.
> > This in effect, we hope, will further contribute towards the possibility of SDAP being used across Government Agencies such as NASA, NOAA, USGS, EPA, DOI, etc. as well as by researchers and students in academic institutions around the globe.
> > During incubation, we will explicitly seek to increase our adoption, with SDAP already being featured on the agenda for several high profile globally significant scientific conferences and meetings.
> >
> > = Core Developers =
> > The current set of core developers is relatively small, including full-time developers and students from across JPL, FSU, NCAR, and GMU. Initial community management and participation will be distributed across the entire team, most of whom have been involved with the constituent projects for <2 years.
> >
> > = Alignment =
> > All SDAP code is licensed under Apache v2.0.
> >
> > = Known Risks =
> >
> > == Orphaned products ==
> > There are currently no orphaned products. Each component of SDAP has dedicated personnel leading and participating in its ongoing development. Additionally, there is substantial collaboration between projects facilitated by regular project meetings which are specific to the initial member entities and focused on advancing physical oceanographic science.
> >
> > == Inexperience with Open Source ==
> > JPL (in particular Lewis McGibbney) has been part of several efforts to transition to and grow project communities at Apache e.g. Apache OODT, Apache Open Climate Workbench, Apache Joshua (Incubating), Apache SensSoft (Incubating), Apache DRAT (Incubating). Most of the code developed under the SDAP umbrella was and is open source prior to the Incubator effort so we are well familiarized with the nuances of open source software.
> >
> > = Relationships with Other Apache Products =
> > SDAP has strong dependency upon a number of high profile and smaller profile Apache products.
> > Examples can be seen in the breakdown of External Dependencies. As we continue to grow SDAP within the Incubator, we will make efforts to share community stories, software advancements and possible improvements in our use of our Apache dependencies back to those project communities.
> >
> > = Developers =
> > The SDAP project, and hence its developers, is currently funded through a NASA AIST follow-on grant with funding secured for the next ~18 months. There are currently no 100% time dedicated developers; however, the same core team that does work currently will continue to work on the project throughout the current funding period and after. There is currently no business strategy aligned with SDAP; however, it is perceived that future, yet unsecured funding may be directed to further feature advancement and project evangelism.
> >
> > = Documentation =
> > Documentation is currently available in a number of locations e.g. Github wiki, Github pages, etc. with each repository under the aist-oceanworks Github Org maintaining documentation available through wikis attached to the repositories. Additionally, most of the SDAP sub-projects have been extensively documented within a plethora of formal academic publications across several academic communities. It would be our intention, at the very least, to unify the Github wiki and Github pages documentation, most likely to make up the sdap.apache.org Website content.
> >
> > = Initial Source =
> > Current source resides in several locations on Github and Bitbucket:
> > * https://github.com/dataplumber/nexus (NEXUS, OceanXtremes, DOMS)
> > * https://github.com/dataplumber/edge (EDGE)
> > * https://github.com/aist-oceanworks/mudrod (MUDROD)
> > * https://bitbucket.org/coaps_mdc/doms/src (DOMS)
> >
> > = External Dependencies =
> > Each component of the Science Data Analytics Platform has its own dependencies. Documentation will be available for integrating them.
> >
> > == MUDROD ==
> > '''Core'''
> > com.google.code.gson gson 2.5 compile jar false
> > org.jdom jdom 2.0.2 compile jar false
> > org.elasticsearch elasticsearch 5.2.0 compile jar false
> > org.elasticsearch elasticsearch-spark-20_2.11 5.2.0 compile jar false
> > joda-time joda-time 2.9.4 compile jar false
> > com.carrotsearch hppc 0.7.1 compile jar false
> > org.apache.spark spark-core_2.11 2.1.0 compile jar false
> > org.apache.spark spark-sql_2.11 2.1.0 compile jar false
> > org.apache.spark spark-mllib_2.11 2.1.0 compile jar false
> > org.scala-lang scala-library 2.11.8 compile jar false
> > org.codehaus.jettison jettison 1.3.8 compile jar false
> > commons-cli commons-cli 1.2 compile jar false
> > net.sf.opencsv opencsv 2.3 compile jar false
> > org.apache.jena jena-core 3.3.0 compile jar false
> > junit junit 4.12 test jar false
> >
> > '''Service'''
> > gov.nasa.jpl.mudrod mudrod-core 0.0.1-SNAPSHOT compile jar false
> > javax.servlet javax.servlet-api 3.1.0 provided jar false
> > com.google.code.gson gson 2.5 compile jar false
> >
> > '''Web'''
> > * AngularJS - MIT License
> > * BootstrapJS - MIT License
> > * jQueryJS - MIT License
> > * Underscore JS - MIT License
> >
> > == DOMS ==
> > * Apache Solr version 5.5.1 http://lucene.apache.org/solr/
> > * EDGE https://github.com/dataplumber/edge
> > * NetCDF4 http://unidata.github.io/netcdf4-python/
> > * Python 3.5 (NOTE: only partial support for py2.7)
> >
> > Non stdlib Python dependencies:
> > * Jinja2==2.9.5
> > * python-dateutil==2.6.0
> > * cython==0.25.2
> > * numpy==1.12.0
> > * scipy==0.18.1
> > * netCDF4==1.2.7
> > * solrpy3
> > * siphon==0.4.0
> > * neo4j-driver==1.1.0
> > * matplotlib==2.0.0
> > * requests==2.13.0
> > * shapely==1.5.17
> > * flask==0.12
> > * networkx==1.11
> > * pyproj==1.9.5.1
> > * blist==1.3.6
> >
> > == NEXUS ==
> > '''Analysis'''
> > * https://github.com/dataplumber/nexus/blob/master/analysis/package-list.txt
> > * https://github.com/dataplumber/nexus/blob/master/analysis/requirements.txt
> >
> > '''Client'''
> > * https://github.com/dataplumber/nexus/blob/master/client/requirements.txt
> >
> > '''Climatology'''
> > * matplotlib
> > * numpy
> > * netCDF4
> > * pathos (https://pypi.python.org/pypi/pathos)
> >
> > '''Data-access'''
> > * https://github.com/dataplumber/nexus/blob/master/data-access/requirements.txt
> >
> > '''Nexus-ingest'''
> > ''Dataset-tiler''
> > * https://github.com/dataplumber/nexus/tree/master/nexus-ingest/dataset-tiler/build/reports
> >
> > ''developer-box''
> > * Just a collection of scripts/vagrant file used to stand up a developer instance of nexus ingestion. No dependencies to report
> >
> > ''Groovy-scripts''
> > * Collection of Groovy scripts that can be used as part of data ingestion. They only rely on the standard Groovy library and the ‘nexus-messages’ project
> >
> > ''Nexus-messages''
> > * https://github.com/dataplumber/nexus/tree/master/nexus-ingest/nexus-messages/build/reports
> >
> > ''nexus-sink''
> > * https://github.com/dataplumber/nexus/tree/master/nexus-ingest/nexus-sink/build/reports
> >
> > ''nexus-xd-python-modules''
> > * https://github.com/dataplumber/nexus/blob/master/nexus-ingest/nexus-xd-python-modules/package-list.txt
> > * https://github.com/dataplumber/nexus/blob/master/nexus-ingest/nexus-xd-python-modules/requirements.txt
> >
> > ''spring-xd-python''
> > * only python standard libraries are used
> >
> > ''tcp-shell''
> > * https://github.com/dataplumber/nexus/tree/master/nexus-ingest/tcp-shell/build/reports
> >
> > '''tools/deletebyquery'''
> > * https://github.com/dataplumber/nexus/blob/master/tools/deletebyquery/requirements.txt
> >
> > = Required Resources =
> > Mailing Lists
> > * priv...@sdap.incubator.apache.org
> > * d...@sdap.incubator.apache.org
> > * comm...@sdap.incubator.apache.org
> >
> > Git Repos
> > * https://git-wip-us.apache.org/repos/asf/incubator-nexus.git
> > * https://git-wip-us.apache.org/repos/asf/incubator-doms.git
> > * https://git-wip-us.apache.org/repos/asf/incubator-mudrod.git
> >
> > Issue Tracking
> > * JIRA Science Data Analytics Platform (SDAP)
> >
> > Continuous Integration
> > * Jenkins builds on https://builds.apache.org/
> >
> > Web
> > * http://sdap.incubator.apache.org/
> > * wiki at http://cwiki.apache.org
> >
> > = Initial Committers =
> > The following is a list of the planned initial Apache committers (the active subset of the committers for the current repository on Github).
> > * Lewis John McGibbney (lewi...@apache.org)
> > * Vardis M. Tsontos (vardis.m.tson...@jpl.nasa.gov)
> > * Joseph C. Jacob (joseph.c.ja...@jpl.nasa.gov)
> > * Ed Armstrong (edward.m.armstr...@jpl.nasa.gov)
> > * Frank Greguska (gregu...@jpl.nasa.gov)
> > * Brian Wilson (brian.wil...@jpl.nasa.gov)
> > * Chaowe Phil Yang (cya...@gmu.edu)
> > * Yongyao Jiang (yjia...@gmu.edu)
> > * Yun Li (yl...@gmu.edu)
> > * Shawn R. Smith (sm...@coaps.fsu.edu)
> > * Jocelyn Elya (je...@coaps.fsu.edu)
> > * Mark Bourassa (boura...@coaps.fsu.edu)
> > * Thomas Cram (tc...@ucar.edu)
> > * Thomas Huang (thomas.hu...@jpl.nasa.gov)
> > * Steven Worley (wor...@ucar.edu)
> > * Zaihua Ji (z...@ucar.edu)
> >
> > = Affiliations =
> > NASA JPL
> > * Lewis John McGibbney (lewi...@apache.org)
> > * Vardis M. Tsontos (vardis.m.tson...@jpl.nasa.gov)
> > * Joseph C. Jacob (joseph.c.ja...@jpl.nasa.gov)
> > * Ed Armstrong (edward.m.armstr...@jpl.nasa.gov)
> > * Frank Greguska (gregu...@jpl.nasa.gov)
> > * Thomas Huang (thomas.hu...@jpl.nasa.gov)
> > * Brian Wilson (brian.wil...@jpl.nasa.gov)
> >
> > George Mason University
> > * Chaowe Phil Yang (cya...@gmu.edu)
> > * Yongyao Jiang (yjia...@gmu.edu)
> > * Yun Li (yl...@gmu.edu)
> >
> > Center for Ocean-Atmospheric Prediction Studies, Florida State University
> > * Shawn R. Smith (sm...@coaps.fsu.edu)
> > * Jocelyn Elya (je...@coaps.fsu.edu)
> > * Mark Bourassa (boura...@coaps.fsu.edu)
> >
> > Computational Information Systems Laboratory (CISL) / National Center for Atmospheric Research (NCAR)
> > * Thomas Cram (tc...@ucar.edu)
> > * Zaihua Ji (z...@ucar.edu)
> > * Steven Worley (wor...@ucar.edu)
> >
> > = Sponsors =
> >
> > = Champion =
> > * Lewis McGibbney (NASA/JPL)
> >
> > = Nominated Mentors =
> > * TBD
> > * TBD
> > * TBD
> >
> > = Sponsoring Entity =
> > The Apache Incubator
> >
> >
> > --
> > http://home.apache.org/~lewismc/
> > @hectorMcSpector
> > http://www.linkedin.com/in/lmcgibbney
> >
>
>
> --
> http://home.apache.org/~lewismc/
> @hectorMcSpector
> http://www.linkedin.com/in/lmcgibbney
>