+1 binding from me…thanks and good luck

Cheers,
Chris
On 10/17/17, 2:04 PM, "lewis john mcgibbney" <lewi...@apache.org> wrote:

Hi Folks,

Having secured a mentorship team consisting of the following IPMC Members, I am happy to open a formal VOTE thread on accepting the Science Data Analytics Platform (SDAP) into Apache Incubator.

- Lewis John McGibbney (lewi...@apache.org)
- Raphael Bircher (bircher at apache dot org)
- Suneel Marthi (smarthi at apache dot org)

Thank you to both Raphael and Suneel for coming forward. :)

The VOTE will be open for at least 72 hours.

[ ] +1 Accept Science Data Analytics Platform (SDAP) into Apache Incubator
[ ] +/-0 ... just because
[ ] -1 Do NOT Accept Science Data Analytics Platform (SDAP) into Apache Incubator... because

Thanks in advance to all participants.
Lewis

P.S. Here is a binding +1 from me

On Wed, Oct 11, 2017 at 11:22 AM, lewis john mcgibbney <lewi...@apache.org> wrote:
> Hi Folks,
> I would like to open a DISCUSS thread on the topic of accepting the
> Science Data Analytics Platform (SDAP)
> <https://wiki.apache.org/incubator/SDAPProposal> Project into the Incubator.
> I am CC'ing Thomas Huang from NASA JPL who I have been working with to
> build community around a kick-ass set of software projects under the SDAP
> umbrella.
> At this stage we would very much appreciate critical feedback from the
> general@ community. We are also open to mentors who may have an interest in
> the project proposal.
> The proposal is pasted below.
> Thanks in advance,
> Lewis
>
> = Abstract =
> The Science Data Analytics Platform (SDAP) establishes an integrated data
> analytic center for Big Science problems. It focuses on technology
> integration, advancement and maturity.
>
> = Proposal =
> SDAP currently represents a collaboration between NASA Jet Propulsion
> Laboratory (JPL), Florida State University (FSU), the National Center for
> Atmospheric Research (NCAR), and George Mason University (GMU).
> SDAP brings together a number of big data technologies including the
> NASA-funded OceanXtremes (Anomaly detection and ocean science), NEXUS (Deep
> data analytic platform), DOMS (Distributed in-situ to satellite matchup),
> MUDROD (Search relevancy and discovery) and VQSS (Virtualized Quality
> Screening Service) under a single umbrella. Within the original Incubator
> proposal, VQSS will not be included; however, it is anticipated that a
> future source code donation will cover VQSS.
>
> = Background and Rationale =
> SDAP is a technology software solution currently geared to better enable
> scientists involved in advancing the study of the Earth's physical
> oceanography. With increasing global temperature, warming of the ocean, and
> melting ice sheets and glaciers, the impacts can be observed from changes
> in anomalous ocean temperature and circulation patterns, to increasing
> extreme weather events and stronger/more frequent hurricanes, sea level
> rise and storm surges affecting coastlines, and may involve drastic changes
> and shifts in marine ecosystems. Ocean science communities are relying on
> data distributed through data centers such as JPL's Physical
> Oceanography Distributed Active Archive Center (PO.DAAC) to conduct their
> research. In typical investigations, oceanographers follow a traditional
> workflow for using datasets: search, evaluate, download, and apply tools
> and algorithms to look for trends. While this workflow has been working
> very well historically for the oceanographic community, it cannot scale if
> the research involves massive amounts of data. NASA's Surface Water and
> Ocean Topography (SWOT) mission, scheduled to launch in April of 2021, is
> expected to generate over 20 PB of data for a nominal 3-year mission. This
> will challenge all existing NASA Earth Science data archival/distribution
> paradigms. It will no longer be feasible for Earth scientists to download
> and analyze such volumes of data.
> SDAP was therefore developed primarily as a Web-service platform for big
> ocean data science at the PO.DAAC with open source solutions used to enable
> fast analysis of oceanographic data. SDAP has been developed
> collaboratively between JPL, FSU, NCAR, and GMU and is rapidly maturing to
> become the generic platform for the next generation of big science data
> solutions. The platform is an orchestration of several previously funded
> NASA big ocean data solutions using cloud technology, which include data
> analysis (NEXUS), anomaly detection (OceanXtremes), matchup (DOMS),
> subsetting, discovery (MUDROD), and visualization (VQSS).
> SDAP will enable web-accessible, fast data analysis directly on huge
> scientific data archives to minimize data movement and provide access,
> including subsetting, only to the relevant data.
>
> = Science Data Analytics Platform Project Overview =
> SDAP consists of several loosely coupled, independently functioning
> sub-projects. The graphic below displays an overview of how these
> sub-projects fuse together. N.B., although the graphic uses terminology
> relating to OceanWorks, the SDAP architecture is essentially identical.
>
> {{attachment:sdap.png}}
>
> == OceanXtremes ==
> Oceanographic Data-Intensive Anomaly Detection and Analysis Portal. An
> application that allows you to view imagery and perform analysis on sea
> level rise data.
>
> '''Objective'''
> Develop an anomaly detection system which identifies items, events or
> observations which do not conform to an expected pattern.
> * Mature and test domain-specific, multi-scale anomaly and feature
> detection algorithms.
> * Identify unexpected correlations between key measured variables.
>
> Demonstrate value of technologies in this service:
> * Adapted Map-Reduce data mining.
> * Algorithm profiling service.
> * Shared discovery and exploration search tools.
> * Automatic notification of events of interest.
>
> == NEXUS ==
> NEXUS is an emerging technology developed at JPL
> * A Cloud-based/Cluster-based data platform that performs scalable
> handling of observational parameter analysis and is designed to scale
> horizontally
> * Leverages a high-performance indexed, temporal, and geospatial search
> solution
> * Breaks data products into small chunks and stores them in a Cloud-based
> data store
>
> ''Data Volumes Exploding''
> * SWOT mission is coming
> * File I/O is slow
>
> ''Scalable Store & Compute is Available''
> * NoSQL cluster databases
> * Parallel compute, in-memory map-reduce
> * Bring Compute to Highly-Accessible Data (using Hybrid Cloud)
>
> ''Pre-Chunk and Summarize Key Variables''
> * Easy statistics instantly (milliseconds)
> * Harder statistics on-demand (in seconds)
> * Visualize original data (layers) on a map quickly
>
> == DOMS ==
> The Distributed Oceanographic Match-Up Service
> DOMS is designed to reconcile satellite and in situ datasets in support of
> NASA's Earth Science mission. The service will provide a mechanism for
> users to input a series of geospatial references for satellite observations
> and receive the in situ observations that are matched to the satellite data
> within a selectable temporal and spatial domain. DOMS includes several
> characteristic in situ and satellite observation datasets - with an initial
> focus on salinity, sea temperature, and winds. DOMS will be used by the
> marine and satellite research communities to support a range of activities
> and several use cases will be described. The service is designed to provide
> a community-accessible tool that dynamically delivers matched data and
> allows the scientist to only work with the subset of data where the matches
> exist.
>
> == MUDROD ==
> Mining and Utilizing Dataset Relevancy from Oceanographic Datasets to
> Improve Data Discovery and Access
> Data discovery accuracy is a challenging topic for both Earth science and
> other domains.
> It is especially true for scientific data sets that are not as popular as
> Amazon or Google data. MUDROD is focused on mining oceanic knowledge from
> the PO.DAAC user log files to improve the end user data discovery
> experience at PO.DAAC. There are three steps in the research: a)
> extracting the oceanographic semantics from three resources: SWEET, the
> GCMD ontology, and the keywords used by end users for searching PO.DAAC
> datasets, b) mining the linkage among different vocabularies based on user
> data discovery sessions, and c) building the linkage among vocabularies
> based on a comprehensive approach by considering domain de facto standards,
> e.g., SWEET and GCMD, and the knowledge mined from the log files. The
> semantics are used to improve data discovery for ranking results,
> navigating among vocabularies, and recommending data based on user
> searches.
>
> = Current Status =
> All components of SDAP were originally designed and developed under grants
> from the NASA-funded Advanced Information Systems and Technologies (AIST)
> program. The initiative to bring the components together under the
> SDAP umbrella was granted through an AIST-funded follow-on grant which will
> run for another ~18 months.
> Currently no projects have made official releases so, outside of community
> building, this will be our primary Incubating goal. All SDAP source code is
> currently publicly available and licensed under the ALv2.0.
>
> = Meritocracy =
> The current developers are familiar with meritocratic open source
> development at Apache. The SDAP team consumes Apache products heavily with
> members being part of several Apache user communities. SDAP itself has
> critical dependencies upon Apache products. Lewis McGibbney (JPL employee),
> a Member of the ASF and V.P.
> of Apache Any23, Gora PMC Nutch, Tika, OODT, OCW, etc., is championing the
> effort to bring SDAP into and through the Apache Incubator and has been
> evangelizing the Apache Way to the current SDAP contributors such that the
> meritocratic process is well understood and followed. Apache was chosen
> specifically because we want to encourage this style of community
> development for the project and for it to sustain SDAP forward to become
> the generic platform for the next generation of big science data
> solutions.
>
> = Community =
> The SDAP project is a fairly new effort and our community is not yet
> fully/firmly established. Initial committers comprising the SDAP roster
> have only recently fully come together as a unified team; however, there is
> a large degree of synergy between constituent members at JPL, FSU, NCAR,
> and GMU. Therefore, community building and publicity continues to be a
> major thrust. With the activity and exposure regularly attained by several
> community members, we hope to grow the SDAP presence in and across several
> (scientific) forums. The SDAP technology is generating interest within
> communities such as the Earth Science Information Partnership (ESIP),
> American Geophysical Union (AGU) and a plethora of science meetings around
> the globe. This in effect, we hope, will further contribute towards the
> possibility of SDAP being used across Government Agencies such as NASA,
> NOAA, USGS, EPA, DOI, etc. as well as by researchers and students in
> academic institutions around the globe.
> During incubation, we will explicitly seek to increase our adoption, with
> SDAP already being featured on the agenda for several high profile globally
> significant scientific conferences and meetings.
>
> = Core Developers =
> The current set of core developers is relatively small, including
> full-time developers and students from across JPL, FSU, NCAR, and GMU.
> Initial community management and participation will be distributed across
> the entire team, most of whom have been involved with the constituent
> projects for <2 years.
>
> = Alignment =
> All SDAP code is licensed under Apache v2.0.
>
> = Known Risks =
>
> == Orphaned products ==
> There are currently no orphaned products. Each component of SDAP has
> dedicated personnel leading and participating in its ongoing development.
> Additionally, there is substantial collaboration between projects
> facilitated by regular project meetings which are specific to the initial
> member entities and focused on advancing physical oceanographic science.
>
> == Inexperience with Open Source ==
> JPL (in particular Lewis McGibbney) has been part of several efforts to
> transition to and grow project communities at Apache e.g. Apache OODT,
> Apache Open Climate Workbench, Apache Joshua (Incubating), Apache SensSoft
> (Incubating), Apache DRAT (Incubating). Most of the code developed under
> the SDAP umbrella was already open source prior to the Incubator effort so
> we are well familiarized with the nuances of open source software.
>
> = Relationships with Other Apache Products =
> SDAP has a strong dependency upon a number of high profile and smaller
> profile Apache products. Examples can be seen in the breakdown of External
> Dependencies. As we continue to grow SDAP within the Incubator, we will
> make efforts to share community stories, software advancements and possible
> improvements in our use of our Apache dependencies back to those project
> communities.
>
> = Developers =
> The SDAP project, and hence its developers, is currently funded through a
> NASA AIST follow-on grant with funding secured for the next ~18 months.
> There are currently no 100% time dedicated developers; however, the same
> core team that does work currently will continue to work on the project
> throughout the current funding period and after.
> There is currently no business strategy aligned with SDAP; however, it is
> anticipated that future, as yet unsecured, funding may be directed to
> further feature advancement and project evangelism.
>
> = Documentation =
> Documentation is currently available in a number of locations e.g. Github
> wiki, Github pages, etc. with each repository under the oceanworks-aist
> Github Org maintaining documentation available through wikis attached to
> the repositories. Additionally, most of the SDAP sub-projects have been
> extensively documented within a plethora of formal academic publications
> across several academic communities. It would be our intention, at the very
> least, to unify the Github wiki and Github pages documentation, which would
> most likely make up the sdap.apache.org Website content.
>
> = Initial Source =
> Current source resides in several locations on Github and Bitbucket:
> * https://github.com/dataplumber/nexus (NEXUS, OceanXtremes, DOMS)
> * https://github.com/dataplumber/edge (EDGE)
> * https://github.com/aist-oceanworks/mudrod (MUDROD)
> * https://bitbucket.org/coaps_mdc/doms/src (DOMS)
>
> = External Dependencies =
> Each component of the Science Data Analytics Platform has its own
> dependencies. Documentation will be available for integrating them.
>
> == MUDROD ==
> '''Core'''
> com.google.code.gson gson 2.5 compile jar false
> org.jdom jdom 2.0.2 compile jar false
> org.elasticsearch elasticsearch 5.2.0 compile jar false
> org.elasticsearch elasticsearch-spark-20_2.11 5.2.0 compile jar false
> joda-time joda-time 2.9.4 compile jar false
> com.carrotsearch hppc 0.7.1 compile jar false
> org.apache.spark spark-core_2.11 2.1.0 compile jar false
> org.apache.spark spark-sql_2.11 2.1.0 compile jar false
> org.apache.spark spark-mllib_2.11 2.1.0 compile jar false
> org.scala-lang scala-library 2.11.8 compile jar false
> org.codehaus.jettison jettison 1.3.8 compile jar false
> commons-cli commons-cli 1.2 compile jar false
> net.sf.opencsv opencsv 2.3 compile jar false
> org.apache.jena jena-core 3.3.0 compile jar false
> junit junit 4.12 test jar false
>
> '''Service'''
> gov.nasa.jpl.mudrod mudrod-core 0.0.1-SNAPSHOT compile jar false
> javax.servlet javax.servlet-api 3.1.0 provided jar false
> com.google.code.gson gson 2.5 compile jar false
>
> '''Web'''
> * AngularJS - MIT License
> * BootstrapJS - MIT License
> * jQueryJS - MIT License
> * Underscore JS - MIT License
>
> == DOMS ==
> * Apache Solr version 5.5.1 http://lucene.apache.org/solr/
> * EDGE https://github.com/dataplumber/edge
> * NetCDF4 http://unidata.github.io/netcdf4-python/
> * Python 3.5 (NOTE: only partial support for py2.7)
>
> Non stdlib Python dependencies:
> * Jinja2==2.9.5
> * python-dateutil==2.6.0
> * cython==0.25.2
> * numpy==1.12.0
> * scipy==0.18.1
> * netCDF4==1.2.7
> * solrpy3
> * siphon==0.4.0
> * neo4j-driver==1.1.0
> * matplotlib==2.0.0
> * requests==2.13.0
> * shapely==1.5.17
> * flask==0.12
> * networkx==1.11
> * pyproj==1.9.5.1
> * blist==1.3.6
>
> == NEXUS ==
> '''Analysis'''
> * https://github.com/dataplumber/nexus/blob/master/analysis/package-list.txt
> * https://github.com/dataplumber/nexus/blob/master/analysis/requirements.txt
>
> '''Client'''
> *
> https://github.com/dataplumber/nexus/blob/master/client/requirements.txt
>
> '''Climatology'''
> * matplotlib
> * numpy
> * netCDF4
> * pathos (https://pypi.python.org/pypi/pathos)
>
> '''Data-access'''
> * https://github.com/dataplumber/nexus/blob/master/data-access/requirements.txt
>
> '''Nexus-ingest'''
> ''Dataset-tiler''
> * https://github.com/dataplumber/nexus/tree/master/nexus-ingest/dataset-tiler/build/reports
>
> ''developer-box''
> * Just a collection of scripts/vagrant file used to stand up a developer
> instance of nexus ingestion. No dependencies to report
>
> ''Groovy-scripts''
> * Collection of Groovy scripts that can be used as part of data
> ingestion. They only rely on the standard Groovy library and the
> ‘nexus-messages’ project
>
> ''Nexus-messages''
> * https://github.com/dataplumber/nexus/tree/master/nexus-ingest/nexus-messages/build/reports
>
> ''nexus-sink''
> * https://github.com/dataplumber/nexus/tree/master/nexus-ingest/nexus-sink/build/reports
>
> ''nexus-xd-python-modules''
> * https://github.com/dataplumber/nexus/blob/master/nexus-ingest/nexus-xd-python-modules/package-list.txt
> * https://github.com/dataplumber/nexus/blob/master/nexus-ingest/nexus-xd-python-modules/requirements.txt
>
> ''spring-xd-python''
> * only python standard libraries are used
>
> ''tcp-shell''
> * https://github.com/dataplumber/nexus/tree/master/nexus-ingest/tcp-shell/build/reports
>
> '''tools/deletebyquery'''
> * https://github.com/dataplumber/nexus/blob/master/tools/deletebyquery/requirements.txt
>
> = Required Resources =
> Mailing Lists
> * priv...@sdap.incubator.apache.org
> * d...@sdap.incubator.apache.org
> * comm...@sdap.incubator.apache.org
>
> Git Repos
> * https://git-wip-us.apache.org/repos/asf/incubator-nexus.git
> * https://git-wip-us.apache.org/repos/asf/incubator-doms.git
> * https://git-wip-us.apache.org/repos/asf/incubator-mudrod.git
>
> Issue Tracking
> * JIRA Science Data Analytics Platform (SDAP)
>
> Continuous
> Integration
> * Jenkins builds on https://builds.apache.org/
>
> Web
> * http://sdap.incubator.apache.org/
> * wiki at http://cwiki.apache.org
>
> = Initial Committers =
> The following is a list of the planned initial Apache committers (the
> active subset of the committers for the current repository on Github).
> * Lewis John McGibbney (lewi...@apache.org)
> * Vardis M. Tsontos (vardis.m.tson...@jpl.nasa.gov)
> * Joseph C. Jacob (joseph.c.ja...@jpl.nasa.gov)
> * Ed Armstrong (edward.m.armstr...@jpl.nasa.gov)
> * Frank Greguska (gregu...@jpl.nasa.gov)
> * Brian Wilson (brian.wil...@jpl.nasa.gov)
> * Chaowe Phil Yang (cya...@gmu.edu)
> * Yongyao Jiang (yjia...@gmu.edu)
> * Yun Li (yl...@gmu.edu)
> * Shawn R. Smith (sm...@coaps.fsu.edu)
> * Jocelyn Elya (je...@coaps.fsu.edu)
> * Mark Bourassa (boura...@coaps.fsu.edu)
> * Thomas Cram (tc...@ucar.edu)
> * Thomas Huang (thomas.hu...@jpl.nasa.gov)
> * Steven Worley (wor...@ucar.edu)
> * Zaihua Ji (z...@ucar.edu)
>
> = Affiliations =
> NASA JPL
> * Lewis John McGibbney (lewi...@apache.org)
> * Vardis M. Tsontos (vardis.m.tson...@jpl.nasa.gov)
> * Joseph C. Jacob (joseph.c.ja...@jpl.nasa.gov)
> * Ed Armstrong (edward.m.armstr...@jpl.nasa.gov)
> * Frank Greguska (gregu...@jpl.nasa.gov)
> * Thomas Huang (thomas.hu...@jpl.nasa.gov)
> * Brian Wilson (brian.wil...@jpl.nasa.gov)
>
> George Mason University
> * Chaowe Phil Yang (cya...@gmu.edu)
> * Yongyao Jiang (yjia...@gmu.edu)
> * Yun Li (yl...@gmu.edu)
>
> Center for Ocean-Atmospheric Prediction Studies, Florida State University
> * Shawn R.
> Smith (sm...@coaps.fsu.edu)
> * Jocelyn Elya (je...@coaps.fsu.edu)
> * Mark Bourassa (boura...@coaps.fsu.edu)
>
> Computational Information Systems Laboratory (CISL) / National Center for
> Atmospheric Research (NCAR)
> * Thomas Cram (tc...@ucar.edu)
> * Zaihua Ji (z...@ucar.edu)
> * Steven Worley (wor...@ucar.edu)
>
> = Sponsors =
>
> = Champion =
> * Lewis McGibbney (NASA/JPL)
>
> = Nominated Mentors =
> * TBD
> * TBD
> * TBD
>
> = Sponsoring Entity =
> The Apache Incubator
>
>
> --
> http://home.apache.org/~lewismc/
> @hectorMcSpector
> http://www.linkedin.com/in/lmcgibbney