Thanks John!

On 08/02/2017 03:23 PM, John D. Ament wrote:
> You can also count me in as a mentor.
>
> John
>
> On Wed, Aug 2, 2017 at 3:14 PM Steve Lawrence <stephen.d.lawre...@gmail.com>
> wrote:
>
>> Understood. Thanks for the interest!
>>
>> - Steve
>>
>> On 08/02/2017 02:57 PM, Dave Fisher wrote:
>>> Hi Steve,
>>>
>>> It was not so much the lack of committers as it was the current
>>> diversity. That is not a blocker for entry to Incubation.
>>>
>>> I am willing to be one of the Mentors. Once there are at least two
>>> more we can push forward.
>>>
>>> Regards,
>>> Dave
>>>
>>>> On Aug 1, 2017, at 5:09 AM, Steve Lawrence
>>>> <stephen.d.lawre...@gmail.com> wrote:
>>>>
>>>> Discussions have died down, and I think the consensus from the responses
>>>> is that the issues are 1) the lack of committers and 2) the lack of a
>>>> champion and mentors. We hope to address #1 and grow the community as
>>>> part of incubation. Is anyone interested in being a champion or mentor
>>>> and help us with #2?
>>>>
>>>> Thanks,
>>>> - Steve
>>>>
>>>> On 07/26/2017 04:06 PM, Chris Mattmann wrote:
>>>>> This sounds like a very interesting project.
>>>>>
>>>>> I don’t have the time to mentor at the moment but I will keep a
>>>>> close eye on it.
>>>>>
>>>>> Cheers,
>>>>> Chris Mattmann
>>>>>
>>>>> On 7/25/17, 11:53 AM, "McHenry, Kenton Guadron"
>>>>> <mche...@illinois.edu> wrote:
>>>>>
>>>>> Hi Dave,
>>>>>
>>>>> The developers that were at NCSA have moved on to other
>>>>> organizations. While we still leverage Daffodil and are very much
>>>>> interested in seeing it move forward, development is currently done
>>>>> by the Tresys team. Agreed on the synergy with Tika.
>>>>>
>>>>> Kenton McHenry, Ph.D.
>>>>> Principal Research Scientist, Adjunct Assistant Professor of
>>>>> Computer Science
>>>>> Deputy Director of the Scientific Software & Applications Division
>>>>> National Center for Supercomputing Applications, University of
>>>>> Illinois at Urbana-Champaign
>>>>>
>>>>> On Jul 24, 2017, at 1:55 PM, Dave Fisher <dave2w...@comcast.net>
>>>>> wrote:
>>>>>
>>>>> Hi Kenton,
>>>>>
>>>>> Is there any reason that you and others from the NCSA are not
>>>>> Initial Committers? That would make this proposal stronger.
>>>>>
>>>>> Regarding Apache Tika - it relies on other projects including
>>>>> Apache POI and Apache PDFBox. They are pragmatic about what is used.
>>>>> If Daffodil works to expand then I think that there would be good
>>>>> synergy between the projects. I know as a POI PMC member that the POI
>>>>> community has significantly benefited from the Tika community some of
>>>>> whom are from Mitre.
>>>>>
>>>>> To date Tika has not emphasized structured data, although they do
>>>>> extract content from Excel and OpenOffice.
>>>>>
>>>>> I am intrigued.
>>>>>
>>>>> Regards,
>>>>> Dave
>>>>>
>>>>> On Jul 24, 2017, at 10:55 AM, McHenry, Kenton Guadron
>>>>> <mche...@illinois.edu> wrote:
>>>>>
>>>>> Yes, DFDL and its open source implementation Daffodil are more
>>>>> about file formats and getting access to the entirety of a file's
>>>>> contents in a consistent way through machine readable specifications.
>>>>> The work has implications in the area of digital preservation allowing
>>>>> one to preserve these machine readable specifications rather than all
>>>>> the tools needed to open/save a file in order to work with it.
>>>>> Imagine someone developing graphics software to work with 3D models
>>>>> and not having to worry about the hundreds of formats out there for
>>>>> 3D meshes (whether there are tools for opening the files and whether
>>>>> they can get access to those tools, whether the spec is available and
>>>>> worrying about how complex that spec is to implement, etc.), and
>>>>> simply building their code around the contents (e.g. vertices, faces,
>>>>> etc.). One could come up with similar scenarios for other data types
>>>>> (documents, images, videos, audio, depth data, numeric data). Ideally,
>>>>> tools built supporting DFDL could someday support any format for that
>>>>> type without the developer having to worry about the details of how
>>>>> that data is represented within a file.
>>>>>
>>>>> Kenton McHenry, Ph.D.
>>>>> Principal Research Scientist, Adjunct Assistant Professor of
>>>>> Computer Science
>>>>> Deputy Director of the Scientific Software & Applications Division
>>>>> National Center for Supercomputing Applications, University of
>>>>> Illinois at Urbana-Champaign
>>>>>
>>>>> On Jul 24, 2017, at 10:30 AM, Steve Lawrence
>>>>> <stephen.d.lawre...@gmail.com> wrote:
>>>>>
>>>>> I'll preface this saying that I don't have a ton of experience with
>>>>> Apache Tika. But based on my understanding, Tika and Daffodil do have
>>>>> somewhat similar goals, but reach them in different ways. For example,
>>>>> Tika requires that one writes /code/ to perform data extraction,
>>>>> usually relying on existing Java libraries to extract the desired
>>>>> metadata. The downside to this is that code can be buggy, and
>>>>> libraries might not even exist for formats of interest (especially
>>>>> common with legacy and military data).
>>>>>
>>>>> Daffodil, on the other hand, does not require one to write any code.
>>>>> Instead, one writes a DFDL Schema (similar to XML Schema, with DFDL
>>>>> annotations) that fully describes the data, which Daffodil then uses
>>>>> to convert the data to XML/JSON for extraction. So adding support for
>>>>> a new format means writing a new schema rather than new code. And less
>>>>> code generally means fewer bugs. Also, for secure systems that require
>>>>> certification, generally speaking, it is easier to certify a schema as
>>>>> compared to code.
>>>>>
>>>>> We certainly don't believe that Daffodil could replace Tika, but it
>>>>> does have the potential to add new functionality to Tika for formats
>>>>> that do not have existing libraries. One of our goals is to look into
>>>>> integrating Daffodil support into tools like Tika. We'd love to hear
>>>>> from Tika devs if this is something they'd be interested in.
>>>>>
>>>>> I'll also add that whereas Tika tends to focus primarily on metadata,
>>>>> DFDL schemas usually describe an entire file format down to the byte,
>>>>> so one can extract more than just metadata, including text and binary
>>>>> data. Further differentiating, Daffodil has support for serializing
>>>>> data (called unparse) from the XML/JSON representation, allowing one
>>>>> to transform or filter data as well. We don't believe this feature is
>>>>> all that applicable to Tika, but it may be useful to other
>>>>> technologies such as filtering or data fuzzing technologies.
>>>>>
>>>>> - Steve
>>>>>
>>>>> On 07/24/2017 10:59 AM, Mike Drob wrote:
>>>>> What is the relationship between Daffodil and something like Apache
>>>>> Tika's extraction engine?
>>>>>
>>>>> On Mon, Jul 24, 2017 at 9:53 AM, Steve Lawrence
>>>>> <stephen.d.lawre...@gmail.com> wrote:
>>>>>
>>>>> Dear Apache Incubator Community,
>>>>>
>>>>> We would like to start a discussion around a proposal to bring
>>>>> Daffodil into the Apache Incubator. Daffodil is an implementation of
>>>>> the DFDL specification used to convert between fixed format data and
>>>>> XML/JSON.
>>>>>
>>>>> The draft proposal can be found in the wiki at the following URL:
>>>>>
>>>>> https://wiki.apache.org/incubator/DaffodilProposal
>>>>>
>>>>> We do not yet have a champion or mentors, but it was recommended that
>>>>> we create a proposal and send it to this list to potentially find
>>>>> those that might be interested. The text for the draft proposal is
>>>>> found below. We look forward to your input.
>>>>>
>>>>> Thanks,
>>>>> -Steve
>>>>>
>>>>>
>>>>> = Daffodil Proposal =
>>>>>
>>>>> == Abstract ==
>>>>>
>>>>> Daffodil is an implementation of the Data Format Description Language
>>>>> (DFDL) used to convert between fixed format data and XML/JSON.
>>>>>
>>>>> == Proposal ==
>>>>>
>>>>> The Data Format Description Language (DFDL) is a specification,
>>>>> developed by the Open Grid Forum, capable of describing many data
>>>>> formats, including both textual and binary, scientific and numeric,
>>>>> legacy and modern, commercial record-oriented, and many industry and
>>>>> military standards. It defines a language that is a subset of W3C XML
>>>>> schema to describe the logical format of the data, and annotations
>>>>> within the schema to describe the physical representation.
>>>>>
>>>>> Daffodil is an open source implementation of the DFDL specification
>>>>> that uses these DFDL schemas to parse fixed format data into an
>>>>> infoset, which is most commonly represented as either XML or JSON.
>>>>> This allows the use of well-established XML or JSON technologies and
>>>>> libraries to consume, inspect, and manipulate fixed format data in
>>>>> existing solutions. Daffodil is also capable of the reverse by
>>>>> serializing or "unparsing" an XML or JSON infoset back to the original
>>>>> data format.
>>>>>
>>>>> == Background ==
>>>>>
>>>>> Many different software solutions need to consume and manage data,
>>>>> including data directed routing, databases, data analysis, data
>>>>> cleansing, data visualizing, and more. A key aspect of such solutions
>>>>> is the need to transform the data into an easily consumable format.
>>>>> Usually, this means that for each unique data format, one develops a
>>>>> tool that can read and extract the necessary information, often
>>>>> leading to ad-hoc and data-format-specific description systems. Such
>>>>> systems are often proprietary, not well tested, and incompatible,
>>>>> leading to vendor lock-in, flawed software, and increased training
>>>>> costs. DFDL is a new standard, with version 1.0 completed in October
>>>>> of 2016, that solves these problems by defining an open standard to
>>>>> describe many different data formats and how to parse and unparse
>>>>> between the data and XML/JSON.
>>>>>
>>>>> Two closed source implementations of DFDL currently exist. The first
>>>>> was created by IBM and is now part of their IBM® Integration Bus
>>>>> product. The second, called DFDL4S or "DFDL for Space", was created by
>>>>> the European Space Agency and is targeted at the challenges of their
>>>>> satellite data processing.
>>>>>
>>>>> Around 2005, Pacific Northwest National Lab created Defuddle, built as
>>>>> an open source implementation and proof of concept of the draft DFDL
>>>>> specification and a test bed to feed new concepts into specification
>>>>> development.
>>>>> Primary development of Defuddle was eventually taken over by the
>>>>> National Center for Supercomputing Applications (NCSA). However, due
>>>>> to evolution of the DFDL specification and architectural and
>>>>> performance issues with Defuddle, around 2009, NCSA restarted the
>>>>> project with the new name of Daffodil, with a goal of implementing the
>>>>> complete DFDL specification. Daffodil development continued at NCSA
>>>>> until around 2012, at which point development slowed due to budget
>>>>> limitations. Shortly thereafter, primary development was picked up by
>>>>> Tresys Technology where it continues today, with contributions from
>>>>> other entities such as the Navy Research Lab, the Air Force Research
>>>>> Lab, MITRE, and Booz Allen Hamilton. In February of 2015, Daffodil
>>>>> version 1.0.0 was released, including support for the DFDL features
>>>>> needed to parse many common file formats. Daffodil version 2.0.0 is
>>>>> expected to be released in August of 2017, which will include unparse
>>>>> support with one-to-one parsing feature parity.
>>>>>
>>>>> Entities including IBM, MITRE, NATO NCI Agency, Northrop-Grumman,
>>>>> Quark Security, Raytheon, and Tresys Technology have developed DFDL
>>>>> schemas for many data formats from varying technology domains,
>>>>> including PNG, GIF, BMP, PCAP, HL7, EDIFACT, NACHA, vCard, iCalendar,
>>>>> and MIL-STD-2045, many of which are publicly available on the DFDL
>>>>> Schemas github. There are also a number of military-application data
>>>>> formats, the specifications of which are not public, which have
>>>>> historically been very difficult and expensive to process, and for
>>>>> which DFDL schemas have been created or are actively in development;
>>>>> these include MIL-STD-6040/USMTF ATO, MIL-STD-6017/VMF,
>>>>> MIL-STD-6016/NATO STANAG 5516 (aka "Link16").
>>>>>
>>>>> == Rationale ==
>>>>>
>>>>> Numerous software solutions exist that consume, inspect, analyze, and
>>>>> transform data, many of which can be found in the Apache Software
>>>>> Foundation (ASF). In order for tools like these to consume new types
>>>>> of data, custom extensions are usually required, often with high
>>>>> development and testing costs. Daffodil fills a clear gap in many of
>>>>> these solutions, providing a simple and low cost way to transform data
>>>>> to XML or JSON, which many of these tools natively support already.
>>>>> With the upcoming 2.0.0 release, the Daffodil project will have
>>>>> achieved a level of functionality in both parse and unparse that, when
>>>>> integrated into existing solutions, could provide for a new method to
>>>>> quickly enable support for new data formats.
>>>>>
>>>>> == Initial Goals ==
>>>>>
>>>>> * Relicense the existing code from the University of Illinois/NCSA
>>>>>   Open Source License to the Apache License version 2.0, working with
>>>>>   Apache Legal to ensure correctness, and with Daffodil contributors
>>>>>   to get their permission.
>>>>> * Move the existing codebase, documentation, bugs, and mailing lists
>>>>>   to the Apache hosted infrastructure
>>>>> * Establish a formal release process and schedule, allowing for
>>>>>   dependable release cycles in a manner consistent with the Apache
>>>>>   development process.
>>>>> * Build relationships with ASF projects to add Daffodil support where
>>>>>   appropriate
>>>>> * Grow the community to establish a diversity of background and
>>>>>   expertise.
>>>>>
>>>>> == Current Status ==
>>>>>
>>>>> === Meritocracy ===
>>>>>
>>>>> All initial committers are familiar with the principles of
>>>>> meritocracy. The Daffodil project has followed the model of
>>>>> meritocracy in the past, providing multiple outside entities commit
>>>>> access based on the quality of their contributions.
>>>>> In order to grow the Daffodil user base and development community, we
>>>>> are dedicated to continuing to operate Daffodil as a meritocracy.
>>>>>
>>>>> A key ingredient in a meritocracy of developers is open group code
>>>>> review. The Daffodil project has operated in this mode throughout its
>>>>> existence and this provides a forum to improve the code, verify code
>>>>> quality, and educate new developers on the code base.
>>>>>
>>>>> === Community ===
>>>>>
>>>>> Daffodil has a small community of users and developers. Although
>>>>> primary Daffodil development is done by Tresys Technology, a handful
>>>>> of other contributions have come from other entities including the
>>>>> Navy Research Lab, the Air Force Research Lab, MITRE, and Booz Allen
>>>>> Hamilton. In addition to developers, multiple users of Daffodil have
>>>>> created DFDL schemas, including entities such as MITRE, IBM, Raytheon,
>>>>> Quark Security, and Tresys Technology. The DFDL Schemas github
>>>>> community has been created as a place for DFDL schemas to be
>>>>> published. The Daffodil project also makes use of mailing lists,
>>>>> !HipChat, and Confluence Questions to build a community of users and
>>>>> a system for support.
>>>>>
>>>>> === Core Developers ===
>>>>>
>>>>> The core developers of Daffodil are employed by Tresys Technology. We
>>>>> will work to grow the community among a more diverse set of developers
>>>>> and industries.
>>>>>
>>>>> === Alignment ===
>>>>>
>>>>> Daffodil was created as an open source project with a philosophy
>>>>> consistent with The Apache Way. A strong belief in meritocracy,
>>>>> community involvement in decisions, openness, and ensuring a high
>>>>> level of quality in code, documentation, and testing are some of our
>>>>> shared core beliefs.
>>>>>
>>>>> Further, as mentioned in the Rationale section, Daffodil fills a gap
>>>>> that exists in many ASF projects, including !NiFi, Spark, Storm,
>>>>> Hadoop, Tika, and others. In order for tools like these to consume new
>>>>> types of data, custom extensions are usually required. Rather than
>>>>> create such extensions, Daffodil provides an easy and
>>>>> standards-compliant way to transform data to XML or JSON, which many
>>>>> of these tools already natively support.
>>>>>
>>>>> == Known Risks ==
>>>>>
>>>>> === Orphaned Products ===
>>>>>
>>>>> The current core developers are the leading contributors in the space
>>>>> of DFDL and wish to see it flourish. Though there is some risk that
>>>>> the initial committers all come from the same company, a goal of
>>>>> entering into incubation is to grow the development community to
>>>>> minimize the risk of reliance on a single company.
>>>>>
>>>>> === Inexperience with Open Source ===
>>>>>
>>>>> The Daffodil project began as an open source project and has continued
>>>>> that model throughout development. This includes public bug tracking,
>>>>> git revision control, automated builds and tests, and a public wiki
>>>>> for documentation.
>>>>>
>>>>> Additionally, the current core developers and initial committers all
>>>>> work for a company that relies on, believes in, promotes, and has led
>>>>> or contributed to many open source software projects, including
>>>>> SELinux Userspace, OpenSCAP, CLIP, refpolicy, setools, RPM, and
>>>>> others. As such, there is low risk related to inexperience with open
>>>>> source software and processes.
>>>>>
>>>>> === Homogeneous Developers ===
>>>>>
>>>>> The proposed initial committers come from a single entity, though we
>>>>> are committed to growing the Daffodil development community to include
>>>>> a broad group of additional committers from a wide array of
>>>>> industries.
>>>>>
>>>>> === Reliance on Salaried Developers ===
>>>>>
>>>>> The proposed initial committers are paid by their employer to
>>>>> contribute to the Daffodil project. We expect that Daffodil
>>>>> development will continue with salaried developers, and are committed
>>>>> to growing the community to include non-salaried developers as well.
>>>>>
>>>>> === Relationship with other Apache Projects ===
>>>>>
>>>>> As mentioned in the Alignment section, Daffodil fills a clear gap in
>>>>> numerous other ASF projects that consume and manage large amounts of
>>>>> data.
>>>>>
>>>>> As a specific example, Daffodil developers have created a Daffodil
>>>>> Apache !NiFi Processor, currently in use in data transfer solutions,
>>>>> which allows one to ingest non-native data into an Apache !NiFi
>>>>> pipeline as XML or JSON. This processor was well received by the
>>>>> Apache !NiFi developers, with positive comments about the concise API
>>>>> and how it could handle non-native data. Daffodil developers have also
>>>>> successfully prototyped integration with Apache Spark. We believe
>>>>> Daffodil could provide a strong benefit to many other ASF projects
>>>>> that handle fixed format data. We anticipate working closely with such
>>>>> ASF projects to include Daffodil where applicable to increase their
>>>>> ability to support new data formats with minimal effort.
>>>>>
>>>>> Daffodil also depends on existing ASF projects, including Apache
>>>>> Commons and Apache Xerces.
>>>>>
>>>>> === An Excessive Fascination with the Apache Brand ===
>>>>>
>>>>> Although the Apache brand may certainly help to attract more
>>>>> contributors, publicity is not the reason for this proposal. We
>>>>> believe Daffodil could provide a great benefit to the ASF and the
>>>>> numerous data focused projects that comprise it, as described in the
>>>>> Rationale and Alignment sections.
>>>>> We hope to build a strong and vibrant community built around The
>>>>> Apache Way, and not dependent on a single company.
>>>>>
>>>>> === Documentation ===
>>>>>
>>>>> Daffodil documentation can be found at:
>>>>>
>>>>> * https://opensource.ncsa.illinois.edu/confluence/display/DFDL/Daffodil%3A+Open+Source+DFDL
>>>>>
>>>>> Information about DFDL can be found at:
>>>>>
>>>>> * https://www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
>>>>> * https://www.ibm.com/support/knowledgecenter/en/SSMKHH_9.0.0/com.ibm.etools.mft.doc/df20060_.htm
>>>>>
>>>>> Public examples of DFDL Schemas can be found at:
>>>>>
>>>>> * https://github.com/DFDLSchemas
>>>>>
>>>>> == Initial Source ==
>>>>>
>>>>> The Daffodil git repo goes back to mid-2011 with approximately 20
>>>>> different contributors and feedback from many users and developers.
>>>>> The core codebase is written in Scala and includes both a Scala and
>>>>> Java API, along with Javadocs and Scaladocs for API usage. The initial
>>>>> code will come from the git repository currently hosted by NCSA at the
>>>>> University of Illinois:
>>>>>
>>>>> https://opensource.ncsa.illinois.edu/bitbucket/projects/DFDL/repos/daffodil/
>>>>>
>>>>> == Source and Intellectual Property Submission ==
>>>>>
>>>>> The complete Daffodil code is licensed under the University of
>>>>> Illinois/NCSA Open Source License. Much of the current codebase has
>>>>> been developed by Tresys Technology, who is open to relicensing the
>>>>> code to the Apache License version 2.0 and donating the source to the
>>>>> ASF. Contacts at NCSA are also open to relicensing their contributions
>>>>> to Apache v2. We plan to contact the other contributors and ask for
>>>>> permission to relicense and donate their contributed code. For those
>>>>> who decline or whom we cannot contact, their code will be removed or
>>>>> replaced. We will work closely with Apache Legal to ensure all issues
>>>>> related to relicensing are acceptable.
>>>>>
>>>>> == External Dependencies ==
>>>>>
>>>>> We believe all current dependencies are compatible with the ASF
>>>>> guidelines. Our dependency licenses come from the following license
>>>>> styles: Apache v2, BSD, MIT, and ICU. The list of current Daffodil
>>>>> dependencies and their licenses is documented here:
>>>>>
>>>>> https://opensource.ncsa.illinois.edu/confluence/display/DFDL/Dependencies+and+Licenses
>>>>>
>>>>> == Cryptography ==
>>>>>
>>>>> None
>>>>>
>>>>> == Required Resources ==
>>>>>
>>>>> === Mailing Lists ===
>>>>>
>>>>> * comm...@daffodil.incubator.apache.org
>>>>> * d...@daffodil.incubator.apache.org
>>>>> * priv...@daffodil.incubator.apache.org
>>>>> * u...@daffodil.incubator.apache.org
>>>>>
>>>>> === Source Control ===
>>>>>
>>>>> git://git.apache.org/incubator-daffodil.git
>>>>>
>>>>> === Issue Tracking ===
>>>>>
>>>>> JIRA Daffodil (DFDL)
>>>>>
>>>>> === Initial Committers ===
>>>>>
>>>>> * Beth Finnegan <efinnegan at tresys dot com>
>>>>> * Dave Thompson <dthompson at tresys dot com>
>>>>> * Josh Adams <jadams at tresys dot com>
>>>>> * Mike Beckerle <mbeckerle at tresys dot com>
>>>>> * Steve Lawrence <slawrence at tresys dot com>
>>>>> * Taylor Wise <twise at tresys dot com>
>>>>>
>>>>> === Affiliations ===
>>>>>
>>>>> * Beth Finnegan (Tresys Technology)
>>>>> * Dave Thompson (Tresys Technology)
>>>>> * Josh Adams (Tresys Technology)
>>>>> * Mike Beckerle (Tresys Technology)
>>>>> * Steve Lawrence (Tresys Technology)
>>>>> * Taylor Wise (Tresys Technology)
>>>>>
>>>>> == Sponsors ==
>>>>>
>>>>> === Champion ===
>>>>>
>>>>> * TBD
>>>>>
>>>>> === Nominated Mentors ===
>>>>>
>>>>> * TBD
>>>>>
>>>>> === Sponsoring Entity ===
>>>>>
>>>>> We request the Apache Incubator to sponsor this project.
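
As a rough illustration of the parse/unparse idea described in the Proposal section of the quoted message, here is a minimal Python sketch. It is not DFDL and does not use Daffodil's API: the RECORD_FORMAT table, the field names, and the parse/unparse helpers are all hypothetical stand-ins, assumed only to show how one declarative description of a fixed format can drive conversion in both directions.

```python
import json

# Hypothetical, toy field description: (name, width in bytes, kind).
# This is NOT DFDL syntax; it only stands in for a declarative schema.
RECORD_FORMAT = [
    ("id", 4, "int"),
    ("name", 8, "text"),
    ("qty", 3, "int"),
]


def parse(data: bytes, fmt=RECORD_FORMAT) -> dict:
    """Turn one fixed-width ASCII record into a dict (a toy "infoset")."""
    infoset, offset = {}, 0
    for name, width, kind in fmt:
        raw = data[offset:offset + width].decode("ascii")
        # Integers are zero-padded digits; text is right-padded with spaces.
        infoset[name] = int(raw) if kind == "int" else raw.rstrip()
        offset += width
    return infoset


def unparse(infoset: dict, fmt=RECORD_FORMAT) -> bytes:
    """Serialize the dict back to the same fixed-width representation."""
    parts = []
    for name, width, kind in fmt:
        if kind == "int":
            parts.append(str(infoset[name]).zfill(width))
        else:
            parts.append(infoset[name].ljust(width))
    return "".join(parts).encode("ascii")


record = b"0042widget  007"
infoset = parse(record)
print(json.dumps(infoset))        # JSON view of the parsed record
assert unparse(infoset) == record  # round trip back to the original bytes
```

In Daffodil the description would be a DFDL schema rather than a Python list, but the design point is the same: supporting a new format means changing the description, not the conversion code.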
--------------------------------------------------------------------- To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org For additional commands, e-mail: general-h...@incubator.apache.org