Hi Steve,

It was not so much the lack of committers as it was the current lack of
diversity among them. That is not a blocker for entry to Incubation.

I am willing to be one of the Mentors. Once there are at least two more we can 
push forward.

Regards,
Dave

> On Aug 1, 2017, at 5:09 AM, Steve Lawrence <stephen.d.lawre...@gmail.com> wrote:
> 
> Discussions have died down, and I think the consensus from the responses
> is that the issues are 1) the lack of committers and 2) the lack of a
> champion and mentors. We hope to address #1 and grow the community as
> part of incubation. Is anyone interested in being a champion or mentor
> and helping us with #2?
> 
> Thanks,
> - Steve
> 
> On 07/26/2017 04:06 PM, Chris Mattmann wrote:
>> This sounds like a very interesting project.
>> 
>> I don’t have the time to mentor at the moment but I will keep a close eye on 
>> it.
>> 
>> Cheers,
>> Chris Mattmann
>> 
>> 
>> 
>> 
>> On 7/25/17, 11:53 AM, "McHenry, Kenton Guadron" <mche...@illinois.edu> wrote:
>> 
>>    Hi Dave,
>> 
>>    The developers that were at NCSA have moved on to other organizations.
>>    While we still leverage Daffodil and are very much interested in seeing it
>>    move forward, development is currently done by the Tresys team.  Agreed on
>>    the synergy with Tika.
>> 
>>    Kenton McHenry, Ph.D.
>>    Principal Research Scientist, Adjunct Assistant Professor of Computer Science
>>    Deputy Director of the Scientific Software & Applications Division
>>    National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign
>> 
>>    On Jul 24, 2017, at 1:55 PM, Dave Fisher <dave2w...@comcast.net> wrote:
>> 
>>    Hi Kenton,
>> 
>>    Is there any reason that you and others from the NCSA are not Initial
>>    Committers? That would make this proposal stronger.
>> 
>>    Regarding Apache Tika - it relies on other projects including Apache POI
>>    and Apache PDFBox. They are pragmatic about what is used. If Daffodil works
>>    to expand then I think that there would be good synergy between the
>>    projects. I know as a POI PMC member that the POI community has
>>    significantly benefited from the Tika community, some of whom are from
>>    MITRE.
>> 
>>    To date Tika has not emphasized structured data, although they do extract
>>    content from Excel and OpenOffice.
>> 
>>    I am intrigued.
>> 
>>    Regards,
>>    Dave
>> 
>>    On Jul 24, 2017, at 10:55 AM, McHenry, Kenton Guadron <mche...@illinois.edu> wrote:
>> 
>>    Yes, DFDL and its open source implementation Daffodil are more about file
>>    formats and getting access to the entirety of a file's contents in a
>>    consistent way through machine readable specifications.  The work has
>>    implications in the area of digital preservation, allowing one to preserve
>>    these machine readable specifications rather than all the tools needed to
>>    open/save a file in order to work with it.  Imagine someone developing
>>    graphics software to work with 3D models and not having to worry about the
>>    hundreds of formats out there for 3D meshes (whether there are tools for
>>    opening the files and whether they can get access to those tools, whether
>>    the spec is available and worrying about how complex that spec is to
>>    implement, etc.), and simply building their code around the contents (e.g.
>>    vertices, faces, etc.).  One could come up with similar scenarios for other
>>    data types (documents, images, videos, audio, depth data, numeric data).
>>    Ideally, tools built to support DFDL could someday support any format for
>>    that type without the developer having to worry about the details of how
>>    that data is represented within a file.
>> 
>>    Kenton McHenry, Ph.D.
>>    Principal Research Scientist, Adjunct Assistant Professor of Computer Science
>>    Deputy Director of the Scientific Software & Applications Division
>>    National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign
>> 
>>    On Jul 24, 2017, at 10:30 AM, Steve Lawrence <stephen.d.lawre...@gmail.com> wrote:
>> 
>>    I'll preface this saying that I don't have a ton of experience with
>>    Apache Tika. But based on my understanding, Tika and Daffodil do have
>>    somewhat similar goals, but reach them in different ways. For example,
>>    Tika requires that one writes /code/ to perform data extraction, usually
>>    relying on existing Java libraries to extract the desired metadata. The
>>    downside to this is that code can be buggy, and libraries might not even
>>    exist for formats of interest (especially common with legacy and
>>    military data).
>> 
>>    Daffodil, on the other hand, does not require one to write any code.
>>    Instead, one writes a DFDL Schema (similar to XML Schema, with DFDL
>>    annotations) that fully describes the data, which Daffodil then uses to
>>    convert the data to XML/JSON for extraction. So adding support for a new
>>    format means writing a new schema rather than new code. And less code
>>    generally means fewer bugs. Also, for secure systems that require
>>    certification, generally speaking, it is easier to certify a schema as
>>    compared to code.
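
To make the "schema rather than code" idea concrete, here is a minimal Python sketch. It is not DFDL and does not use Daffodil's API; the `RECORD_DESCRIPTION` layout, the field names, and the `parse` helper are all hypothetical stand-ins. The point it illustrates is only that one generic parser can be driven by a declarative description, so supporting a new format means writing a new description.

```python
import json
import struct

# Hypothetical, simplified stand-in for a schema: NOT real DFDL syntax.
# Each entry names a field and gives its binary layout as a struct format.
RECORD_DESCRIPTION = [
    ("version", "B"),   # 1-byte unsigned integer
    ("length", ">H"),   # 2-byte big-endian unsigned integer
    ("tag", "4s"),      # 4 raw bytes, decoded below as ASCII text
]

def parse(description, data):
    """Generic parser: walks the description and never hard-codes a format."""
    infoset, offset = {}, 0
    for name, fmt in description:
        (value,) = struct.unpack_from(fmt, data, offset)
        offset += struct.calcsize(fmt)
        infoset[name] = value.decode("ascii") if isinstance(value, bytes) else value
    return infoset

sample = bytes([1, 0, 16]) + b"DATA"
print(json.dumps(parse(RECORD_DESCRIPTION, sample)))
# prints {"version": 1, "length": 16, "tag": "DATA"}
```

A real DFDL schema is far more expressive than this list of tuples (delimiters, text encodings, choices, lengths computed from other fields), but the division of labor is the same: the description changes per format, the engine does not.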
>> 
>>    We certainly don't believe that Daffodil could replace Tika, but it does
>>    have the potential to add new functionality to Tika for formats that do
>>    not have existing libraries. One of our goals is to look into
>>    integrating Daffodil support into tools like Tika. We'd love to hear
>>    from Tika devs if this is something they'd be interested in.
>> 
>>    I'll also add that whereas Tika tends to focus primarily on metadata,
>>    DFDL schemas usually describe an entire file format down to the byte, so
>>    one can extract more than just metadata, including text and binary
>>    data. Further differentiating, Daffodil has support for serializing data
>>    (called unparse) from the XML/JSON representation, allowing one to
>>    transform or filter data as well. We don't believe this feature is all
>>    that applicable to Tika, but may be useful to other technologies such as
>>    filtering or data fuzzing technologies.
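
The parse / transform / unparse round trip described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `DESCRIPTION`, `parse`, and `unparse` are hypothetical names, the layout is made up, and none of it is Daffodil's actual API or DFDL syntax.

```python
import struct

# Hypothetical field description (not DFDL syntax): name and struct layout.
DESCRIPTION = [("id", ">H"), ("flags", "B"), ("label", "3s")]

def parse(data):
    """Turn raw bytes into a dictionary 'infoset' using the description."""
    infoset, offset = {}, 0
    for name, fmt in DESCRIPTION:
        (infoset[name],) = struct.unpack_from(fmt, data, offset)
        offset += struct.calcsize(fmt)
    return infoset

def unparse(infoset):
    """Serialize the infoset back to bytes using the same description."""
    return b"".join(struct.pack(fmt, infoset[name]) for name, fmt in DESCRIPTION)

original = b"\x00\x2a\x07abc"
infoset = parse(original)
infoset["flags"] &= 0x01      # filter step: keep only the lowest flag bit
filtered = unparse(infoset)   # b"\x00\x2a\x01abc"
```

Because both directions are driven by the same description, an unmodified infoset unparses back to the original bytes, which is what makes filtering or fuzzing at the infoset level practical.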
>> 
>>    - Steve
>> 
>> 
>>    On 07/24/2017 10:59 AM, Mike Drob wrote:
>>    What is the relationship between Daffodil and something like Apache Tika's
>>    extraction engine?
>> 
>>    On Mon, Jul 24, 2017 at 9:53 AM, Steve Lawrence <stephen.d.lawre...@gmail.com> wrote:
>> 
>>    Dear Apache Incubator Community,
>> 
>>    We would like to start a discussion around a proposal to bring Daffodil
>>    into the Apache Incubator. Daffodil is an implementation of the DFDL
>>    specification used to convert between fixed format data and XML/JSON.
>> 
>>    The draft proposal can be found in the wiki at the following URL:
>> 
>>    https://wiki.apache.org/incubator/DaffodilProposal
>> 
>>    We do not yet have a champion or mentors, but it was recommended that we
>>    create a proposal and send it to this list to potentially find those
>>    that might be interested. The text for the draft proposal is found
>>    below. We look forward to your input.
>> 
>>    Thanks,
>>    -Steve
>> 
>> 
>>    = Daffodil Proposal =
>> 
>>    == Abstract ==
>> 
>>    Daffodil is an implementation of the Data Format Description Language
>>    (DFDL) used to convert between fixed format data and XML/JSON.
>> 
>>    == Proposal ==
>> 
>>    The Data Format Description Language (DFDL) is a specification,
>>    developed by the Open Grid Forum, capable of describing many data
>>    formats, including both textual and binary, scientific and numeric,
>>    legacy and modern, commercial record-oriented, and many industry and
>>    military standards. It defines a language that is a subset of W3C XML
>>    schema to describe the logical format of the data, and annotations
>>    within the schema to describe the physical representation.
>> 
>>    Daffodil is an open source implementation of the DFDL specification that
>>    uses these DFDL schemas to parse fixed format data into an infoset,
>>    which is most commonly represented as either XML or JSON. This allows
>>    the use of well-established XML or JSON technologies and libraries to
>>    consume, inspect, and manipulate fixed format data in existing
>>    solutions. Daffodil is also capable of the reverse by serializing or
>>    "unparsing" an XML or JSON infoset back to the original data format.
>> 
>>    == Background ==
>> 
>>    Many different software solutions need to consume and manage data,
>>    including data directed routing, databases, data analysis, data
>>    cleansing, data visualizing, and more. A key aspect of such solutions is
>>    the need to transform the data into an easily consumable format.
>>    Usually, this means that for each unique data format, one develops a
>>    tool that can read and extract the necessary information, often leading
>>    to ad-hoc and data-format-specific description systems. Such systems are
>>    often proprietary, not well tested, and incompatible, leading to vendor
>>    lock-in, flawed software, and increased training costs. DFDL is a new
>>    standard, with version 1.0 completed in October of 2016, that solves
>>    these problems by defining an open standard to describe many different
>>    data formats and how to parse and unparse between the data and XML/JSON.
>> 
>>    Two closed source implementations of DFDL currently exist. The first was
>>    created by IBM and is now part of their IBM® Integration Bus product.
>>    The second was created by the European Space Agency, called DFDL4S or
>>    "DFDL for Space" targeted at the challenges of their satellite data
>>    processing.
>> 
>>    Around 2005, Pacific Northwest National Lab created Defuddle, built as
>>    an open source implementation and proof of concept of the draft DFDL
>>    specification and a test bed to feed new concepts into specification
>>    development. Primary development of Defuddle was eventually taken over
>>    by the National Center for Supercomputing Applications (NCSA). However,
>>    due to evolution of the DFDL specification and architectural and
>>    performance issues with Defuddle, around 2009, NCSA restarted the
>>    project with the new name of Daffodil, with a goal of implementing the
>>    complete DFDL specification. Daffodil development continued at NCSA
>>    until around 2012, at which point development slowed due to budget
>>    limitations. Shortly thereafter, primary development was picked up by
>>    Tresys Technology where it continues today, with contributions from
>>    other entities such as the Navy Research Lab, the Air Force Research
>>    Lab, MITRE, and Booz Allen Hamilton. In February of 2015, Daffodil
>>    version 1.0.0 was released, including support for the DFDL features
>>    needed to parse many common file formats. Daffodil version 2.0.0 is
>>    expected to be released in August of 2017, which will include unparse
>>    support with one-to-one parsing feature parity.
>> 
>>    Entities including IBM, MITRE, NATO NCI Agency, Northrop-Grumman, Quark
>>    Security, Raytheon, and Tresys Technology have developed DFDL schemas
>>    for many data formats from varying technology domains, including PNG,
>>    GIF, BMP, PCAP, HL7, EDIFACT, NACHA, vCard, iCalendar, and MIL-STD-2045,
>>    many of which are publicly available on the DFDL Schemas github. There
>>    are also a number of military-application data formats, the
>>    specifications of which are not public, which have historically been
>>    very difficult and expensive to process, and for which DFDL schemas have
>>    been created or are actively in development; these include
>>    MIL-STD-6040/USMTF ATO, MIL-STD-6017/VMF, MIL-STD-6016/NATO STANAG 5516
>>    (aka "Link16").
>> 
>>    == Rationale ==
>> 
>>    Numerous software solutions exist that consume, inspect, analyze, and
>>    transform data, many of which can be found in the Apache Software
>>    Foundation (ASF). In order for tools like these to consume new types of
>>    data, custom extensions are usually required, often with high
>>    development and testing costs. Daffodil fills a clear gap in many of
>>    these solutions, providing a simple and low cost way to transform data
>>    to XML or JSON, which many of these tools natively support already. With
>>    the upcoming 2.0.0 release, the Daffodil project will have achieved a
>>    level of functionality in both parse and unparse that, when integrated
>>    into existing solutions, could provide for a new method to quickly
>>    enable support for new data formats.
>> 
>>    == Initial Goals ==
>> 
>>    * Relicense the existing code from the University of Illinois/NCSA Open
>>    Source License to the Apache License version 2.0, working with Apache
>>    Legal to ensure correctness, and with Daffodil contributors to get
>>    their permission.
>>    * Move the existing codebase, documentation, bugs, and mailing lists to
>>    the Apache hosted infrastructure
>>    * Establish a formal release process and schedule, allowing for
>>    dependable release cycles in a manner consistent with the Apache
>>    development process.
>>    * Build relationships with ASF projects to add Daffodil support where
>>    appropriate
>>    * Grow the community to establish a diversity of background and expertise.
>> 
>>    == Current Status ==
>> 
>>    === Meritocracy ===
>> 
>>    All initial committers are familiar with the principles of meritocracy.
>>    The Daffodil project has followed the model of meritocracy in the past,
>>    providing multiple outside entities commit access based on the quality
>>    of their contributions. In order to grow the Daffodil user base and
>>    development community, we are dedicated to continuing to operate
>>    Daffodil as a meritocracy.
>> 
>>    A key ingredient in a meritocracy of developers is open group code
>>    review. The Daffodil project has operated in this mode throughout its
>>    existence and this provides a forum to improve the code, verify code
>>    quality, and educate new developers on the code base.
>> 
>>    === Community ===
>> 
>>    Daffodil has a small community of users and developers. Although primary
>>    Daffodil development is done by Tresys Technology, a handful of other
>>    contributions have come from other entities including the Navy Research
>>    Lab, the Air Force Research Lab, MITRE, and Booz Allen Hamilton. In
>>    addition to developers, multiple users of Daffodil have created DFDL
>>    schemas, including entities such as MITRE, IBM, Raytheon, Quark
>>    Security, and Tresys Technology. The DFDL Schemas github community has
>>    been created as a place for DFDL schemas to be published. The Daffodil
>>    project also makes use of mailing lists, !HipChat, and Confluence
>>    Questions to build a community of users and system for support.
>> 
>>    === Core Developers ===
>> 
>>    The core developers of Daffodil are employed by Tresys Technology. We
>>    will work to grow the community among a more diverse set of developers
>>    and industries.
>> 
>>    === Alignment ===
>> 
>>    Daffodil was created as an open source project with a philosophy
>>    consistent with The Apache Way. A strong belief in meritocracy,
>>    community involvement in decisions, openness, and ensuring a high level
>>    of quality in code, documentation, and testing are some of our shared
>>    core beliefs.
>> 
>>    Further, as mentioned in the Rationale section, Daffodil fills a gap
>>    that exists in many ASF projects, including !NiFi, Spark, Storm, Hadoop,
>>    Tika, and others. In order for tools like these to consume new types of
>>    data, custom extensions are usually required. Rather than create such
>>    extensions, Daffodil provides an easy and standards-compliant way to
>>    transform data to XML or JSON, which many of these tools already
>>    natively support.
>> 
>>    == Known Risks ==
>> 
>>    === Orphaned Products ===
>> 
>>    The current core developers are the leading contributors in the space of
>>    DFDL and wish to see it flourish. Though there is some risk that the
>>    initial committers all come from the same company, a goal of entering
>>    into incubation is to grow the development community to minimize the
>>    risk of reliance on a single company.
>> 
>>    === Inexperience with Open Source ===
>> 
>>    The Daffodil project began as an open source project and has continued
>>    that model throughout development. This includes public bug tracking,
>>    git revision control, automated builds and tests, and a public wiki for
>>    documentation.
>> 
>>    Additionally, the current core developers and initial committers all
>>    work for a company that relies on, believes in, promotes, and has led or
>>    contributed to many open source software projects, including SELinux
>>    Userspace, OpenSCAP, CLIP, refpolicy, setools, RPM, and others. As such,
>>    there is low risk related to inexperience with open source software and
>>    processes.
>> 
>>    === Homogeneous Developers ===
>> 
>>    The proposed initial committers come from a single entity, though we are
>>    committed to growing the Daffodil development community to include a
>>    broad group of additional committers from a wide array of industries.
>> 
>>    === Reliance on Salaried Developers ===
>> 
>>    The proposed initial committers are paid by their employer to contribute
>>    to the Daffodil project. We expect that Daffodil development will
>>    continue with salaried developers, and are committed to growing the
>>    community to include non-salaried developers as well.
>> 
>>    === Relationship with other Apache Projects ===
>> 
>>    As mentioned in the Alignment section, Daffodil fills a clear gap in
>>    numerous other ASF projects that consume and manage large amounts of data.
>> 
>>    As a specific example, Daffodil developers have created a Daffodil
>>    Apache !NiFi Processor, currently in use in data transfer solutions,
>>    which allows one to ingest non-native data into an Apache !NiFi pipeline
>>    as XML or JSON. This processor was well received by the Apache !NiFi
>>    developers, with positive comments about the concise API and how it
>>    could handle non-native data. Daffodil developers have also successfully
>>    prototyped integration with Apache Spark. We believe Daffodil could
>>    provide a strong benefit to many other ASF projects that handle fixed
>>    format data. We anticipate working closely with such ASF projects to
>>    include Daffodil where applicable to increase their ability to support
>>    new data formats with minimal effort.
>> 
>>    Daffodil also depends on existing ASF projects, including Apache Commons
>>    and Apache Xerces.
>> 
>>    === An Excessive Fascination with the Apache Brand ===
>> 
>>    Although the Apache brand may certainly help to attract more
>>    contributors, publicity is not the reason for this proposal. We believe
>>    Daffodil could provide a great benefit to the ASF and the numerous data
>>    focused projects that comprise it, as described in the Rationale and
>>    Alignment sections. We hope to build a strong and vibrant community
>>    built around The Apache Way, and not dependent on a single company.
>> 
>>    === Documentation ===
>> 
>>    Daffodil documentation can be found at:
>> 
>>    * https://opensource.ncsa.illinois.edu/confluence/display/DFDL/Daffodil%3A+Open+Source+DFDL
>> 
>>    Information about DFDL can be found at:
>> 
>>    * https://www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
>>    * https://www.ibm.com/support/knowledgecenter/en/SSMKHH_9.0.0/com.ibm.etools.mft.doc/df20060_.htm
>> 
>>    Public examples of DFDL Schemas can be found at:
>> 
>>    * https://github.com/DFDLSchemas
>> 
>>    == Initial Source ==
>> 
>>    The Daffodil git repo goes back to mid-2011 with approximately 20
>>    different contributors and feedback from many users and developers. The
>>    core codebase is written in Scala and includes both a Scala and Java
>>    API, along with Javadocs and Scaladocs for API usage. The initial code
>>    will come from the git repository currently hosted by NCSA at the
>>    University of Illinois:
>> 
>>    https://opensource.ncsa.illinois.edu/bitbucket/projects/DFDL/repos/daffodil/
>> 
>>    == Source and Intellectual Property Submission ==
>> 
>>    The complete Daffodil code is licensed under the University of
>>    Illinois/NCSA Open Source License. Much of the current codebase has been
>>    developed by Tresys Technology, who is open to relicensing the code to
>>    the Apache License version 2.0 and donating the source to the ASF.
>>    Contacts at NCSA are also open to relicensing their contributions to
>>    Apache v2. We plan to contact the other contributors and ask for
>>    permission to relicense and donate their contributed code. For those
>>    who decline or whom we cannot contact, their code will be removed or
>>    replaced. We will work closely with Apache Legal to ensure all issues
>>    related to relicensing are acceptable.
>> 
>>    == External Dependencies ==
>> 
>>    We believe all current dependencies are compatible with the ASF
>>    guidelines. Our dependency licenses come from the following license
>>    styles: Apache v2, BSD, MIT, and ICU. The list of current Daffodil
>>    dependencies and their licenses is documented here:
>> 
>>    https://opensource.ncsa.illinois.edu/confluence/display/DFDL/Dependencies+and+Licenses
>> 
>>    == Cryptography ==
>> 
>>    None
>> 
>>    == Required Resources ==
>> 
>>    === Mailing Lists ===
>> 
>>    * comm...@daffodil.incubator.apache.org
>>    * d...@daffodil.incubator.apache.org
>>    * priv...@daffodil.incubator.apache.org
>>    * u...@daffodil.incubator.apache.org
>> 
>>    === Source Control ===
>> 
>>    git://git.apache.org/incubator-daffodil.git
>> 
>>    === Issue Tracking ===
>> 
>>    JIRA Daffodil (DFDL)
>> 
>>    === Initial Committers ===
>> 
>>    * Beth Finnegan <efinnegan at tresys dot com>
>>    * Dave Thompson <dthompson at tresys dot com>
>>    * Josh Adams <jadams at tresys dot com>
>>    * Mike Beckerle <mbeckerle at tresys dot com>
>>    * Steve Lawrence <slawrence at tresys dot com>
>>    * Taylor Wise <twise at tresys dot com>
>> 
>>    === Affiliations ===
>> 
>>    * Beth Finnegan (Tresys Technology)
>>    * Dave Thompson (Tresys Technology)
>>    * Josh Adams (Tresys Technology)
>>    * Mike Beckerle (Tresys Technology)
>>    * Steve Lawrence (Tresys Technology)
>>    * Taylor Wise (Tresys Technology)
>> 
>>    == Sponsors ==
>> 
>>    === Champion ===
>> 
>>    * TBD
>> 
>>    === Nominated Mentors ===
>> 
>>    * TBD
>> 
>>    === Sponsoring Entity ===
>> 
>>    We request the Apache Incubator to sponsor this project.
>> 
>>    ---------------------------------------------------------------------
>>    To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
>>    For additional commands, e-mail: general-h...@incubator.apache.org
> 
