+1 (binding). Thanks!
On Tue, Feb 2, 2016 at 2:06 PM Henri Yandell <bay...@apache.org> wrote: > I'm more likely to guide contributions from my employer. There's been some > contributions thus far, and there is interest to put more dayjob time into > contributing, but currently there's no coder who personally is committed to > the project. > > Hen > > On Mon, Feb 1, 2016 at 7:20 AM, Mattmann, Chris A (3980) < > chris.a.mattm...@jpl.nasa.gov> wrote: > > > Hey Jim, > > > > This is a valid concern, one that I hope is mediated by taking > > however long it takes in Incubation to attract some new committers > > to work on the project. Hopefully too you saw how long I took to > > allow the discussion to occur and so forth. > > > > Lewis has actively contributed to Joshua already - you can see - > > via the HomeBrew package he created, see: > > > > https://github.com/Homebrew/homebrew/pull/45746 > > > > > > You can see too it wasn’t something just recent or something > > super quick it’s something he had to work at. > > > > As for me, my involvement is going to be limited, but I am > > actively pursuing Tika’s integration with Joshua as part of > > TIKA-1343: http://issues.apache.org/jira/browse/TIKA-1343. > > > > Finally my suspicion is that Tom, Henry and Tommaso will > > contribute a lot as well. > > > > Thanks for listening. > > > > Cheers, > > Chris > > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > Chris Mattmann, Ph.D. 
> > Chief Architect > > Instrument Software and Science Data Systems Section (398) > > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA > > Office: 168-519, Mailstop: 168-527 > > Email: chris.a.mattm...@nasa.gov > > WWW: http://sunset.usc.edu/~mattmann/ > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > Adjunct Associate Professor, Computer Science Department > > University of Southern California, Los Angeles, CA 90089 USA > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > > > > > > > > > > -----Original Message----- > > From: Jim Jagielski <j...@jagunet.com> > > Reply-To: "general@incubator.apache.org" <general@incubator.apache.org> > > Date: Monday, February 1, 2016 at 4:20 AM > > To: "general@incubator.apache.org" <general@incubator.apache.org> > > Cc: "p...@cs.jhu.edu" <p...@cs.jhu.edu> > > Subject: Re: [VOTE] Accept Joshua as an Apache Incubator Podling > > > > >I know this is specifically called-out in the proposal, but it > > >does seem worthy of further discussion. > > > > > >This has a pretty small list of initial committers, esp when one > considers > > >how over-booked 2 of them appear to be. > > > > > >So, realistically, how active do both Chris and Lewis expect > > >to be? > > > > > >> On Jan 30, 2016, at 3:00 PM, Mattmann, Chris A (3980) > > >><chris.a.mattm...@jpl.nasa.gov> wrote: > > >> > > >> Hi Everyone, > > >> > > >> OK the discussion is now completed. Please VOTE to accept Joshua > > >> into the Apache Incubator. I’ll leave the VOTE open for at least > > >> the next 72 hours, with hopes to close it next Friday the 5th of > > >> February, 2016. > > >> > > >> [ ] +1 Accept Joshua as an Apache Incubator podling. > > >> [ ] +0 Abstain. > > >> [ ] -1 Don’t accept Joshua as an Apache Incubator podling because.. > > >> > > >> Of course, I am +1 on this. Please note VOTEs from Incubator PMC > > >> members are binding but all are welcome to VOTE! 
> > >> > > >> Cheers, > > >> Chris > > >> > > >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > >> Chris Mattmann, Ph.D. > > >> Chief Architect > > >> Instrument Software and Science Data Systems Section (398) > > >> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA > > >> Office: 168-519, Mailstop: 168-527 > > >> Email: chris.a.mattm...@nasa.gov > > >> WWW: http://sunset.usc.edu/~mattmann/ > > >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > >> Adjunct Associate Professor, Computer Science Department > > >> University of Southern California, Los Angeles, CA 90089 USA > > >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > >> > > >> > > >> > > >> > > >> > > >> -----Original Message----- > > >> From: jpluser <chris.a.mattm...@jpl.nasa.gov> > > >> Date: Tuesday, January 12, 2016 at 10:56 PM > > >> To: "general@incubator.apache.org" <general@incubator.apache.org> > > >> Cc: "p...@cs.jhu.edu" <p...@cs.jhu.edu> > > >> Subject: [DISCUSS] Apache Joshua Incubator Proposal - Machine > > >>Translation > > >> Toolkit > > >> > > >>> Hi Everyone, > > >>> > > >>> Please find attached for your viewing pleasure a proposed new > project, > > >>> Apache Joshua, a statistical machine translation toolkit. The > proposal > > >>> is in wiki draft form at: > > >>>https://wiki.apache.org/incubator/JoshuaProposal > > >>> > > >>> Proposal text is copied below. I’ll leave the discussion open for a > > >>>week > > >>> and we are interested in folks who would like to be initial > committers > > >>> and mentors. Please discuss here on the thread. > > >>> > > >>> Thanks! > > >>> > > >>> Cheers, > > >>> Chris (Champion) > > >>> > > >>> ——— > > >>> > > >>> = Joshua Proposal = > > >>> > > >>> == Abstract == > > >>> [[joshua-decoder.org|Joshua]] is an open-source statistical machine > > >>> translation toolkit. 
It includes a Java-based decoder for translating > > >>>with > > >>> phrase-based, hierarchical, and syntax-based translation models, a > > >>> Hadoop-based grammar extractor (Thrax), and an extensive set of tools > > >>>and > > >>> scripts for training and evaluating new models from parallel text. > > >>> > > >>> == Proposal == > > >>> Joshua is a state of the art statistical machine translation system > > >>>that > > >>> provides a number of features: > > >>> > > >>> * Support for the two main paradigms in statistical machine > > >>>translation: > > >>> phrase-based and hierarchical / syntactic. > > >>> * A sparse feature API that makes it easy to add new feature > templates > > >>> supporting millions of features > > >>> * Native implementations of many tuners (MERT, MIRA, PRO, and > AdaGrad) > > >>> * Support for lattice decoding, allowing upstream NLP tools to expose > > >>> their hypothesis space to the MT system > > >>> * An efficient representation for models, allowing for quick loading > of > > >>> multi-gigabyte model files > > >>> * Fast decoding speed (on par with Moses and mtplz) > > >>> * Language packs — precompiled models that allow the decoder to be > run > > >>>as > > >>> a black box > > >>> * Thrax, a Hadoop-based tool for learning translation models from > > >>> parallel text > > >>> * A suite of tools for constructing new models for any language pair > > >>>for > > >>> which sufficient training data exists > > >>> > > >>> == Background and Rationale == > > >>> A number of factors make this a good time for an Apache project > > >>>focused on > > >>> machine translation (MT): the quality of MT output (for many language > > >>> pairs); the average computing resources available on computers, > > >>>relative > > >>> to the needs of MT systems; and the availability of a number of > > >>> high-quality toolkits, together with a large base of researchers > > >>>working > > >>> on them. 
> > >>> > > >>> Over the past decade, machine translation (MT; the automatic > > >>>translation > > >>> of one human language to another) has become a reality. The research > > >>>into > > >>> statistical approaches to translation that began in the early > nineties, > > >>> together with the availability of large amounts of training data, and > > >>> better computing infrastructure, have all come together to produce > > >>> translation results that are “good enough” for a large set of > language > > >>> pairs and use cases. Free services like > > >>> [[https://www.bing.com/translator|Bing Translator]] and > > >>> [[https://translate.google.com|Google Translate]] have made these > > >>>services > > >>> available to the average person through direct interfaces and through > > >>> tools like browser plugins, and sites across the world with higher > > >>> translation needs use them to translate their pages > > >>>automatically. > > >>> > > >>> MT does not require the infrastructure of large corporations in order > > >>>to > > >>> produce feasible output. Machine translation can be > resource-intensive, > > >>> but need not be prohibitively so. Disk and memory usage are mostly a > > >>> matter of model size, which for most language pairs is a few > gigabytes > > >>>at > > >>> most, at which size models can provide coverage on the order of tens > or > > >>> even hundreds of thousands of words in the input and output > languages. > > >>>The > > >>> computational complexity of the algorithms used to search for > > >>>translations > > >>> of new sentences is typically linear in the number of words in the > > >>>input > > >>> sentence, making it possible to run a translation engine on a > personal > > >>> computer. > > >>> > > >>> The research community has produced many different open source > > >>>translation > > >>> projects for a range of programming languages and under a variety of > > >>> licenses. 
These projects include the core “decoder”, which takes a > > >>>model > > >>> and uses it to translate new sentences between the language pair the > > >>>model > > >>> was defined for. They also typically include a large set of tools > that > > >>> enable new models to be built from large sets of example translations > > >>> (“parallel data”) and monolingual texts. These toolkits are usually > > >>>built > > >>> to support the agendas of the (largely) academic researchers that > build > > >>> them: the repeated cycle of building new models, tuning model > > >>>parameters > > >>> against development data, and evaluating them against held-out test > > >>>data, > > >>> using standard metrics for testing the quality of MT output. > > >>> > > >>> Together, these three factors—the quality of machine translation > > >>>output, > > >>> the feasibility of translating on standard computers, and the > > >>>availability > > >>> of tools to build models—make it reasonable for the end users to use > > >>>MT as > > >>> a black-box service, and to run it on their personal machine. > > >>> > > >>> These factors make it a good time for an organization with the status > > >>>of > > >>> the Apache Foundation to host a machine translation project. > > >>> > > >>> == Current Status == > > >>> Joshua was originally ported from David Chiang’s Python > implementation > > >>>of > > >>> Hiero by Zhifei Li, while he was a Ph.D. student at Johns Hopkins > > >>> University. The current version is maintained by Matt Post at Johns > > >>> Hopkins’ Human Language Technology Center of Excellence. Joshua has > > >>>made > > >>> many releases with a list of over 20 source code tags. The last > > >>>release of > > >>> Joshua was 6.0.5 on November 5th, 2015. > > >>> > > >>> == Meritocracy == > > >>> The current developers are familiar with meritocratic open source > > >>> development at Apache. 
Apache was chosen specifically because we want > > >>>to > > >>> encourage this style of development for the project. > > >>> > > >>> == Community == > > >>> Joshua is used widely across the world. Perhaps its biggest (known) > > >>> research / industrial user is the Amazon research group in Berlin. > > >>>Another > > >>> user is the US Army Research Lab. No formal census has been > undertaken, > > >>> but posts to the Joshua technical support mailing list, along with > the > > >>> occasional contributions, suggest small research and academic > > >>>communities > > >>> spread across the world, many of them in India. > > >>> > > >>> During incubation, we will explicitly seek to increase our usage > across > > >>> the board, including academic research, industry, and other end users > > >>> interested in statistical machine translation. > > >>> > > >>> == Core Developers == > > >>> The current set of core developers is fairly small, having fallen > with > > >>>the > > >>> graduation from Johns Hopkins of some core student participants. > > >>>However, > > >>> Joshua is used fairly widely, as mentioned above, and there remains a > > >>> commitment from the principal researcher at Johns Hopkins to continue > > >>>to > > >>> use and develop it. Joshua has seen a number of new community members > > >>> become interested recently due to a potential for its projected use > in > > >>>a > > >>> number of ongoing DARPA projects such as XDATA and Memex. > > >>> > > >>> == Alignment == > > >>> Joshua is currently Copyright (c) 2015, Johns Hopkins University All > > >>> rights reserved and licensed under BSD 2-clause license. It would of > > >>> course be the intention to relicense this code under AL2.0 which > would > > >>> permit expanded and increased use of the software within Apache > > >>>projects. 
> > >>> There is currently an ongoing effort within the Apache Tika community > > >>>to > > >>> utilize Joshua within Tika’s Translate API, see > > >>> [[https://issues.apache.org/jira/browse/TIKA-1343|TIKA-1343]]. > > >>> > > >>> == Known Risks == > > >>> > > >>> === Orphaned products === > > >>> At the moment, regular contributions are made by a single > contributor, > > >>>the > > >>> lead maintainer. He (Matt Post) plans to continue development for the > > >>>next > > >>> few years, but it is still a single point of failure, since the > > >>>graduate > > >>> students who worked on the project have moved on to jobs, mostly in > > >>> industry. However, our goal is to help that process by growing the > > >>> community in Apache, and at least in growing the community with users > > >>>and > > >>> participants from NASA JPL. > > >>> > > >>> === Inexperience with Open Source === > > >>> The teams at both Johns Hopkins and NASA JPL have experience with many > > >>>OSS > > >>> software projects at Apache and elsewhere. We understand "how it > works" > > >>> here at the foundation. > > >>> > > >>> > > >>> == Relationships with Other Apache Products == > > >>> Joshua includes dependencies on Hadoop, and also is included as a > > >>>plugin in > > >>> Apache Tika. We are also interested in coordinating with other > projects > > >>> including Spark, and other projects needing MT services for language > > >>> translation. > > >>> > > >>> == Developers == > > >>> Joshua only has one regular developer who is employed by Johns > Hopkins > > >>> University. NASA JPL (Mattmann and McGibbney) have been contributing > > >>> lately including a Brew formula and other contributions to the > project > > >>> through the DARPA XDATA and Memex programs. > > >>> > > >>> == Documentation == > > >>> Documentation and publications related to Joshua can be found at > > >>> joshua-decoder.org. 
The source for the Joshua documentation is > > >>>currently > > >>> hosted on Github at > > >>> https://github.com/joshua-decoder/joshua-decoder.github.com > > >>> > > >>> == Initial Source == > > >>> Current source resides at Github: github.com/joshua-decoder/joshua > > (the > > >>> main decoder and toolkit) and github.com/joshua-decoder/thrax (the > > >>>grammar > > >>> extraction tool). > > >>> > > >>> == External Dependencies == > > >>> Joshua has a number of external dependencies. Only BerkeleyLM (Apache > > >>>2.0) > > >>> and KenLM (LGPL 2.1) are run-time decoder dependencies (one of which > is > > >>> needed for translating sentences with pre-built models). The rest are > > >>> dependencies for the build system and pipeline, used for constructing > > >>>and > > >>> training new models from parallel text. > > >>> > > >>> Apache projects: > > >>> * Ant > > >>> * Hadoop > > >>> * Commons > > >>> * Maven > > >>> * Ivy > > >>> > > >>> There are also a number of other open-source projects with various > > >>> licenses that the project depends on both dynamically (runtime), and > > >>> statically. 
> > >>> > > >>> === GNU GPL 2 === > > >>> * Berkeley Aligner: https://code.google.com/p/berkeleyaligner/ > > >>> > > >>> === LGPL 2.1 === > > >>> * KenLM: github.com/kpu/kenlm > > >>> > > >>> === Apache 2.0 === > > >>> * BerkeleyLM: https://code.google.com/p/berkeleylm/ > > >>> > > >>> === GNU GPL === > > >>> * GIZA++: http://www.statmt.org/moses/giza/GIZA++.html > > >>> > > >>> == Required Resources == > > >>> * Mailing Lists > > >>> * priv...@joshua.incubator.apache.org > > >>> * d...@joshua.incubator.apache.org > > >>> * comm...@joshua.incubator.apache.org > > >>> > > >>> * Git Repos > > >>> * https://git-wip-us.apache.org/repos/asf/joshua.git > > >>> > > >>> * Issue Tracking > > >>> * JIRA Joshua (JOSHUA) > > >>> > > >>> * Continuous Integration > > >>> * Jenkins builds on https://builds.apache.org/ > > >>> > > >>> * Web > > >>> * http://joshua.incubator.apache.org/ > > >>> * wiki at http://cwiki.apache.org > > >>> > > >>> == Initial Committers == > > >>> The following is a list of the planned initial Apache committers (the > > >>> active subset of the committers for the current repository on > Github). > > >>> > > >>> * Matt Post (p...@cs.jhu.edu) > > >>> * Lewis John McGibbney (lewi...@apache.org) > > >>> * Chris Mattmann (mattm...@apache.org) > > >>> > > >>> == Affiliations == > > >>> > > >>> * Johns Hopkins University > > >>> * Matt Post > > >>> > > >>> * NASA JPL > > >>> * Chris Mattmann > > >>> * Lewis John McGibbney > > >>> > > >>> > > >>> == Sponsors == > > >>> === Champion === > > >>> * Chris Mattmann (NASA/JPL) > > >>> > > >>> === Nominated Mentors === > > >>> * Paul Ramirez > > >>> * Lewis John McGibbney > > >>> * Chris Mattmann > > >>> > > >>> == Sponsoring Entity == > > >>> The Apache Incubator > > >>> > > >>> > > >>> > > >>> > > >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > >>> Chris Mattmann, Ph.D. 
> > >>> Chief Architect > > >>> Instrument Software and Science Data Systems Section (398) > > >>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA > > >>> Office: 168-519, Mailstop: 168-527 > > >>> Email: chris.a.mattm...@nasa.gov > > >>> WWW: http://sunset.usc.edu/~mattmann/ > > >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > >>> Adjunct Associate Professor, Computer Science Department > > >>> University of Southern California, Los Angeles, CA 90089 USA > > >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > >>> > > >>> > > >>> > > >> > > >> > > >> --------------------------------------------------------------------- > > >> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org > > >> For additional commands, e-mail: general-h...@incubator.apache.org