You're making the presumption it's passed its vote! ;)

On Fri, Feb 12, 2016 at 7:33 PM, Mattmann, Chris A (3980) < chris.a.mattm...@jpl.nasa.gov> wrote:
> Yep, will send a result shortly. > > Lewis, after that, can you help me get the podling bootstrap tasks > started? > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > Chris Mattmann, Ph.D. > Chief Architect > Instrument Software and Science Data Systems Section (398) > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA > Office: 168-519, Mailstop: 168-527 > Email: chris.a.mattm...@nasa.gov > WWW: http://sunset.usc.edu/~mattmann/ > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > Adjunct Associate Professor, Computer Science Department > University of Southern California, Los Angeles, CA 90089 USA > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > > > > -----Original Message----- > From: Lewis John Mcgibbney <lewis.mcgibb...@gmail.com> > Reply-To: "general@incubator.apache.org" <general@incubator.apache.org> > Date: Friday, February 12, 2016 at 11:31 AM > To: "general@incubator.apache.org" <general@incubator.apache.org> > Subject: Re: [VOTE] Accept Joshua as an Apache Incubator Podling > > >Hi Chris, > >Is it time to close out this VOTE and bring Joshua on board? > >Lewis > > > >On Wed, Feb 3, 2016 at 4:01 PM, <general-digest-h...@incubator.apache.org > > > >wrote: > > > >> > >> From: Danese Cooper <dan...@gmail.com> > >> To: "general@incubator.apache.org" <general@incubator.apache.org> > >> Cc: "p...@cs.jhu.edu" <p...@cs.jhu.edu> > >> Date: Wed, 3 Feb 2016 07:43:11 -0800 > >> Subject: Re: [VOTE] Accept Joshua as an Apache Incubator Podling > >> +1 (binding) Accept Joshua as an Apache Incubator podling. > >> > >> D > >> > >> > On Jan 30, 2016, at 12:00 PM, Mattmann, Chris A (3980) < > >> chris.a.mattm...@jpl.nasa.gov> wrote: > >> > > >> > Hi Everyone, > >> > > >> > OK the discussion is now completed. Please VOTE to accept Joshua > >> > into the Apache Incubator. I’ll leave the VOTE open for at least > >> > the next 72 hours, with hopes to close it next Friday the 5th of > >> > February, 2016. 
> >> > > >> > [ ] +1 Accept Joshua as an Apache Incubator podling. > >> > [ ] +0 Abstain. > >> > [ ] -1 Don’t accept Joshua as an Apache Incubator podling because.. > >> > > >> > Of course, I am +1 on this. Please note VOTEs from Incubator PMC > >> > members are binding but all are welcome to VOTE! > >> > > >> > Cheers, > >> > Chris > >> > > >> > -----Original Message----- > >> > From: jpluser <chris.a.mattm...@jpl.nasa.gov> > >> > Date: Tuesday, January 12, 2016 at 10:56 PM > >> > To: "general@incubator.apache.org" <general@incubator.apache.org> > >> > Cc: "p...@cs.jhu.edu" <p...@cs.jhu.edu> > >> > Subject: [DISCUSS] Apache Joshua Incubator Proposal - Machine > >>Translation > >> > Toolkit > >> > > >> >> Hi Everyone, > >> >> > >> >> Please find attached for your viewing pleasure a proposed new > >>project, > >> >> Apache Joshua, a statistical machine translation toolkit. The > >>proposal > >> >> is in wiki draft form at: > >> https://wiki.apache.org/incubator/JoshuaProposal > >> >> > >> >> Proposal text is copied below. I’ll leave the discussion open for a > >> week > >> >> and we are interested in folks who would like to be initial > >>committers > >> >> and mentors. Please discuss here on the thread. > >> >> > >> >> Thanks! 
> >> >> > >> >> Cheers, > >> >> Chris (Champion) > >> >> > >> >> ——— > >> >> > >> >> = Joshua Proposal = > >> >> > >> >> == Abstract == > >> >> [[joshua-decoder.org|Joshua]] is an open-source statistical machine > >> >> translation toolkit. It includes a Java-based decoder for translating > >> with > >> >> phrase-based, hierarchical, and syntax-based translation models, a > >> >> Hadoop-based grammar extractor (Thrax), and an extensive set of tools > >> and > >> >> scripts for training and evaluating new models from parallel text. > >> >> > >> >> == Proposal == > >> >> Joshua is a state of the art statistical machine translation system > >>that > >> >> provides a number of features: > >> >> > >> >> * Support for the two main paradigms in statistical machine > >>translation: > >> >> phrase-based and hierarchical / syntactic. > >> >> * A sparse feature API that makes it easy to add new feature > >>templates > >> >> supporting millions of features > >> >> * Native implementations of many tuners (MERT, MIRA, PRO, and > >>AdaGrad) > >> >> * Support for lattice decoding, allowing upstream NLP tools to expose > >> >> their hypothesis space to the MT system > >> >> * An efficient representation for models, allowing for quick loading > >>of > >> >> multi-gigabyte model files > >> >> * Fast decoding speed (on par with Moses and mtplz) > >> >> * Language packs — precompiled models that allow the decoder to be > >> run as > >> >> a black box > >> >> * Thrax, a Hadoop-based tool for learning translation models from > >> >> parallel text > >> >> * A suite of tools for constructing new models for any language pair > >>for > >> >> which sufficient training data exists > >> >> > >> >> == Background and Rationale == > >> >> A number of factors make this a good time for an Apache project > >>focused > >> on > >> >> machine translation (MT): the quality of MT output (for many language > >> >> pairs); the average computing resources available on computers, > >>relative > >> >> to the 
needs of MT systems; and the availability of a number of > >> >> high-quality toolkits, together with a large base of researchers > >>working > >> >> on them. > >> >> > >> >> Over the past decade, machine translation (MT; the automatic > >>translation > >> >> of one human language to another) has become a reality. The research > >> into > >> >> statistical approaches to translation that began in the early > >>nineties, > >> >> together with the availability of large amounts of training data, and > >> >> better computing infrastructure, have all come together to produce > >> >> translation results that are “good enough” for a large set of > >> language > >> >> pairs and use cases. Free services like > >> >> [[https://www.bing.com/translator|Bing Translator]] and > >> >> [[https://translate.google.com|Google Translate]] have made these > >> services > >> >> available to the average person through direct interfaces and through > >> >> tools like browser plugins, and sites across the world with higher > >> >> translation needs use them to translate their pages > >> automatically. > >> >> > >> >> MT does not require the infrastructure of large corporations in > >>order to > >> >> produce feasible output. Machine translation can be > >>resource-intensive, > >> >> but need not be prohibitively so. Disk and memory usage are mostly a > >> >> matter of model size, which for most language pairs is a few > >>gigabytes > >> at > >> >> most, at which size models can provide coverage on the order of tens > >>or > >> >> even hundreds of thousands of words in the input and output > >>languages. > >> The > >> >> computational complexity of the algorithms used to search for > >> translations > >> >> of new sentences is typically linear in the number of words in the > >> input > >> >> sentence, making it possible to run a translation engine on a > >>personal > >> >> computer. 
> >> >> > >> >> The research community has produced many different open source > >> translation > >> >> projects for a range of programming languages and under a variety of > >> >> licenses. These projects include the core “decoder”, which takes > >>a > >> model > >> >> and uses it to translate new sentences between the language pair the > >> model > >> >> was defined for. They also typically include a large set of tools > >>that > >> >> enable new models to be built from large sets of example translations > >> >> (“parallel data”) and monolingual texts. These toolkits are > >>usually > >> built > >> >> to support the agendas of the (largely) academic researchers that > >>build > >> >> them: the repeated cycle of building new models, tuning model > >>parameters > >> >> against development data, and evaluating them against held-out test > >> data, > >> >> using standard metrics for testing the quality of MT output. > >> >> > >> >> Together, these three factors (the quality of machine translation > >> output, > >> >> the feasibility of translating on standard computers, and the > >> availability > >> >> of tools to build models) make it reasonable for the end users to > >>use > >> MT as > >> >> a black-box service, and to run it on their personal machine. > >> >> > >> >> These factors make it a good time for an organization with the > >>status of > >> >> the Apache Foundation to host a machine translation project. > >> >> > >> >> == Current Status == > >> >> Joshua was originally ported from David Chiang’s Python > >> implementation of > >> >> Hiero by Zhifei Li, while he was a Ph.D. student at Johns Hopkins > >> >> University. The current version is maintained by Matt Post at Johns > >> >> Hopkins’ Human Language Technology Center of Excellence. Joshua has > >> made > >> >> many releases with a list of over 20 source code tags. The last > >>release > >> of > >> >> Joshua was 6.0.5 on November 5th, 2015. 
> >> >> > >> >> == Meritocracy == > >> >> The current developers are familiar with meritocratic open source > >> >> development at Apache. Apache was chosen specifically because we > >>want to > >> >> encourage this style of development for the project. > >> >> > >> >> == Community == > >> >> Joshua is used widely across the world. Perhaps its biggest (known) > >> >> research / industrial user is the Amazon research group in Berlin. > >> Another > >> >> user is the US Army Research Lab. No formal census has been > >>undertaken, > >> >> but posts to the Joshua technical support mailing list, along with > >>the > >> >> occasional contributions, suggest small research and academic > >> communities > >> >> spread across the world, many of them in India. > >> >> > >> >> During incubation, we will explicitly seek to increase our usage > >>across > >> >> the board, including academic research, industry, and other end users > >> >> interested in statistical machine translation. > >> >> > >> >> == Core Developers == > >> >> The current set of core developers is fairly small, having fallen > >>with > >> the > >> >> graduation from Johns Hopkins of some core student participants. > >> However, > >> >> Joshua is used fairly widely, as mentioned above, and there remains a > >> >> commitment from the principal researcher at Johns Hopkins to > >>continue to > >> >> use and develop it. Joshua has seen a number of new community members > >> >> become interested recently due to a potential for its projected use > >>in a > >> >> number of ongoing DARPA projects such as XDATA and Memex. > >> >> > >> >> == Alignment == > >> >> Joshua is currently Copyright (c) 2015, Johns Hopkins University All > >> >> rights reserved and licensed under BSD 2-clause license. It would of > >> >> course be the intention to relicense this code under AL2.0 which > >>would > >> >> permit expanded and increased use of the software within Apache > >> projects. 
> >> >> There is currently an ongoing effort within the Apache Tika > >>community to > >> >> utilize Joshua within Tika’s Translate API, see > >> >> [[https://issues.apache.org/jira/browse/TIKA-1343|TIKA-1343]]. > >> >> > >> >> == Known Risks == > >> >> > >> >> === Orphaned products === > >> >> At the moment, regular contributions are made by a single > >>contributor, > >> the > >> >> lead maintainer. He (Matt Post) plans to continue development for the > >> next > >> >> few years, but it is still a single point of failure, since the > >>graduate > >> >> students who worked on the project have moved on to jobs, mostly in > >> >> industry. However, our goal is to help that process by growing the > >> >> community in Apache, and at least in growing the community with users > >> and > >> >> participants from NASA JPL. > >> >> > >> >> === Inexperience with Open Source === > >> >> The team both at Johns Hopkins and NASA JPL have experience with many > >> OSS > >> >> software projects at Apache and elsewhere. We understand "how it > >>works" > >> >> here at the foundation. > >> >> > >> >> > >> >> == Relationships with Other Apache Products == > >> >> Joshua includes dependences on Hadoop, and also is included as a > >>plugin > >> in > >> >> Apache Tika. We are also interested in coordinating with other > >>projects > >> >> including Spark, and other projects needing MT services for language > >> >> translation. > >> >> > >> >> == Developers == > >> >> Joshua only has one regular developer who is employed by Johns > >>Hopkins > >> >> University. NASA JPL (Mattmann and McGibbney) have been contributing > >> >> lately including a Brew formula and other contributions to the > >>project > >> >> through the DARPA XDATA and Memex programs. > >> >> > >> >> == Documentation == > >> >> Documentation and publications related to Joshua can be found at > >> >> joshua-decoder.org. 
The source for the Joshua documentation is > >> currently > >> >> hosted on Github at > >> >> https://github.com/joshua-decoder/joshua-decoder.github.com > >> >> > >> >> == Initial Source == > >> >> Current source resides at Github: github.com/joshua-decoder/joshua > >>(the > >> >> main decoder and toolkit) and github.com/joshua-decoder/thrax (the > >> grammar > >> >> extraction tool). > >> >> > >> >> == External Dependencies == > >> >> Joshua has a number of external dependencies. Only BerkeleyLM (Apache > >> 2.0) > >> >> and KenLM (LGPL 2.1) are run-time decoder dependencies (one of which > >>is > >> >> needed for translating sentences with pre-built models). The rest are > >> >> dependencies for the build system and pipeline, used for constructing > >> and > >> >> training new models from parallel text. > >> >> > >> >> Apache projects: > >> >> * Ant > >> >> * Hadoop > >> >> * Commons > >> >> * Maven > >> >> * Ivy > >> >> > >> >> There are also a number of other open-source projects with various > >> >> licenses that the project depends on both dynamically (runtime), and > >> >> statically. 
> >> >> > >> >> === GNU GPL 2 === > >> >> * Berkeley Aligner: https://code.google.com/p/berkeleyaligner/ > >> >> > >> >> === LGPL 2.1 === > >> >> * KenLM: github.com/kpu/kenlm > >> >> > >> >> === Apache 2.0 === > >> >> * BerkeleyLM: https://code.google.com/p/berkeleylm/ > >> >> > >> >> === GNU GPL === > >> >> * GIZA++: http://www.statmt.org/moses/giza/GIZA++.html > >> >> > >> >> == Required Resources == > >> >> * Mailing Lists > >> >> * priv...@joshua.incubator.apache.org > >> >> * d...@joshua.incubator.apache.org > >> >> * comm...@joshua.incubator.apache.org > >> >> > >> >> * Git Repos > >> >> * https://git-wip-us.apache.org/repos/asf/joshua.git > >> >> > >> >> * Issue Tracking > >> >> * JIRA Joshua (JOSHUA) > >> >> > >> >> * Continuous Integration > >> >> * Jenkins builds on https://builds.apache.org/ > >> >> > >> >> * Web > >> >> * http://joshua.incubator.apache.org/ > >> >> * wiki at http://cwiki.apache.org > >> >> > >> >> == Initial Committers == > >> >> The following is a list of the planned initial Apache committers (the > >> >> active subset of the committers for the current repository on > >>Github). > >> >> > >> >> * Matt Post (p...@cs.jhu.edu) > >> >> * Lewis John McGibbney (lewi...@apache.org) > >> >> * Chris Mattmann (mattm...@apache.org) > >> >> > >> >> == Affiliations == > >> >> > >> >> * Johns Hopkins University > >> >> * Matt Post > >> >> > >> >> * NASA JPL > >> >> * Chris Mattmann > >> >> * Lewis John McGibbney > >> >> > >> >> > >> >> == Sponsors == > >> >> === Champion === > >> >> * Chris Mattmann (NASA/JPL) > >> >> > >> >> === Nominated Mentors === > >> >> * Paul Ramirez > >> >> * Lewis John McGibbney > >> >> * Chris Mattmann > >> >> > >> >> == Sponsoring Entity == > >> >> The Apache Incubator