Is there a fail grade? ;)
On 2/12/16, 11:57 AM, "Tom Barber" <tom.bar...@meteorite.bi> wrote:

>You're making the presumption it's passed its vote! ;)
>
>On Fri, Feb 12, 2016 at 7:33 PM, Mattmann, Chris A (3980) <
>chris.a.mattm...@jpl.nasa.gov> wrote:
>
>> Yep, will send a result shortly.
>>
>> Lewis, after that, can you help me get the podling bootstrap tasks
>> started?
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Chief Architect
>> Instrument Software and Science Data Systems Section (398)
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 168-519, Mailstop: 168-527
>> Email: chris.a.mattm...@nasa.gov
>> WWW: http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Associate Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
>> -----Original Message-----
>> From: Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
>> Reply-To: "general@incubator.apache.org" <general@incubator.apache.org>
>> Date: Friday, February 12, 2016 at 11:31 AM
>> To: "general@incubator.apache.org" <general@incubator.apache.org>
>> Subject: Re: [VOTE] Accept Joshua as an Apache Incubator Podling
>>
>> >Hi Chris,
>> >Is it time to close out this VOTE and bring Joshua on board?
>> >Lewis
>> >
>> >On Wed, Feb 3, 2016 at 4:01 PM, <general-digest-h...@incubator.apache.org>
>> >wrote:
>> >
>> >> From: Danese Cooper <dan...@gmail.com>
>> >> To: "general@incubator.apache.org" <general@incubator.apache.org>
>> >> Cc: "p...@cs.jhu.edu" <p...@cs.jhu.edu>
>> >> Date: Wed, 3 Feb 2016 07:43:11 -0800
>> >> Subject: Re: [VOTE] Accept Joshua as an Apache Incubator Podling
>> >> +1 (binding) Accept Joshua as an Apache Incubator podling.
>> >>
>> >> D
>> >>
>> >> > On Jan 30, 2016, at 12:00 PM, Mattmann, Chris A (3980) <chris.a.mattm...@jpl.nasa.gov> wrote:
>> >> >
>> >> > Hi Everyone,
>> >> >
>> >> > OK the discussion is now completed. Please VOTE to accept Joshua
>> >> > into the Apache Incubator. I’ll leave the VOTE open for at least
>> >> > the next 72 hours, with hopes to close it next Friday the 5th of
>> >> > February, 2016.
>> >> >
>> >> > [ ] +1 Accept Joshua as an Apache Incubator podling.
>> >> > [ ] +0 Abstain.
>> >> > [ ] -1 Don’t accept Joshua as an Apache Incubator podling because...
>> >> >
>> >> > Of course, I am +1 on this. Please note VOTEs from Incubator PMC
>> >> > members are binding but all are welcome to VOTE!
>> >> >
>> >> > Cheers,
>> >> > Chris
>> >> >
>> >> > -----Original Message-----
>> >> > From: jpluser <chris.a.mattm...@jpl.nasa.gov>
>> >> > Date: Tuesday, January 12, 2016 at 10:56 PM
>> >> > To: "general@incubator.apache.org" <general@incubator.apache.org>
>> >> > Cc: "p...@cs.jhu.edu" <p...@cs.jhu.edu>
>> >> > Subject: [DISCUSS] Apache Joshua Incubator Proposal - Machine Translation Toolkit
>> >> >
>> >> >> Hi Everyone,
>> >> >>
>> >> >> Please find attached for your viewing pleasure a proposed new
>> >> >> project, Apache Joshua, a
>> >> >> statistical machine translation toolkit. The proposal is in wiki
>> >> >> draft form at: https://wiki.apache.org/incubator/JoshuaProposal
>> >> >>
>> >> >> Proposal text is copied below. I’ll leave the discussion open for a
>> >> >> week, and we are interested in folks who would like to be initial
>> >> >> committers and mentors. Please discuss here on the thread.
>> >> >>
>> >> >> Thanks!
>> >> >>
>> >> >> Cheers,
>> >> >> Chris (Champion)
>> >> >>
>> >> >> ---
>> >> >>
>> >> >> = Joshua Proposal =
>> >> >>
>> >> >> == Abstract ==
>> >> >> [[joshua-decoder.org|Joshua]] is an open-source statistical machine
>> >> >> translation toolkit. It includes a Java-based decoder for translating
>> >> >> with phrase-based, hierarchical, and syntax-based translation models,
>> >> >> a Hadoop-based grammar extractor (Thrax), and an extensive set of
>> >> >> tools and scripts for training and evaluating new models from
>> >> >> parallel text.
>> >> >>
>> >> >> == Proposal ==
>> >> >> Joshua is a state-of-the-art statistical machine translation system
>> >> >> that provides a number of features:
>> >> >>
>> >> >> * Support for the two main paradigms in statistical machine
>> >> >> translation: phrase-based and hierarchical / syntactic.
>> >> >> * A sparse feature API that makes it easy to add new feature
>> >> >> templates supporting millions of features
>> >> >> * Native implementations of many tuners (MERT, MIRA, PRO, and AdaGrad)
>> >> >> * Support for lattice decoding, allowing upstream NLP tools to expose
>> >> >> their hypothesis space to the MT system
>> >> >> * An efficient representation for models, allowing for quick loading
>> >> >> of multi-gigabyte model files
>> >> >> * Fast decoding speed (on par with Moses and mtplz)
>> >> >> * Language packs: precompiled models that allow the decoder to be run
>> >> >> as a black box
>> >> >> * Thrax, a Hadoop-based tool for learning translation models from
>> >> >> parallel text
>> >> >> * A suite of tools for constructing new models for any language pair
>> >> >> for which sufficient training data exists
>> >> >>
>> >> >> == Background and Rationale ==
>> >> >> A number of factors make this a good time for an Apache project
>> >> >> focused on machine translation (MT): the quality of MT output (for
>> >> >> many language pairs); the average computing resources available on
>> >> >> computers, relative to the needs of MT systems; and the availability
>> >> >> of a number of high-quality toolkits, together with a large base of
>> >> >> researchers working on them.
>> >> >>
>> >> >> Over the past decade, machine translation (MT; the automatic
>> >> >> translation of one human language to another) has become a reality.
>> >> >> The research into statistical approaches to translation that began in
>> >> >> the early nineties, the availability of large amounts of training
>> >> >> data, and better computing infrastructure have all come together to
>> >> >> produce translation results that are “good enough” for a large set of
>> >> >> language pairs and use cases.
>> >> >> Free services like [[https://www.bing.com/translator|Bing Translator]]
>> >> >> and [[https://translate.google.com|Google Translate]] have made machine
>> >> >> translation available to the average person through direct interfaces
>> >> >> and through tools like browser plugins, and sites across the world
>> >> >> with higher translation needs use them to translate their pages
>> >> >> automatically.
>> >> >>
>> >> >> MT does not require the infrastructure of large corporations in order
>> >> >> to produce feasible output. Machine translation can be
>> >> >> resource-intensive, but need not be prohibitively so. Disk and memory
>> >> >> usage are mostly a matter of model size, which for most language
>> >> >> pairs is a few gigabytes at most; at that size, models can provide
>> >> >> coverage on the order of tens or even hundreds of thousands of words
>> >> >> in the input and output languages. The computational complexity of
>> >> >> the algorithms used to search for translations of new sentences is
>> >> >> typically linear in the number of words in the input sentence, making
>> >> >> it possible to run a translation engine on a personal computer.
>> >> >>
>> >> >> The research community has produced many different open source
>> >> >> translation projects for a range of programming languages and under
>> >> >> a variety of licenses. These projects include the core “decoder”,
>> >> >> which takes a model and uses it to translate new sentences between
>> >> >> the language pair the model was defined for. They also typically
>> >> >> include a large set of tools that enable new models to be built from
>> >> >> large sets of example translations (“parallel data”) and monolingual
>> >> >> texts.
>> >> >> These toolkits are usually built to support the agendas of the
>> >> >> (largely) academic researchers that build them: the repeated cycle
>> >> >> of building new models, tuning model parameters against development
>> >> >> data, and evaluating them against held-out test data, using standard
>> >> >> metrics for testing the quality of MT output.
>> >> >>
>> >> >> Together, these three factors (the quality of machine translation
>> >> >> output, the feasibility of translating on standard computers, and the
>> >> >> availability of tools to build models) make it reasonable for end
>> >> >> users to use MT as a black-box service and to run it on their
>> >> >> personal machines.
>> >> >>
>> >> >> These factors make it a good time for an organization with the
>> >> >> status of the Apache Software Foundation to host a machine
>> >> >> translation project.
>> >> >>
>> >> >> == Current Status ==
>> >> >> Joshua was originally ported from David Chiang’s Python
>> >> >> implementation of Hiero by Zhifei Li, while he was a Ph.D. student at
>> >> >> Johns Hopkins University. The current version is maintained by Matt
>> >> >> Post at Johns Hopkins’ Human Language Technology Center of
>> >> >> Excellence. Joshua has had many releases, with over 20 source code
>> >> >> tags. The last release of Joshua was 6.0.5 on November 5th, 2015.
>> >> >>
>> >> >> == Meritocracy ==
>> >> >> The current developers are familiar with meritocratic open source
>> >> >> development at Apache. Apache was chosen specifically because we want
>> >> >> to encourage this style of development for the project.
>> >> >>
>> >> >> == Community ==
>> >> >> Joshua is used widely across the world. Perhaps its biggest (known)
>> >> >> research / industrial user is the Amazon research group in Berlin.
>> >> >> Another user is the US Army Research Lab.
>> >> >> No formal census has been undertaken, but posts to the Joshua
>> >> >> technical support mailing list, along with the occasional
>> >> >> contributions, suggest small research and academic communities spread
>> >> >> across the world, many of them in India.
>> >> >>
>> >> >> During incubation, we will explicitly seek to increase our usage
>> >> >> across the board, including academic research, industry, and other
>> >> >> end users interested in statistical machine translation.
>> >> >>
>> >> >> == Core Developers ==
>> >> >> The current set of core developers is fairly small, having shrunk
>> >> >> with the graduation of some core student participants from Johns
>> >> >> Hopkins. However, Joshua is used fairly widely, as mentioned above,
>> >> >> and the principal researcher at Johns Hopkins remains committed to
>> >> >> continuing to use and develop it. A number of new community members
>> >> >> have recently become interested in Joshua due to its projected use in
>> >> >> a number of ongoing DARPA projects such as XDATA and Memex.
>> >> >>
>> >> >> == Alignment ==
>> >> >> Joshua is currently Copyright (c) 2015, Johns Hopkins University, All
>> >> >> rights reserved, and licensed under the BSD 2-clause license. It is,
>> >> >> of course, our intention to relicense this code under AL2.0, which
>> >> >> would permit expanded and increased use of the software within Apache
>> >> >> projects. There is currently an ongoing effort within the Apache Tika
>> >> >> community to utilize Joshua within Tika’s Translate API; see
>> >> >> [[https://issues.apache.org/jira/browse/TIKA-1343|TIKA-1343]].
>> >> >>
>> >> >> == Known Risks ==
>> >> >>
>> >> >> === Orphaned products ===
>> >> >> At the moment, regular contributions are made by a single
>> >> >> contributor, the lead maintainer.
>> >> >> He (Matt Post) plans to continue development for the next few years,
>> >> >> but this is still a single point of failure, since the graduate
>> >> >> students who worked on the project have moved on to jobs, mostly in
>> >> >> industry. However, our goal is to address this by growing the
>> >> >> community at Apache, at the very least with users and participants
>> >> >> from NASA JPL.
>> >> >>
>> >> >> === Inexperience with Open Source ===
>> >> >> The teams at both Johns Hopkins and NASA JPL have experience with
>> >> >> many OSS projects at Apache and elsewhere. We understand "how it
>> >> >> works" here at the foundation.
>> >> >>
>> >> >> == Relationships with Other Apache Products ==
>> >> >> Joshua includes dependencies on Hadoop and is also included as a
>> >> >> plugin in Apache Tika. We are also interested in coordinating with
>> >> >> other projects, including Spark and other projects needing MT
>> >> >> services for language translation.
>> >> >>
>> >> >> == Developers ==
>> >> >> Joshua has only one regular developer, who is employed by Johns
>> >> >> Hopkins University. NASA JPL (Mattmann and McGibbney) have been
>> >> >> contributing lately, including a Brew formula and other contributions
>> >> >> to the project, through the DARPA XDATA and Memex programs.
>> >> >>
>> >> >> == Documentation ==
>> >> >> Documentation and publications related to Joshua can be found at
>> >> >> joshua-decoder.org. The source for the Joshua documentation is
>> >> >> currently hosted on GitHub at
>> >> >> https://github.com/joshua-decoder/joshua-decoder.github.com
>> >> >>
>> >> >> == Initial Source ==
>> >> >> Current source resides at GitHub: github.com/joshua-decoder/joshua
>> >> >> (the main decoder and toolkit) and github.com/joshua-decoder/thrax
>> >> >> (the grammar extraction tool).
>> >> >>
>> >> >> == External Dependencies ==
>> >> >> Joshua has a number of external dependencies. Only BerkeleyLM (Apache
>> >> >> 2.0) and KenLM (LGPL 2.1) are run-time decoder dependencies (one of
>> >> >> which is needed for translating sentences with pre-built models). The
>> >> >> rest are dependencies for the build system and pipeline, used for
>> >> >> constructing and training new models from parallel text.
>> >> >>
>> >> >> Apache projects:
>> >> >> * Ant
>> >> >> * Hadoop
>> >> >> * Commons
>> >> >> * Maven
>> >> >> * Ivy
>> >> >>
>> >> >> There are also a number of other open-source projects with various
>> >> >> licenses that the project depends on both dynamically (runtime) and
>> >> >> statically.
>> >> >>
>> >> >> === GNU GPL 2 ===
>> >> >> * Berkeley Aligner: https://code.google.com/p/berkeleyaligner/
>> >> >>
>> >> >> === LGPL 2.1 ===
>> >> >> * KenLM: github.com/kpu/kenlm
>> >> >>
>> >> >> === Apache 2.0 ===
>> >> >> * BerkeleyLM: https://code.google.com/p/berkeleylm/
>> >> >>
>> >> >> === GNU GPL ===
>> >> >> * GIZA++: http://www.statmt.org/moses/giza/GIZA++.html
>> >> >>
>> >> >> == Required Resources ==
>> >> >> * Mailing Lists
>> >> >>   * priv...@joshua.incubator.apache.org
>> >> >>   * d...@joshua.incubator.apache.org
>> >> >>   * comm...@joshua.incubator.apache.org
>> >> >>
>> >> >> * Git Repos
>> >> >>   * https://git-wip-us.apache.org/repos/asf/joshua.git
>> >> >>
>> >> >> * Issue Tracking
>> >> >>   * JIRA Joshua (JOSHUA)
>> >> >>
>> >> >> * Continuous Integration
>> >> >>   * Jenkins builds on https://builds.apache.org/
>> >> >>
>> >> >> * Web
>> >> >>   * http://joshua.incubator.apache.org/
>> >> >>   * wiki at http://cwiki.apache.org
>> >> >>
>> >> >> == Initial Committers ==
>> >> >> The following is a list of the planned initial Apache committers (the
>> >> >> active subset of the committers for the current repository on
>> >> >> GitHub).
>> >> >>
>> >> >> * Matt Post (p...@cs.jhu.edu)
>> >> >> * Lewis John McGibbney (lewi...@apache.org)
>> >> >> * Chris Mattmann (mattm...@apache.org)
>> >> >>
>> >> >> == Affiliations ==
>> >> >>
>> >> >> * Johns Hopkins University
>> >> >>   * Matt Post
>> >> >>
>> >> >> * NASA JPL
>> >> >>   * Chris Mattmann
>> >> >>   * Lewis John McGibbney
>> >> >>
>> >> >> == Sponsors ==
>> >> >> === Champion ===
>> >> >> * Chris Mattmann (NASA/JPL)
>> >> >>
>> >> >> === Nominated Mentors ===
>> >> >> * Paul Ramirez
>> >> >> * Lewis John McGibbney
>> >> >> * Chris Mattmann
>> >> >>
>> >> >> == Sponsoring Entity ==
>> >> >> The Apache Incubator