[Apertium-stuff] [GSoC2018][Coding period begins on Monday May 14th]

ilnar . salimzianov Sat, 12 May 2018 15:29:57 -0700


Dear Google Summer of Code students,

On behalf of the other mentors and myself, I want to once again welcome you to the project.


The coding period starts on Monday 14th of May.

Here are some things that we expect from you.

- We expect you to work on your projects for 7-8 hours a day and 35-40 hours a week.
- If you have an exam or other events and can't keep up with that, you should tell your mentor(s) in advance.
- Most of the projects, especially projects on language pairs, require regular communication. We expect you to be on IRC [10] for at least a part of your working day, unless your project doesn't require that and your mentor(s) tell you that you don't have to.

Apertium has moved to GitHub recently, and it has rather good issue tracking and code review tools (where you can have comments/discussions on specific lines committed). This seems to be a good complement or alternative to discussing things on IRC, especially if you and your mentor(s) are in quite distant time zones.

Here is a list of the projects that were accepted this year (in no particular order; the IRC nick is given in parentheses):


Elena Sokur (sokureo)
Udmurt−Komi-Zyrian language pair

Marc Riera Irigoyen (mriera_trad)
Romanian−Catalan language pair and upgrading other pairs to the monolingual module system

Anastasia Kuznetsova (anakuz)
Guarani−Spanish language pair

Sardana Ivanova (eirien)
Kazakh−Sakha language pair

Evgenii Glazunov (G_D)
Bilingual dictionary enrichment via graph completion

Nikolay Aleksandrov (qavan)
Chuvash−Tatar language pair

Abinash Senapati (Techievena)
Extending lttoolbox to have the power of HFST

Claudi Balaguer (capsot)
French−Occitan language pair

Anna Kondratjeva (deltamachine)
Improving language pairs by mining MediaWiki Content Translation postedits

Vidyadheesha D N (invo)
Kannada−Marathi language pair

Arghya Bhattacharya (arghya)
Python API/library for Apertium

Anna Zueva (zu_ann)
Tatar and Bashkir language pair

Kevin Murphy (kmurphy4)
Universal Dependencies Annotatrix

Oğuzhan Kuyrukçu (oguz)
Uyghur-Turkish MT

You might want to collaborate with each other, especially since several projects are on developing new translators and the same kinds of tasks have to be solved for each. E.g., figuring out how to measure the coverage of your translator on Wikipedia or other corpora you're measuring it on (we had a discussion about this today; grep for WikiExtractor in today's IRC logs [1]).

By the way, Wikipedia seems to be one of the major uses for Apertium translators, especially since some of these are used by the Wikimedia Content Translation tool. [7] If you are working on a language pair and you and your mentor decided to select Wikipedia as the domain to focus on (seems to be the case for all pairs this year), I think getting it installed in the Content Translation tool as early as possible is a very worthy goal to pursue. For that to happen, the translator has to be of release quality, which as a rule means having >95% coverage and being testvoc clean. [8] Getting installed in Wikimedia's Content Translation means being able to get feedback from real users (and I think it's also much more rewarding to work on something when you see it being used by others!), although in principle you can get such feedback even before the pair becomes available there. [9]
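
As a rough illustration of what a naive coverage figure means, here is a minimal Python sketch. It assumes you already have the output of your pair's morphological analyser in Apertium stream format, where an unknown word comes out as ^word/*word$. The function name naive_coverage and the sample text are made up for illustration, and escaped characters in the stream are ignored for simplicity. The procedure described in [9] is the one to actually follow; this only shows the idea.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# (simplification: escaped characters such as \/ or \$ are not handled)
LEXICAL_UNIT = re.compile(r"\^([^/$^]*)/([^$]*)\$")


def naive_coverage(analysed_text):
    """Return (known, total) counts of lexical units in analyser output."""
    total = 0
    known = 0
    for match in LEXICAL_UNIT.finditer(analysed_text):
        total += 1
        # An unknown word has an analysis starting with '*', e.g. ^foo/*foo$
        if not match.group(2).startswith("*"):
            known += 1
    return known, total


# Hypothetical sample: three tokens, one of them unknown to the analyser.
sample = "^the/the<det><def><sp>$ ^blorf/*blorf$ ^cat/cat<n><sg>$"
known, total = naive_coverage(sample)
print(f"coverage: {known}/{total} = {known / total:.1%}")
```

On a real corpus you would feed in the analyser output for the whole extracted Wikipedia text and compare the resulting percentage against the 95% rule of thumb.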

If you haven't already done so, we recommend subscribing to the apertium-stuff mailing list. [2] The general discussions about Apertium happen there.

If you are working on a language pair, I also recommend subscribing to the apertium-packaging mailing list [2], where you will get automatic emails with success/failure build statuses of Apertium packages from the nightly package builder.

In addition, there are mailing lists for discussions concerning languages of specific language groups. [2]

The Apertium project started as an MT engine for related languages, and most of the pairs released so far were for translating between closely-related languages. People working on translators for related languages of a specific language group tend to flock together, and we have wiki pages for different language groups, e.g. [3], [4], [5] and [6], where we try to track the progress. I think it's useful to see your project as part of a larger effort on building machine translators for a whole family of related languages. Some synergy should arise there :)

The students who had already subscribed to the apertium-stuff mailing list will receive this email twice, apologies for that.


That's it for now. I wish you all a great summer of coding! :)

Ilnar (selimcan)

[1] http://tinodidriksen.com/pisg/freenode/logs/%23apertium/2018-05-12.log
[2] https://sourceforge.net/p/apertium/mailman/
[3] http://wiki.apertium.org/wiki/Turkic_languages
[4] http://wiki.apertium.org/wiki/Uralic_languages
[5] http://wiki.apertium.org/wiki/Germanic_languages
[6] http://wiki.apertium.org/wiki/Slavic_languages
[7] https://www.mediawiki.org/wiki/Content_translation
[8] http://wiki.apertium.org/wiki/Testvoc
[9] http://wiki.apertium.org/wiki/Evaluating_with_Wikipedia
[10] http://wiki.apertium.org/wiki/Irc
