Dear Google Summer of Code students,

On behalf of the other mentors and myself, I want to welcome you to the project once again.

The coding period starts on Monday, the 14th of May.

Here are some things that we expect from you.

- We expect you to work on your projects for 7-8 hours a day and 35-40 hours a week.
- If you have an exam or other events and can't keep up with that, you should tell your mentor(s) in advance.
- Most of the projects, especially projects on language pairs, require regular communication. We expect you to be on IRC [10] for at least part of your working day, unless your project doesn't require that and your mentor(s) tell you that you don't have to.

Apertium recently moved to GitHub, which has rather good issue tracking and code review tools (you can comment on and discuss specific committed lines). This seems to be a good complement or alternative to discussing things on IRC, especially if you and your mentor(s) are in quite distant time zones.

Here is a list of the projects that were accepted this year, in no particular order (IRC nicks are given in parentheses):

Elena Sokur (sokureo)
Udmurt−Komi-Zyrian language pair

Marc Riera Irigoyen (mriera_trad)
Romanian−Catalan language pair and upgrading other pairs to the monolingual module system

Anastasia Kuznetsova (anakuz)
Guarani−Spanish language pair

Sardana Ivanova (eirien)
Kazakh−Sakha language pair

Evgenii Glazunov (G_D)
Bilingual dictionary enrichment via graph completion

Nikolay Aleksandrov (qavan)
Chuvash−Tatar language pair

Abinash Senapati (Techievena)
Extending lttoolbox to have the power of HFST

Claudi Balaguer (capsot)
French−Occitan language pair

Anna Kondratjeva (deltamachine)
Improving language pairs by mining MediaWiki Content Translation postedits

Vidyadheesha D N (invo)
Kannada−Marathi language pair

Arghya Bhattacharya (arghya)
Python API/library for Apertium

Anna Zueva (zu_ann)
Tatar and Bashkir language pair

Kevin Murphy (kmurphy4)
Universal Dependencies Annotatrix

Oğuzhan Kuyrukçu (oguz)
Uyghur-Turkish MT

You might want to collaborate with each other, especially since several projects are about developing new translators and the same kinds of tasks have to be solved for each. E.g., figuring out how to measure the coverage of your translator on Wikipedia or whatever other corpora you're measuring it on (we had a discussion about this today; grep for WikiExtractor in today's IRC logs [1]).
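
To make "coverage" concrete, here is a minimal sketch of a naive coverage count. It assumes you have already run your corpus through the analyser and saved the output in Apertium stream format, where each word appears as ^surface/analysis$ and an unknown word gets an analysis starting with an asterisk (^word/*word$). The function name and the sample sentence are made up for illustration, and the regular expression ignores escaped characters and other corner cases of the stream format, so treat it as a starting point rather than the project's official method.

```python
import re

# Matches one lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Simplification: escaped characters such as \/ or \$ are not handled.
LEXICAL_UNIT = re.compile(r"\^([^/$^]*)/([^$]*)\$")


def coverage(analysed_text: str) -> float:
    """Return the fraction of lexical units that received a known analysis."""
    total = 0
    unknown = 0
    for match in LEXICAL_UNIT.finditer(analysed_text):
        total += 1
        # Unknown words are emitted as ^word/*word$, so the analysis part
        # starts with an asterisk.
        if match.group(2).startswith("*"):
            unknown += 1
    if total == 0:
        return 0.0
    return (total - unknown) / total


# Hypothetical analyser output: four words, one of them unknown.
sample = (
    "^the/the<det><def><sp>$ ^cat/cat<n><sg>$ "
    "^flurbs/*flurbs$ ^sleeps/sleep<vblex><pri><p3><sg>$"
)
print(f"{coverage(sample):.2%}")
```

On a real corpus you would read the analysed file instead of the sample string; the ratio this prints is the naive, token-level coverage.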

By the way, Wikipedia seems to be one of the major uses for Apertium translators, especially since some of them are used by the Wikimedia Content Translation tool. [7] If you are working on a language pair and you and your mentor have decided to select Wikipedia as the domain to focus on (this seems to be the case for all pairs this year), I think getting the pair installed in the Content Translation tool as early as possible is a very worthy goal to pursue. For that to happen, the translator has to be of release quality, which as a rule means having more than 95% coverage and being testvoc clean. [8] Getting installed in Wikimedia's Content Translation means being able to get feedback from real users (and I think it's also much more rewarding to work on something when you see it being used by others!), although in principle you can get such feedback even before the pair becomes available there. [9]

If you haven't already done so, we recommend subscribing to the apertium-stuff mailing list. [2] General discussions about Apertium happen there.

If you are working on a language pair, I also recommend subscribing to the apertium-packaging mailing list [2], where you will get automatic emails with the success/failure build statuses of Apertium packages from the nightly package builder.

In addition, there are mailing lists for discussions concerning languages of specific language groups. [2]

The Apertium project started as an MT engine for related languages, and most of the pairs released so far are for translating between closely related languages. People working on translators for related languages of a specific language group tend to flock together, and we have wiki pages for different language groups, e.g. [3], [4], [5] and [6], where we try to track the progress. I think it's useful to see your project as part of a larger effort to build machine translators for a whole family of related languages. Some synergy should arise there :)

Students who had already subscribed to the apertium-stuff mailing list will receive this email twice; apologies for that.

That's it for now. I wish you all a great summer of coding! :)

Ilnar (selimcan)

[1] http://tinodidriksen.com/pisg/freenode/logs/%23apertium/2018-05-12.log
[2] https://sourceforge.net/p/apertium/mailman/
[3] http://wiki.apertium.org/wiki/Turkic_languages
[4] http://wiki.apertium.org/wiki/Uralic_languages
[5] http://wiki.apertium.org/wiki/Germanic_languages
[6] http://wiki.apertium.org/wiki/Slavic_languages
[7] https://www.mediawiki.org/wiki/Content_translation
[8] http://wiki.apertium.org/wiki/Testvoc
[9] http://wiki.apertium.org/wiki/Evaluating_with_Wikipedia
[10] http://wiki.apertium.org/wiki/Irc
