Dear Google Summer of Code students,
On behalf of the other mentors and myself, I want to once again welcome
you to the project.
The coding period starts on Monday, the 14th of May.
Here are some things that we expect from you.
- We expect you to work on your projects for 7-8 hours a day and 35-40
hours a week.
- If you have an exam or other events and can't keep up with that, you
should tell your mentor(s) in advance.
- Most of the projects, especially projects on language pairs, require
regular communication. We expect you to be on IRC [10] for at least a
part of your working day, unless your project doesn't require that and
your mentor(s) tell you that you don't have to.
Apertium has recently moved to GitHub, which has rather good issue
tracking and code review tools (where you can have comments/discussions
on specific committed lines). This seems to be a good complement or
alternative to discussing things on IRC, especially if you and your
mentor(s) are in quite distant time zones.
Here is a list of the projects that were accepted this year (in no
particular order; the IRC nick is given in parentheses):
Elena Sokur (sokureo)
Udmurt−Komi-Zyrian language pair
Marc Riera Irigoyen (mriera_trad)
Romanian−Catalan language pair and upgrading other pairs to the
monolingual module system
Anastasia Kuznetsova (anakuz)
Guarani−Spanish language pair
Sardana Ivanova (eirien)
Kazakh−Sakha language pair
Evgenii Glazunov (G_D)
Bilingual dictionary enrichment via graph completion
Nikolay Aleksandrov (qavan)
Chuvash−Tatar language pair
Abinash Senapati (Techievena)
Extending lttoolbox to have the power of HFST
Claudi Balaguer (capsot)
French−Occitan language pair
Anna Kondratjeva (deltamachine)
Improving language pairs by mining MediaWiki Content Translation
postedits
Vidyadheesha D N (invo)
Kannada−Marathi language pair
Arghya Bhattacharya (arghya)
Python API/library for Apertium
Anna Zueva (zu_ann)
Tatar and Bashkir language pair
Kevin Murphy (kmurphy4)
Universal Dependencies Annotatrix
Oğuzhan Kuyrukçu (oguz)
Uyghur-Turkish MT
You might want to collaborate with each other, especially since several
projects involve developing new translators and the same kinds of tasks
have to be solved for each, e.g. figuring out how to measure the
coverage of your translator on Wikipedia or whichever other corpora you
are measuring it on (we had a discussion about this today; grep for
WikiExtractor in today's IRC logs [1]).
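As a rough illustration of what a naive coverage count could look like,
here is a minimal Python sketch. It assumes you already have your
analyser's output in the Apertium stream format, where each token looks
like ^surface/analysis$ and an unknown word gets a single analysis
starting with an asterisk. The names naive_coverage and sample are made
up for this example, and the sketch ignores escaped characters and counts
punctuation as tokens, so treat it as a starting point rather than the
way coverage is officially measured.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Simplification: escaped characters such as \/ or \$ are not handled.
LEXICAL_UNIT = re.compile(r"\^([^/$^]*)/([^$]*)\$")


def naive_coverage(analysed_text):
    """Return (known, total, coverage) for analyser output in stream format."""
    total = 0
    known = 0
    for match in LEXICAL_UNIT.finditer(analysed_text):
        total += 1
        # An unknown word has a single analysis that begins with '*'.
        if not match.group(2).startswith("*"):
            known += 1
    coverage = known / total if total else 0.0
    return known, total, coverage


# Hypothetical analyser output: three known tokens and one unknown word.
sample = "^The/the<det><def><sp>$ ^cat/cat<n><sg>$ ^flurbs/*flurbs$ ^./.<sent>$"
known, total, coverage = naive_coverage(sample)
print(f"{known}/{total} tokens analysed, coverage {coverage:.1%}")
```

Running the same function over the analysed text of a whole Wikipedia
dump gives a single figure that you can compare between pairs or track
over the summer.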
By the way, Wikipedia seems to be one of the major uses for Apertium
translators, especially since some of these are used by the Wikimedia
Content Translation tool. [7] If you are working on a language pair and
you and your mentor have decided to select Wikipedia as the domain to
focus on (which seems to be the case for all pairs this year), I think
getting the pair installed in the Content Translation tool as early as
possible is a very worthy goal to pursue. For that to happen, the
translator has to be of release quality, which as a rule means having
more than 95% coverage and being testvoc clean. [8] Getting installed in
Wikimedia's Content Translation means being able to get feedback from
real users (and I think it's also much more rewarding to work on
something when you see it being used by others!), although in principle
you can get such feedback even before the pair becomes available
there. [9]
If you haven't already done so, we recommend subscribing to the
apertium-stuff mailing list. [2] The general discussions about Apertium
happen there.
If you are working on a language pair, I also recommend subscribing to
the apertium-packaging mailing list [2], where you will get automatic
emails with the success/failure build statuses of Apertium packages from
the nightly package builder.
In addition, there are mailing lists for discussions concerning
languages of specific language groups. [2]
The Apertium project started as an MT engine for related languages, and
most of the pairs released so far were for translating between
closely-related languages. People working on translators for related
languages of a specific language group tend to flock together, and we
have wiki pages for different language groups, e.g. [3], [4], [5] and
[6], where we try to track the progress. I think it's useful to see your
project as part of a larger effort on building machine translators for a
whole family of related languages. Some synergy should arise there :)
Students who had already subscribed to the apertium-stuff mailing list
will receive this email twice; apologies for that.
That's it for now. I wish you all a great summer of coding! :)
Ilnar (selimcan)
[1] http://tinodidriksen.com/pisg/freenode/logs/%23apertium/2018-05-12.log
[2] https://sourceforge.net/p/apertium/mailman/
[3] http://wiki.apertium.org/wiki/Turkic_languages
[4] http://wiki.apertium.org/wiki/Uralic_languages
[5] http://wiki.apertium.org/wiki/Germanic_languages
[6] http://wiki.apertium.org/wiki/Slavic_languages
[7] https://www.mediawiki.org/wiki/Content_translation
[8] http://wiki.apertium.org/wiki/Testvoc
[9] http://wiki.apertium.org/wiki/Evaluating_with_Wikipedia
[10] http://wiki.apertium.org/wiki/Irc