Dear Hèctor,
Thanks for your reply, it's a very interesting comment.
I'm not sure whether the Belarusian side of the Belarusian-Russian Mediawiki corpus (and the Belarusian sides of the other parallel corpora I've already found) contains Taraškievica, common Belarusian, or mixed data. I'll try to check whether this is stated explicitly somewhere.
However, I think it is possible to create a file (and develop a specific format, if needed) that describes the main differences between the two competing norms in a formal way, and then use this file when searching for extracted postediting operations that improve the quality of translation. We could look through the extracted operations and automatically remove those in which the difference between mt and pe in o(s, mt, pe) can be described purely in terms of differences between the competing norms.
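A minimal sketch of how such a filter might look, under several assumptions that are not part of the proposal: the norm-difference file is a hypothetical tab-separated list of word pairs (classical form, official form), operations are plain (s, mt, pe) string triples, and the function names and toy data are made up for illustration. A real description of the two norms would probably need rules below the word level.

```python
from typing import Dict, List, Tuple

# An extracted postediting operation o(s, mt, pe).
Operation = Tuple[str, str, str]


def parse_norm_rules(text: str) -> Dict[str, str]:
    """Read a hypothetical 'classical<TAB>official' word list into a dict."""
    rules: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        classical, official = line.split("\t", 1)
        rules[classical] = official
    return rules


def normalise(segment: str, rules: Dict[str, str]) -> str:
    """Map every token that has a known classical variant to its official form."""
    return " ".join(rules.get(token, token) for token in segment.split())


def is_norm_only(op: Operation, rules: Dict[str, str]) -> bool:
    """True if mt and pe differ, but only in the choice of norm."""
    _, mt, pe = op
    return mt != pe and normalise(mt, rules) == normalise(pe, rules)


def drop_norm_only(ops: List[Operation], rules: Dict[str, str]) -> List[Operation]:
    """Keep only operations that are not explained by the norm differences."""
    return [op for op in ops if not is_norm_only(op, rules)]


# Toy data, invented for illustration only.
RULES_TEXT = "# classical\tofficial\nсьнег\tснег\nсьвет\tсвет\n"
rules = parse_norm_rules(RULES_TEXT)
ops = [
    ("снег", "снег", "сьнег"),  # posteditor only switched norm
    ("мир", "мир", "свет"),     # posteditor fixed an untranslated word
]
kept = drop_norm_only(ops, rules)
```

With this toy input, `kept` holds only the second operation, because the first one disappears once both sides are mapped to the same norm.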
But this is still a rough idea; I will definitely think about it more.
23.03.2018, 15:15, "Hèctor Alòs i Font" <[email protected]>:
2018-03-23 13:34 GMT+03:00 Francis Tyers <[email protected]>:
On 2018-03-23 10:54, Anna Kondratjeva wrote:
Sorry, wrong link, the right one is:
http://wiki.apertium.org/wiki/User:Deltamachine/proposal2018
23.03.2018, 12:51, "Anna Kondratjeva" <[email protected]>:
Hi everyone,
I'm one of those wannabe GSoC students who write here to ask for
feedback.
I have written a draft of my proposal and submitted it on Wiki.
I would be extremely happy, if someone could take a look at it and
give me any advice.
http://wiki.apertium.org/wiki/User:Deltamachine/proposal2018
It would be cool to give an indication of how much post-edited data
is available for the language pair(s) that you'd like to work on.
Fran
What about Taraškievica / "Classical orthography" in Belarusian? As far as I understand, both norms are used in the Belarusian Wikipedia. How will you deal with them? This can be an interesting case, since lots of minority languages have competing norms (and this is, in fact, part of the data "noise" you mention in your proposal). So it would be very useful if the tools you may provide for improving other language pairs made it possible to filter out, to some extent, cases where the human post-editor chose a competing norm that the Apertium translator does not provide. This way we'll be able to concentrate on actual errors in the morphological analysis, lexical selection or transfer.
Hèctor
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
------------------------------------------------------------------------------------------------------
All the best,
Anna
