Hey guys, I'm writing a system demonstration paper for LowResMT 2020 about the recent GSoC project "Translating the internet into low resource languages with Apertium" (snazzier title suggestions are welcome).
As part of this demonstration, I want to show some real-world examples of how the new markup-handling system helps with translating webpages and formatted documents (odt, pptx, rtf, etc.). To show this effectively, I need to choose 3-4 released language pairs that are syntactically divergent enough for markup reordering to be visible in the translation output. As far as I know, spa-cat is one of our most mature pairs, but I'm not sure how syntactically divergent it is. If it is divergent enough, I'm happy to be corrected.

If your language pair has had issues with webpage translation and those issues are now (more or less) solved, some examples would be really helpful.

TL;DR: I need suggestions for language pairs that are mature, low-resource (at least on the target side), and syntactically divergent enough to show the benefits of markup handling in the translation. Examples would be great as well.

Any help will be sincerely appreciated :))

Thanks and Regards,
*तन्मय खन्ना*
*Tanmai Khanna*
Apertium-stuff mailing list
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
