I do not mean to be unduly polemical in questioning the methodology used to choose what to compare, nor do I want to overlook the shortcomings of Apertium/RBMT. However, if Apertium was "good enough" to create corpora for ENG-CAT NMT (by translating the English-Spanish Europarl corpus with Spanish-Catalan Apertium), surely a "fairer" comparison would have been the English-Spanish pair?
On Sun, Oct 18, 2020 at 9:06 AM Jaume Ortolà i Font <[email protected]> wrote:
>
> Message from Hèctor Alòs i Font <[email protected]> on Sun, 18 Oct
> 2020 at 7:50:
>>
>> Xavi, I am impressed that you were able to collect enough bilingual text at
>> Softcatalà to create an English-Catalan neural translator. Congratulations
>> on the results! I am curious to know how big the corpus you collected is,
>> and which sources you used to ensure the quality of the translations.
>
>
> The corpora used can be found here:
> https://github.com/Softcatala/en-ca-corpus
>
> One of the corpora is an automatic translation of the English-Spanish
> Europarl corpus using Spanish-Catalan Apertium. It has proved good enough to
> train the neural translator.
>
> The neural translator could be improved with better corpora and more
> powerful hardware for training. The vocabulary size is limited by
> hardware constraints.
>
>>
>> I'd add that it would probably not be possible to collect such a
>> corpus for Valencian Catalan, so I guess this neural translator faces a
>> typical problem of lesser-used languages/varieties. If it is ever
>> considered necessary to generate Valencian, this will have to be done by
>> translating into "reference" Catalan and then automatically adapting the
>> output. In fact, the same happens with the many flavours of Catalan we
>> currently have in Apertium, both Valencian and "Catalonian".
>
>
> It is easy to make a Catalan>Valencian adapter (a few lines of code using
> LanguageTool). It is not so easy the other way around, because some
> Valencian verbal forms are ambiguous.
>
>>
>> By the way, is Softcatalà trying to create a neural translator for the
>> Spanish-Catalan pair?
>
>
> Not yet. Neural translators require a lot of hardware resources, both in
> training and in production. We could not support the current volume of
> Spanish-Catalan translations with neural translation.
>
> Jaume Ortolà
>
> _______________________________________________
> Apertium-stuff mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
