I do not mean to be unduly polemical in questioning the methodology used
to choose what to compare, nor do I want to overlook the shortcomings
of Apertium/RBMT. However, if Apertium was "good enough" to create
corpora for ENG-CAT NMT (by translating the English-Spanish Europarl
corpus with Spanish-Catalan Apertium), surely a "fairer" comparison would
have been the English-Spanish pair?
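For anyone unfamiliar with the pivot approach, here is a minimal sketch in Python of how such a synthetic corpus could be built. It is not Softcatalà's actual pipeline. The function names are hypothetical. It assumes a local Apertium install with a Spanish-Catalan pair whose mode is named "spa-cat", which may differ on your system.

```python
import subprocess
from typing import Callable, Iterable, Iterator, Tuple


def apertium_es_ca(text: str, mode: str = "spa-cat") -> str:
    """Translate Spanish text to Catalan with the Apertium command-line tool.

    Assumes `apertium` and a Spanish-Catalan pair are installed; the mode
    name "spa-cat" is an assumption and may differ between installations.
    The -u flag suppresses the '*'/'#' marks Apertium adds to unknown words.
    """
    result = subprocess.run(
        ["apertium", "-u", mode],
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def pivot_corpus(
    en_es_pairs: Iterable[Tuple[str, str]],
    translate_es_ca: Callable[[str], str],
) -> Iterator[Tuple[str, str]]:
    """Yield synthetic (English, Catalan) pairs by machine-translating the
    Spanish side of an English-Spanish parallel corpus (e.g. Europarl)."""
    for en, es in en_es_pairs:
        ca = translate_es_ca(es).strip()
        if ca:  # skip pairs whose translation came back empty
            yield en, ca
```

Usage would be something like `list(pivot_corpus(pairs, apertium_es_ca))`. The English side is kept as-is, so the NMT system is trained on human-written English paired with RBMT-generated Catalan. Starting one Apertium process per sentence is slow, so a real pipeline would batch the input.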

On Sun, Oct 18, 2020 at 9:06 AM Jaume Ortolà i Font
<[email protected]> wrote:
>
> Message from Hèctor Alòs i Font <[email protected]> on Sun., 18 Oct.
> 2020 at 7:50:
>>
>> Xavi, I am impressed that you could in Softcatalà get enough bilingual texts 
>> to create an English-Catalan neural translator. Congratulations on the 
>> results! I am curious to know how big the corpus you collected has been, as 
>> well as from which sources to ensure the quality of the translations.
>
>
> The corpora used can be found here:
> https://github.com/Softcatala/en-ca-corpus
>
> One of the corpora is an automatic translation of the English-Spanish 
> Europarl corpus using Spanish-Catalan Apertium. It has proved good enough to 
> train the neural translator.
>
> The neural translator could be improved with better corpora and using more 
> powerful hardware in the training. The vocabulary size is limited because of 
> hardware constraints.
>
>>
>> I'd maybe add that probably it would not be possible to collect such a 
>> corpus for Valencian Catalan, so I guess we face in this neural translator a 
>> typical problem with lesser-used languages/varieties. If it is ever 
>> considered necessary to generate Valencian, this will have to be done by 
>> translating it into "reference" Catalan and then automatically adapting it. 
>> In fact the same happens for the many flavours we currently have in Apertium 
>> for Catalan, both Valencian and "Catalonian".
>
>
> It is easy to make a Catalan>Valencian adapter (a few lines of code using 
> LanguageTool). Not so easy the other way around because some Valencian verbal 
> forms are ambiguous.
>
>>
>> By the way, is Softcatalà trying to create a neural translator for the 
>> Spanish-Catalan pair?
>
>
> Not yet. Neural translators require a lot of hardware resources, in training 
> and in production. We could not support the current volume of Spanish-Catalan 
> translations with neural translation.
>
> Jaume Ortolà
>
> _______________________________________________
> Apertium-stuff mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff

