Re: [Apertium-stuff] Statistical Apertium

Felipe Sánchez Martínez Sun, 09 Oct 2011 02:17:42 -0700

Hi,

As Mikel has pointed out, you can find some papers related to the use of
statistics in Apertium on my web page: http://www.dlsi.ua.es/~fsanchez/


On 09/10/11 09:59, Mikel Forcada wrote:
> Luis,
>
> In addition to the multi-engine MT by Gabriel, there have been many
> avenues to hybridization involving Apertium, and Felipe Sánchez-Martínez
> has been part of most of them (check his webpage
> http://www.dlsi.ua.es/~fsanchez/). I'll answer now, and he can complete
> my answer later:
>
>  1. Felipe, Juan Antonio Pérez-Ortiz and I used nondeterministic output
>     followed by scoring with a statistical target-language model to
>     train the part-of-speech tagger of Apertium. However, he managed to
>     transfer the scores to the part-of-speech tagger so that it would
>     only deliver one analysis, with similar results. The whole thing is
>     implemented and is part of Apertium: Felipe will tell you which
>     packages. The main paper is:
>       * http://www.springerlink.com/content/m452802q3536044v/fulltext.pdf

This is implemented in the apertium-tagger-training-tools package.
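
To make the idea in item 1 concrete, here is a rough sketch of how
target-language scores could be turned into training evidence for a
tagger. It is only an illustration under assumptions: the function names,
the tag sequences and the log-probabilities are all hypothetical, and
this is not the code in apertium-tagger-training-tools. Each way of
disambiguating a source segment is assumed to have been translated and
scored by a target-language model already; the scores are normalised and
used as fractional counts of tag bigrams.

```python
import math
from collections import defaultdict


def path_weights(logprobs):
    """Normalise target-language log-probabilities into weights that sum to 1."""
    top = max(logprobs)  # subtract the maximum for numerical stability
    exps = [math.exp(lp - top) for lp in logprobs]
    total = sum(exps)
    return [e / total for e in exps]


def fractional_tag_counts(paths, logprobs):
    """Accumulate tag-bigram counts, each path weighted by its normalised score."""
    counts = defaultdict(float)
    for tags, weight in zip(paths, path_weights(logprobs)):
        for prev, cur in zip(tags, tags[1:]):
            counts[(prev, cur)] += weight
    return dict(counts)


# Hypothetical example: two disambiguation paths for one ambiguous segment,
# with made-up log-probabilities assigned to their translations.
PATHS = [("det", "noun", "verb"), ("det", "verb", "verb")]
LOGPROBS = [-4.0, -7.0]

COUNTS = fractional_tag_counts(PATHS, LOGPROBS)
```

The path whose translation the target-language model prefers contributes
most of the count mass, so tag bigrams from that path dominate whatever
estimate is later computed from the counts.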

>  2. Another thing that Felipe Sánchez-Martínez did was to mix
>     translation units from a corpus with Apertium output. We published a
>     paper on this;
>       * http://www.dlsi.ua.es/~fsanchez/pub/pdf/sanchez-martinez09d.pdf

This is implemented in the apertium-chunks-mixer package. Please note that
this package is only a proof-of-concept implementation. It uses a
language model to score translation alternatives, and the response time
of this scoring step could be improved by using a lattice.
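
The point about the lattice can be illustrated with a small sketch. It
makes several assumptions: the bigram table is a made-up toy model, each
alternative is a single word (real translation units span several
words), and none of the names come from apertium-chunks-mixer. The first
function scores every combination of alternatives; the second walks the
alternatives as a lattice and keeps only the best partial path per
word, which is enough for a bigram model.

```python
import itertools

# Hypothetical toy bigram log-probabilities; unseen bigrams get a flat penalty.
BIGRAM_LOGPROB = {
    ("<s>", "the"): -0.5,
    ("the", "house"): -1.0,
    ("the", "home"): -2.5,
    ("house", "is"): -0.7,
    ("home", "is"): -0.9,
    ("is", "big"): -1.2,
    ("is", "large"): -1.6,
    ("big", "</s>"): -0.4,
    ("large", "</s>"): -0.6,
}
UNSEEN = -10.0


def bigram(prev, word):
    return BIGRAM_LOGPROB.get((prev, word), UNSEEN)


def score(words):
    """Log-probability of a full sentence under the toy bigram model."""
    padded = ["<s>"] + list(words) + ["</s>"]
    return sum(bigram(a, b) for a, b in zip(padded, padded[1:]))


def best_exhaustive(slots):
    """Score every combination: cost grows exponentially with the number of slots."""
    return max(itertools.product(*slots), key=score)


def best_lattice(slots):
    """Viterbi-style search: keep one best partial path per last word."""
    best = {"<s>": (0.0, [])}  # last word -> (score so far, path so far)
    for options in slots:
        new_best = {}
        for word in options:
            new_best[word] = max(
                ((s + bigram(prev, word), path + [word])
                 for prev, (s, path) in best.items()),
                key=lambda cand: cand[0],
            )
        best = new_best
    final = max(
        ((s + bigram(prev, "</s>"), path) for prev, (s, path) in best.items()),
        key=lambda cand: cand[0],
    )
    return tuple(final[1])


# Hypothetical alternatives for each position of the output sentence.
SLOTS = [["the"], ["house", "home"], ["is"], ["big", "large"]]
```

Both functions select the same translation on this input, but the
lattice search does work proportional to the number of slots times the
square of the alternatives per slot, rather than to the number of
complete combinations.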

>  3. Finally, Felipe's student Víctor Sánchez Cartagena has been working
>     hard on hybridization, adding Apertium-generated translation units
>     to a statistical MT system (the resulting system, Alacant, was one
>     of the best systems in the WMT 2011 contest for Spanish--English,
>     see http://www.mt-archive.info/WMT-2011-Callison-Burch.pdf):
>       * http://www.dlsi.ua.es/~fsanchez/pub/pdf/sanchez-cartagena11c.pdf
>       * http://www.dlsi.ua.es/~fsanchez/pub/pdf/sanchez-cartagena11b.pdf
>       * http://www.dlsi.ua.es/~fsanchez/pub/pdf/sanchez-cartagena11a.pdf

In addition to these papers on hybridisation, you may also be interested
in the inference of structural transfer rules from parallel corpora. If
so, you might want to read the following paper:

Felipe Sánchez-Martínez, Mikel L. Forcada. Inferring shallow-transfer
machine translation rules from small parallel corpora. Journal of
Artificial Intelligence Research, volume 34, p. 605-635.
http://www.dlsi.ua.es/~fsanchez/pub/pdf/sanchez-martinez09b.pdf

Cheers
--
Felipe

