Hi,

As Mikel has pointed out, you can find some papers on the use of
statistics in Apertium on my web page: http://www.dlsi.ua.es/~fsanchez/

On 09/10/11 09:59, Mikel Forcada wrote:
> Luis,
>
> In addition to the multi-engine MT by Gabriel, there have been many
> avenues to hybridization involving Apertium, and Felipe Sánchez-Martínez
> has been part of most of them (check his webpage
> http://www.dlsi.ua.es/~fsanchez/). I'll answer now, and he can complete
> my answer later:
>
> 1. Felipe, Juan Antonio Pérez-Ortiz and I used nondeterministic output
>    followed by scoring with a statistical target-language model to
>    train the part-of-speech tagger of Apertium. However, he managed to
>    transfer the scores to the part-of-speech tagger so that it would
>    only deliver one analysis, with similar results. The whole thing is
>    implemented and is part of Apertium: Felipe will tell you which
>    packages. The main paper is:
>      * http://www.springerlink.com/content/m452802q3536044v/fulltext.pdf

This is implemented in the package apertium-tagger-training-tools.

> 2. Another thing that Felipe Sánchez-Martínez did was to mix
>    translation units from a corpus with Apertium output. We published a
>    paper on this;
>      * http://www.dlsi.ua.es/~fsanchez/pub/pdf/sanchez-martinez09d.pdf

This is implemented in the package apertium-chunks-mixer. Please note
that this package provides only a proof-of-concept implementation. It
uses a language model to score translation alternatives, and the way
this is done could be improved (in terms of response time) by using a
lattice.

> 3. Finally, Felipe's student Víctor Sánchez Cartagena has been working
>    hard in hybridization, adding Apertium-generated translation units
>    to a statistical MT system (the resulting system, Alacant, was one
>    of the best systems in the WMT 2011 contest for Spanish--English,
>    see http://www.mt-archive.info/WMT-2011-Callison-Burch.pdf):
>      * http://www.dlsi.ua.es/~fsanchez/pub/pdf/sanchez-cartagena11c.pdf
>      * http://www.dlsi.ua.es/~fsanchez/pub/pdf/sanchez-cartagena11b.pdf
>      * http://www.dlsi.ua.es/~fsanchez/pub/pdf/sanchez-cartagena11a.pdf

In addition to these papers on hybridisation, you may also be interested
in the inference of structural transfer rules from parallel corpora. If
so, you might want to read the following paper:

Felipe Sánchez-Martínez, Mikel L. Forcada. Inferring shallow-transfer
machine translation rules from small parallel corpora. Journal of
Artificial Intelligence Research, volume 34, p. 605-635.
http://www.dlsi.ua.es/~fsanchez/pub/pdf/sanchez-martinez09b.pdf

Cheers
--
Felipe
