On Wed, 13 Nov 2013 at 11:10 +0100, Mikel L. Forcada wrote:
> Dear all,
> 
> Aida Sundetova, a 4th-year student from the Kazakh National University 
> in Almaty is visiting the Universitat d'Alacant and we are working 
> together on apertium-eng-kaz (and we plan to do some kaz-eng). We 
> believe this language pair has a strategic value as much of what we do 
> will be reused for other X-Turkic language pairs. In view of that 
> responsibility, we are compelled to do things right.
> 
> As you know, our three-level structural transfer is not computationally 
> more powerful than the original one-level transfer, but it makes it 
> possible to "factor out" some common .t1x operations into higher-level 
> .t2x rules.
> 
> Aida and I are currently arriving at a point where we have to make 
> decisions as to what to put in .t1x and what to put in .t2x. Many of our 
> language pairs have three-level structural transfer, and therefore, 
> their developers have faced the same problems as Aida and I have, but I 
> am not aware of any place where these decisions are explained.
> 
> So, before searching in existing language pairs and trying to understand 
> other people's code (which is more or less like using archaeology to 
> figure out what an ancient culture was), we would appreciate it very 
> much if developers came forward and gave us some clues about how 
> they did their job. Any informal narrative would be helpful.  I know 
> that this may involve some soul-searching (a.k.a. elicitation), so I 
> appreciate your effort even more!
> 
> We plan to write this up as part of Aida's degree defense, and we will 
> duly acknowledge any input there!

The eng-kaz pair is actually using four-level transfer, like
apertium-sme-nob, which is at a similar level of complexity.

The header comments in the sme-nob transfer files (just do cat
apertium-sme-nob.sme-nob.t1x | less) explain which file does what. But
hopefully Unhammer can give us some background/breakdown too.

My thoughts:

.t1x should be used for local chunking (noun groups and verb groups) --
and for doing local agreement. Examples: "the red bus", "was going to
go", "very quickly".

.t2x should be used to deal with preposition + noun groups and noun +
relative clause groups. You could also deal with some light coordination
here.

.t3x should be used for longer-distance movement (if chosen), agreement
and insertion (e.g. dropped pronouns).

.t4x should be used for inserting missing words (e.g. articles as
determined by tags on the chunk).
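
To make the division of labour a bit more concrete, here is a rough
Python sketch of the first two stages. It is only a toy model, not
Apertium's rule format: the category names, the rule tables and the
apply_stage helper are all made up for illustration, and real .t1x/.t2x
rules also carry tags and actions that are left out here.

```python
# Toy model of staged chunking; NOT Apertium's actual rule format.
# Each rule is (pattern of category names, name of the chunk produced).
# All names below are hypothetical and chosen only for illustration.
STAGE1 = [  # word level: local noun and verb groups
    (("det", "adj", "n"), "NP"),
    (("det", "n"), "NP"),
    (("n",), "NP"),
    (("vaux", "vblex"), "VP"),
    (("vblex",), "VP"),
]
STAGE2 = [  # chunk level: preposition + noun group
    (("pr", "NP"), "PP"),
]


def apply_stage(rules, items):
    """Rewrite (category, text) items left to right, taking the longest
    matching pattern at each position; unmatched items pass through."""
    out, i = [], 0
    while i < len(items):
        best = None
        for pattern, name in rules:
            window = tuple(cat for cat, _ in items[i:i + len(pattern)])
            if window == pattern and (best is None or len(pattern) > len(best[0])):
                best = (pattern, name)
        if best is None:
            out.append(items[i])
            i += 1
        else:
            pattern, name = best
            text = " ".join(t for _, t in items[i:i + len(pattern)])
            out.append((name, text))
            i += len(pattern)
    return out


words = [("vblex", "waited"), ("pr", "for"), ("det", "the"),
         ("adj", "red"), ("n", "bus")]
chunks = apply_stage(STAGE1, words)     # local groups only
phrases = apply_stage(STAGE2, chunks)   # preposition + noun group
print(chunks)
print(phrases)
```

The first stage only ever sees words and the second only ever sees the
chunks the first one produced, so the "for" + "the red bus" grouping
does not need to know what is inside the noun group.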

One of the things that a transfer-rule developer needs to be on constant
guard for is LRLM (left-to-right longest match): an earlier, longer
match can consume words that a later rule needed. This can be overcome
in some cases by increasing the number of transfer stages, but it's
always a trade-off.
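
A small Python sketch of the LRLM problem, under the assumption that
rules are just tuples of category names (the lrlm function and the two
rules are hypothetical, not taken from any real pair):

```python
def lrlm(rules, cats):
    """Segment a list of categories by left-to-right longest match.
    Unmatched categories come out as one-element tuples."""
    out, i = [], 0
    while i < len(cats):
        matches = [p for p in rules if tuple(cats[i:i + len(p)]) == p]
        if matches:
            longest = max(matches, key=len)
            out.append(longest)
            i += len(longest)
        else:
            out.append((cats[i],))
            i += 1
    return out


# Two hypothetical rules that compete for the same noun.
rules = [("adj", "n"), ("n", "pr", "n")]
segmentation = lrlm(rules, ["adj", "n", "pr", "n"])
print(segmentation)
```

Here the adj + n rule fires first and swallows the noun, so the
n + pr + n rule never gets a chance to match, even though it is longer.
Moving the second rule to a later stage that works on chunks is one way
around this, at the cost of one more file to maintain.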

It would be very good to have a nicely engineered example that other
pairs can be based on. Not only Turkic pairs, but also pairs like
(Uralic)-English could take advantage of this.

Fran


_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
