Re: [Apertium-stuff] Thoughts on UDPipe, Apertium modules and translation system for Interslavic

Daniel Swanson Thu, 16 Dec 2021 16:18:06 -0800

Greetings Apertiumers!

Figuring out how to incorporate UD parsers into Apertium pipelines is
something that's been on my todo list for a while, but with the
unfortunate property that it keeps getting sidelined by projects that
have deadlines.

With regard to your specific issue, here are the options I can think of:

1. apertium-transfer / chunking
The chunker can pretty much only process adjacent words. You can
encode dependency labels to some extent (e.g.
^green/green<adj><sint><@amod>$), and the rules can refer to those
tags, but I don't think there's any way to access the actual relations
that isn't incredibly hacky and fragile.

2. apertium-recursive
This was created precisely because chunking can't handle long-distance
relationships, but to actually use it, you'd end up somehow encoding
and then re-parsing the tree structure, which is still fairly fragile
while also probably being an enormous waste of energy.

3. Constraint Grammar
VISL CG-3 can manipulate dependency trees, and writing agreement rules
would be fairly straightforward, though you'd have to write them from
scratch rather than copying them from existing sources.

4. Bug me to make a real solution
Prototyping a pipeline module to do pretty much exactly what you're
talking about is nominally fairly high on my todo list, and if someone
is actually waiting for it, there's a decent chance I'll start on it
before some other project.
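To make the limitation in option 1 concrete, here is a minimal sketch of pushing UDPipe output into the Apertium stream. It assumes CoNLL-U input (columns ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC); the UPOS_TO_TAG table and the set of reserved characters are assumptions for illustration, not taken from any actual Apertium module. Each token becomes a lexical unit like ^green/green<adj><@amod>$, so transfer rules can match the label, but the HEAD column has nowhere to go, which is why the chunker can't tell which noun an adjective actually attaches to.

```python
# Hypothetical mapping from UD part-of-speech tags to Apertium-style tags.
UPOS_TO_TAG = {"ADJ": "adj", "NOUN": "n", "VERB": "vblex", "DET": "det"}

# Assumed set of characters that must be backslash-escaped in an Apertium stream.
RESERVED = set("^$/<>@\\{}[]")


def escape(text: str) -> str:
    """Backslash-escape stream-reserved characters in surface forms and lemmas."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


def conllu_to_stream(conllu: str) -> str:
    """Convert one CoNLL-U sentence into space-separated Apertium lexical units."""
    units = []
    for line in conllu.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or "-" in cols[0] or "." in cols[0]:
            continue  # skip malformed lines, multiword-token ranges, empty nodes
        form, lemma, upos, deprel = cols[1], cols[2], cols[3], cols[7]
        tag = UPOS_TO_TAG.get(upos, upos.lower())
        # The dependency label survives as an @-prefixed tag;
        # the HEAD column (cols[6]) is simply dropped.
        units.append(f"^{escape(form)}/{escape(lemma)}<{tag}><@{deprel}>$")
    return " ".join(units)
```

For a two-token sentence where "green" is the amod of "apples", this yields ^green/green<adj><@amod>$ ^apples/apple<n><@root>$. Keeping HEAD as well would mean inventing a tag for it and re-parsing that tag in every downstream module, which is the fragile round trip described under option 2.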

If your main concern is agreement, 3 strikes me as a pretty good
option. On the other hand, if you actually need to modify the tree
structure, 3 might get complicated, in which case I'd recommend 4.
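For agreement specifically, the logic a rule has to express is small: for each adjective or determiner attached to a noun, copy the noun's gender, number and case. The sketch below writes that logic in Python over CoNLL-U-style tokens rather than in CG-3's rule language, so treat it as an illustration of what a rule using CG-3's dependency-based contextual tests would accomplish, not as CG-3 code. The Token class, the choice of amod and det as the agreeing relations, and the Gender/Number/Case feature names (borrowed from UD) are all assumptions.

```python
from dataclasses import dataclass, field

# Assumed: dependents with these UD relations must agree with their head noun.
AGREEING_RELATIONS = {"amod", "det"}
# Assumed: the UD features that carry agreement in Slavic noun phrases.
AGREEMENT_FEATS = ("Gender", "Number", "Case")


@dataclass
class Token:
    index: int   # 1-based position, as in the CoNLL-U ID column
    lemma: str
    upos: str
    head: int    # index of the head token; 0 marks the root
    deprel: str
    feats: dict[str, str] = field(default_factory=dict)


def enforce_agreement(sentence: list[Token]) -> None:
    """Copy agreement features from each noun onto its amod/det dependents, in place."""
    by_index = {tok.index: tok for tok in sentence}
    for tok in sentence:
        relation = tok.deprel.split(":")[0]  # treat subtypes like "det:poss" as "det"
        head = by_index.get(tok.head)
        if relation not in AGREEING_RELATIONS or head is None:
            continue
        if head.upos not in {"NOUN", "PROPN"}:
            continue  # only nouns impose agreement in this sketch
        for feat in AGREEMENT_FEATS:
            if feat in head.feats:
                tok.feats[feat] = head.feats[feat]
```

For example, an adjective wrongly tagged Gender=Masc that is the amod of a feminine noun ends up with Gender=Fem after enforce_agreement runs. Because the fix only rewrites features and never moves or reattaches words, the tree itself stays untouched; that is the case where option 3 is simple, and once words have to be reordered or reattached the bookkeeping grows quickly.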

Daniel

On Thu, Dec 16, 2021 at 5:20 PM Виктор Булатов <[email protected]> wrote:
>
> Hi everyone. Interslavic is a constructed language designed so that
> people from Slavic countries can understand most of it without any prior
> study. It has a Wikipedia page and everything (maybe we will even get an
> ISO 639-3 code, "ISV", in the future, fingers crossed!).
>
> I'm looking into developing some sort of MT system for Interslavic (mainly
> the "Some Natural Slavic Language -> Interslavic" direction). I've managed to
> cobble together a prototype with the Russian UDPipe model and ISV
> morphological data/rules before finding out about Apertium (and you guys
> seem interesting).
>
> The thing is, Russian and Czech are probably the richest Slavic languages in
> terms of NLP resources. Apertium obviously isn't going to beat a dependency
> parser that was trained on >1M labeled sentences, so I don't really need
> any of the earlier stages of the Apertium pipeline. However, the chunking and
> multi-word-expression modules seem promising, especially since I could
> probably reuse existing rules (they're written for other Slavic languages,
> but that doesn't matter).
>
> So, my question is: is it possible to use the chunking module in isolation? 
> Preferably in a way that allows manipulation of UDPipe's dependency trees? 
> For example, to ensure gender agreement between a noun and its attached
> adjectives.
>
> I would be happy to hear any other advice!
> _______________________________________________
> Apertium-stuff mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
