Greetings Apertiumers! Figuring out how to incorporate UD parsers into Apertium pipelines is something that's been on my todo list for a while, but with the unfortunate property that it keeps getting sidelined by projects that have deadlines.
With regard to your specific issue, here are the options I can think of:

1. apertium-transfer / chunking
The chunker can pretty much only process adjacent words. You can encode dependency labels to some extent (e.g. ^green/green<adj><sint><@amod>$), and the rules can refer to those tags, but I don't think there's any way to access the actual relations that isn't incredibly hacky and fragile.

2. apertium-recursive
This was created precisely because chunking can't handle long-distance relationships, but to actually use it, you'd end up somehow encoding and then re-parsing the tree structure, which is still fairly fragile while also probably being an enormous waste of energy.

3. Constraint Grammar
VISL CG-3 can manipulate dependency trees, and writing agreement rules would be fairly straightforward, though you'd have to write them from scratch rather than copying from existing sources.

4. Bug me to make a real solution
Prototyping a pipeline module to do pretty much exactly what you're talking about is nominally fairly high on my todo list, and if someone is actually waiting for it, there's a decent amount of hope that I'll actually start it rather than some other project.

If your main concern is agreement, 3 strikes me as a pretty good option. On the other hand, if you actually need to modify the tree structure, 3 might get complicated, in which case I'd recommend 4.

Daniel

On Thu, Dec 16, 2021 at 5:20 PM Виктор Булатов <[email protected]> wrote:
>
> Hi everyone. The Interslavic language is a constructed language that is
> created in such a way that people from Slavic countries are able to
> understand most of it without any prior education. It has a Wikipedia page
> and everything (maybe we even will have an ISO-639-3 code "ISV" in the
> future, fingers crossed!).
>
> I'm looking into developing some sort of MT system for Interslavic (mainly
> the "Some Natural Slavic Language -> Interslavic" direction).
> I've managed to cobble a prototype with Russian UDPipe and ISV
> morphological data/rules before finding out about Apertium (and you guys
> seem interesting).
>
> The thing is, Russian and Czech are probably the richest Slavic languages in
> terms of NLP resources. Apertium obviously isn't going to beat a dependency
> parser that was trained on >1M of labeled sentences. So, I don't really need
> any of the earlier stages of the Apertium pipeline. However, the chunking and
> multi-word-expression modules seem promising, especially given that I
> probably could re-use already existing rules (that are written for different
> Slavic languages, but it doesn't matter).
>
> So, my question is: is it possible to use the chunking module in isolation?
> Preferably in a way that allows manipulation of UDPipe's dependency trees?
> For example, to ensure gender agreement between a noun and attached
> adjectives.
>
> I would be happy to hear any other advice!
> _______________________________________________
> Apertium-stuff mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
