This sounds *very* interesting. I am on mobile now. Will give more detailed
feedback in a longer message later tonight.
Mikel
On 26 October 2014 16:07:25 CET, Francis Tyers <[email protected]> wrote:
>Hey all,
>
>I had this idea and would be interested in getting feedback. I think it
>could be nicely split into a series of GCI tasks...
>
>Idea: Make a mode for the apertium-tagger that performs lexicalised
>unigram tagging. This would go in the pipeline after a constraint
>grammar, and resolve remaining ambiguity by just selecting the most
>frequent analysis for a given surface form. It could back off to the most
>frequent tag string in the event that the training data does not contain
>all the surface forms.
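
To make the idea above concrete, here is a minimal sketch of lexicalised
unigram disambiguation with the tag-string back-off. It is only an
illustration under assumptions that are not in the original message: the
training data is taken to be a list of (surface form, analysis) pairs, an
analysis is a string such as "wound<n><sg>", and the function names
tag_string, train and disambiguate are hypothetical.

```python
from collections import Counter, defaultdict


def tag_string(analysis):
    """Return the tag part of an analysis, e.g. 'wound<n><sg>' -> '<n><sg>'."""
    start = analysis.find("<")
    return analysis[start:] if start != -1 else ""


def train(tagged):
    """Count analyses per surface form, and tag strings over the whole corpus.

    `tagged` is an iterable of (surface, analysis) pairs (assumed format).
    """
    by_form = defaultdict(Counter)
    by_tags = Counter()
    for surface, analysis in tagged:
        by_form[surface][analysis] += 1
        by_tags[tag_string(analysis)] += 1
    return by_form, by_tags


def disambiguate(surface, candidates, by_form, by_tags):
    """Pick one analysis among those left after the constraint grammar."""
    seen = by_form.get(surface)
    if seen:
        # Lexicalised step: most frequent analysis for this surface form.
        best = max(candidates, key=lambda a: seen[a])
        if seen[best] > 0:
            return best
    # Back-off: most frequent tag string, regardless of surface form.
    return max(candidates, key=lambda a: by_tags[tag_string(a)])


corpus = [
    ("wound", "wound<n><sg>"),
    ("wound", "wound<n><sg>"),
    ("wound", "wind<vblex><pp>"),
    ("dogs", "dog<n><pl>"),
]
by_form, by_tags = train(corpus)

# Seen surface form: the most frequent analysis wins.
print(disambiguate("wound", ["wind<vblex><pp>", "wound<n><sg>"], by_form, by_tags))
# Unseen surface form: falls back to the most frequent tag string.
print(disambiguate("saw", ["see<vblex><past>", "saw<n><sg>"], by_form, by_tags))
```

Ties are resolved in favour of the first candidate, which is an arbitrary
choice made for this sketch.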
>
>The benefit of this over using the existing apertium-tagger would be: it
>would be a _lot_ easier to train (no need for a .tsx file), no breakage
>when you add new multiwords or contractions, no need to worry about
>tokenisation.
>
>I envisage at least five GCI tasks:
>
>1) Write a prototype in a programming language of your choice (e.g. Python)
>2) Come up with a data format for storing the model
>3) Write a program to train a model from a tagged corpus (we have this
>now in some way for at least English, Spanish, Catalan, Russian and
>Tatar)
>4) Write a program to run the tagger on a text
>5) Integrate the tagger into the apertium-tagger code (could be done
>like the SWPOST one).
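
For task 2 in the list above, one possible storage format is a
tab-separated file with one "surface, analysis, count" row per line. This is
just a sketch of one option, not a proposed standard: the format and the
names save_model and load_model are assumptions made for illustration.

```python
import io
from collections import Counter, defaultdict


def save_model(by_form, out):
    """Write one 'surface<TAB>analysis<TAB>count' line per observed pair."""
    for surface in sorted(by_form):
        for analysis, count in sorted(by_form[surface].items()):
            out.write(f"{surface}\t{analysis}\t{count}\n")


def load_model(inp):
    """Read the tab-separated format back into per-surface-form counters."""
    by_form = defaultdict(Counter)
    for line in inp:
        line = line.rstrip("\n")
        if not line:
            continue  # tolerate blank lines
        surface, analysis, count = line.split("\t")
        by_form[surface][analysis] = int(count)
    return by_form


# Round trip through an in-memory buffer instead of a real file.
model = {"wound": Counter({"wound<n><sg>": 2, "wind<vblex><pp>": 1})}
buf = io.StringIO()
save_model(model, buf)
text = buf.getvalue()
loaded = load_model(io.StringIO(text))
print(text, end="")
```

Only the per-form counts are stored here; the tag-string counts needed for
the back-off can be recomputed at load time by summing over the rows. A
plain-text format like this is easy to inspect and diff, at the cost of
being larger than a binary one.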
>
>Another thing, if it gets done, would be to see whether it could be
>trained in a similar way to the bigram tagger/lexical selection module,
>using target-language (TL) information.
>
>Downside: This would achieve something similar to adding weighted FST
>support to lttoolbox and getting lttoolbox to output analyses by weight,
>something I think would be more desirable.
>
>Any thoughts?
>
>Fran
>
>------------------------------------------------------------------------------
>_______________________________________________
>Apertium-stuff mailing list
>[email protected]
>https://lists.sourceforge.net/lists/listinfo/apertium-stuff
--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.