This sounds *very* interesting. I am on mobile now. Will give more detailed 
feedback in a longer message later tonight. 
Mikel


On 26 October 2014 16:07:25 CET, Francis Tyers <[email protected]> wrote:
>Hey all,
>
>I had this idea and would be interested in getting feedback. I think it
>could be nicely split into a series of GCI tasks...
>
>Idea: Make a mode for the apertium-tagger that performs lexicalised
>unigram tagging. This would go in the pipeline after a constraint
>grammar, and resolve remaining ambiguity by just selecting the most
>frequent analysis for a given surface form. It could back off to the
>most frequent tag string in the event that the training data does not
>contain all the surface forms.
>
>The benefit of this over using the existing apertium-tagger would be:
>it would be a _lot_ easier to train (no need for a .tsx file), no
>breakage when you add new multiwords or contractions, no need to worry
>about tokenisation.
>
>I envisage at least five GCI tasks:
>
>1) Write a prototype in a programming language of your choice (e.g. 
>python)
>2) Come up with a data format for storing the model
>3) Write a program to train a model from a tagged corpus (we have this 
>now in some way for at least English, Spanish, Catalan, Russian and 
>Tatar)
>4) Write a program to run the tagger on a text
>5) Integrate the tagger into the apertium-tagger code (could be done 
>like the SWPOST one).
>
>Another thing, if it gets done, would be to see if it could be trained
>in a similar way to the bigram tagger/lexical selection module using TL
>information.
>
>Downside: This would achieve something similar to adding weighted FST
>support to lttoolbox and getting lttoolbox to output analyses by
>weight, which I think would be more desirable.
>
>Any thoughts?
>
>Fran
>
>------------------------------------------------------------------------------
>_______________________________________________
>Apertium-stuff mailing list
>[email protected]
>https://lists.sourceforge.net/lists/listinfo/apertium-stuff
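
The idea quoted above (pick the most frequent analysis for a surface
form, backing off to the most frequent tag string) can be sketched as a
small Python prototype. This is only an illustration under stated
assumptions, not the apertium-tagger implementation: the function names
(tag_string, train, disambiguate), the in-memory model layout and the toy
corpus are all hypothetical, and analyses are assumed to look like
lemma<tag><tag>.

```python
from collections import Counter, defaultdict


def tag_string(analysis):
    """Return everything from the first '<' on, e.g. 'book<n><pl>' -> '<n><pl>'.

    Assumption: analyses are written as lemma<tag><tag>...
    """
    i = analysis.find("<")
    return analysis[i:] if i != -1 else ""


def train(tagged_corpus):
    """Count analyses per surface form and tag strings overall.

    tagged_corpus: iterable of (surface_form, correct_analysis) pairs
    (a hypothetical input format chosen for this sketch).
    """
    by_surface = defaultdict(Counter)
    by_tags = Counter()
    for surface, analysis in tagged_corpus:
        by_surface[surface][analysis] += 1
        by_tags[tag_string(analysis)] += 1
    return by_surface, by_tags


def disambiguate(surface, candidates, model):
    """Choose one of the analyses left over after constraint grammar."""
    by_surface, by_tags = model
    if len(candidates) == 1:
        return candidates[0]
    seen = by_surface.get(surface)
    if seen:
        # Lexicalised step: most frequent analysis for this surface form.
        best = max(candidates, key=lambda a: seen[a])
        if seen[best] > 0:
            return best
    # Backoff step: most frequent tag string in the training data.
    return max(candidates, key=lambda a: by_tags[tag_string(a)])


# Toy training data, invented for illustration only.
corpus = [
    ("wound", "wound<n><sg>"),
    ("wound", "wound<n><sg>"),
    ("wound", "wind<vblex><pp>"),
    ("books", "book<n><pl>"),
    ("cats", "cat<n><pl>"),
    ("runs", "run<vblex><pri><p3><sg>"),
]
model = train(corpus)

# Seen surface form: the lexicalised counts decide.
print(disambiguate("wound", ["wind<vblex><pp>", "wound<n><sg>"], model))
# Unseen surface form: falls back to tag-string frequency.
print(disambiguate("walks", ["walk<vblex><pri><p3><sg>", "walk<n><pl>"], model))
```

Ties are resolved by candidate order here, which keeps the output
deterministic; a real implementation would need to decide that explicitly.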

-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.