On 2014-10-26 19:00, Mikel Forcada wrote:
> Fran, folks, here's the feedback I promised.
> 
> As I said, this is a great idea, particularly to round off the work done
> by a constraint grammar, and I think the breakdown into GCI tasks could
> work, perhaps except for the integration into the current tagger.

Cool, perhaps we could split the integration into two or three tasks.

It would basically just be another C++ class that is called via the
apertium-tagger wrapper.

> From a training corpus we could collect counts at several levels, to be
> used as a fallback:
> 
> (1) Complete lexical forms: cantar.vblex.ifi.1.pl
> (2) Lemma-less counts: *.vblex.ifi.1.pl
> (3) Category only: *.vblex.*
> 
> The last two levels can be determined without any need for a
> configuration file.
> 
> That way, for an unknown word we can fall back on more general counts.
> These general counts could be obtained from untagged corpora using naïve
> fractional counting, as was done in SWPOST when no context was taken
> into account.
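
A minimal C++ sketch of the three-level fallback, assuming analyses in Apertium stream notation (e.g. `cantar<vblex><ifi><p1><pl>`) rather than the dotted notation above, and assuming one count table per level. The key format (`*<vblex><ifi><p1><pl>`, `*<vblex>*`) and the names `CountTable`, `backoffKeys` and `chooseAnalysis` are hypothetical, not part of apertium-tagger. It picks the most specific level at which any candidate has been seen and takes the highest count there, so counts from different levels are never compared directly:

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One count table per fallback level (hypothetical layout): key -> count.
using CountTable = std::map<std::string, double>;

// Build the fallback keys for one analysis in stream notation,
// e.g. "cantar<vblex><ifi><p1><pl>":
//   level 1: the complete lexical form
//   level 2: lemma replaced by "*", all tags kept
//   level 3: only the first tag (the category), e.g. "*<vblex>*"
// Levels 2 and 3 are derived from the string alone, with no configuration.
std::vector<std::string> backoffKeys(const std::string& analysis) {
  std::size_t firstTag = analysis.find('<');
  if (firstTag == std::string::npos) {
    return {analysis};  // no tags: only the complete form is available
  }
  std::string tags = analysis.substr(firstTag);
  std::string category = tags.substr(0, tags.find('>') + 1);
  return {analysis, "*" + tags, "*" + category + "*"};
}

// Use the most specific level at which at least one candidate has a count,
// and return the candidate with the highest count at that level.
std::string chooseAnalysis(const std::vector<std::string>& candidates,
                           const std::vector<CountTable>& levels) {
  for (std::size_t level = 0; level < levels.size(); ++level) {
    const std::string* best = nullptr;
    double bestCount = 0.0;
    for (const std::string& c : candidates) {
      std::vector<std::string> keys = backoffKeys(c);
      if (level >= keys.size()) continue;
      auto it = levels[level].find(keys[level]);
      if (it != levels[level].end() && it->second > bestCount) {
        best = &c;
        bestCount = it->second;
      }
    }
    if (best != nullptr) return *best;  // stop at the first level with data
  }
  // Nothing seen at any level: keep the first analysis.
  return candidates.empty() ? std::string() : candidates.front();
}
```

For an unknown word such as `blorp<n><sg>` vs `blorp<vblex><pres>`, level 1 has no entries, so the choice is made on the lemma-less counts of level 2, and only if those are missing too on the category counts of level 3.
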
> 
> Note that for level (1) one does not really need to store counts. One
> can simply store the winning lexical form for each surface form.

Hmm, I'm not sure that's the case... e.g. what would happen if you have:

^wound/wound<n><sg>/wind<vblex><past>/wound<vblex><pres>/wound<vblex><inf>$

and from your corpus (or from fractional counts, or something similar) you get:

wound    wound<n><sg>       100
wound    wind<vblex><past>   20
wound    wound<vblex><pres>  50
wound    wound<vblex><inf>    3

And your most frequent analysis is wound<n><sg>, but your CG has removed it,
leaving:

^wound/wind<vblex><past>/wound<vblex><pres>/wound<vblex><inf>$

Wouldn't it be useful to know that the next most frequent analysis is
wound<vblex><pres>?
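
To make the wound case concrete, a minimal C++ sketch, assuming counts are kept per surface form as in the table above. `FormCounts`, `bestSurviving` and `winnerIfSurviving` are hypothetical names. With full counts the tagger can still pick wound<vblex><pres> after the CG has removed wound<n><sg>; with only the stored winner it has nothing to fall back on:

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical table: surface form -> (analysis -> corpus count).
using FormCounts = std::map<std::string, std::map<std::string, int>>;

// Most frequent analysis among those the CG left; empty if none has a count.
std::string bestSurviving(const FormCounts& table, const std::string& surface,
                          const std::vector<std::string>& survivors) {
  auto form = table.find(surface);
  if (form == table.end()) return "";
  std::string best;
  int bestCount = -1;
  for (const std::string& a : survivors) {
    auto it = form->second.find(a);
    if (it != form->second.end() && it->second > bestCount) {
      best = a;
      bestCount = it->second;
    }
  }
  return best;
}

// Alternative: store only the overall winner per surface form.
// Once the CG has removed that winner, there is no fallback.
std::string winnerIfSurviving(const std::map<std::string, std::string>& winners,
                              const std::string& surface,
                              const std::vector<std::string>& survivors) {
  auto it = winners.find(surface);
  if (it == winners.end()) return "";
  for (const std::string& a : survivors) {
    if (a == it->second) return a;
  }
  return "";  // winner was removed: no information left
}
```
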

> Note also that for levels (2) and (3) one does not really need to
> store counts. A list ordered by decreasing frequency could be enough:
> the first form found would win.

Yes, for levels 2/3 it could definitely work.
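
For levels (2) and (3), a minimal sketch of the ordered-list idea, assuming the list holds lemma-less tag sequences sorted by decreasing corpus frequency. `tagsOf` and `firstRanked` are hypothetical names. No counts are stored: the first entry in the list that matches one of the remaining analyses wins.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Strip the lemma, keeping only the tag sequence:
// "wound<vblex><pres>" -> "<vblex><pres>".
std::string tagsOf(const std::string& analysis) {
  std::size_t firstTag = analysis.find('<');
  return firstTag == std::string::npos ? std::string()
                                       : analysis.substr(firstTag);
}

// `ranked` is sorted by decreasing frequency (counts themselves discarded).
// Walk it in order and return the first candidate whose tags match.
std::string firstRanked(const std::vector<std::string>& ranked,
                        const std::vector<std::string>& candidates) {
  for (const std::string& tags : ranked) {
    for (const std::string& c : candidates) {
      if (tagsOf(c) == tags) return c;
    }
  }
  // No candidate appears in the list: keep the first analysis.
  return candidates.empty() ? std::string() : candidates.front();
}
```

Scanning the list costs O(list length × candidates); a map from tag sequence to rank would give the same answer more cheaply.
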

F.

------------------------------------------------------------------------------
_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
