Fran,
[snip]
Hmm, I'm not sure if this is the case... What would happen if you
have, e.g.,
"^wound/wound<n><sg>/wind<vblex><past>/wound<vblex><pres>/wound<vblex><inf>$
with these counts from your corpus (or fractional counts or something):
wound wound<n><sg> 100
wound wind<vblex><past> 20
wound wound<vblex><pres> 50
wound wound<vblex><inf> 3
And your most frequent analysis is wound<n><sg>, but your CG has removed
it and left
"^wound/wind<vblex><past>/wound<vblex><pres>/wound<vblex><inf>$
Would it be good to know that the next most frequent analysis is
wound<vblex><pres>?
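
To make the question concrete, here is a minimal Python sketch of picking
the most frequent analysis among those the CG left in place. The counts are
the ones from the example above; the names COUNTS, parse_lexical_unit and
best_surviving are hypothetical, and the naive split on "/" assumes no
escaped slashes in the lexical unit.

```python
# Counts from the example above, keyed by (surface form, analysis).
# This table layout is an assumption made for illustration.
COUNTS = {
    ("wound", "wound<n><sg>"): 100,
    ("wound", "wind<vblex><past>"): 20,
    ("wound", "wound<vblex><pres>"): 50,
    ("wound", "wound<vblex><inf>"): 3,
}


def parse_lexical_unit(unit):
    """Split '^surface/analysis1/analysis2$' into (surface, [analyses]).

    Naive: assumes the unit contains no escaped '/' characters.
    """
    body = unit.strip().lstrip('"').lstrip("^").rstrip("$")
    surface, *analyses = body.split("/")
    return surface, analyses


def best_surviving(unit, counts=COUNTS):
    """Return the surviving analysis with the highest count, or None."""
    surface, analyses = parse_lexical_unit(unit)
    if not analyses:
        return None
    # Analyses never seen in the corpus count as 0.
    return max(analyses, key=lambda a: counts.get((surface, a), 0))


after_cg = "^wound/wind<vblex><past>/wound<vblex><pres>/wound<vblex><inf>$"
print(best_surviving(after_cg))  # wound<vblex><pres>
```

Keeping a count per analysis, not just the single most frequent one, is
what lets the ranking still work after the CG has removed the top entry.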
Very good point there! OK, this means you would have a very looooooong
list with all surface forms. I would only keep the most frequent
surface forms (perhaps a couple of thousand would do nicely) and, for
less frequent forms, use the "generalized" forms.
Mikel
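
A rough Python sketch of the pruning idea, under one loudly stated
assumption: "generalized" is taken here to mean counts keyed by the tag
sequence alone, with the lemma and surface form dropped. The thread does
not define the term, so treat that reading, and the names tag_signature,
build_tables and score, as hypothetical.

```python
import re
from collections import Counter

TAGS = re.compile(r"<[^>]+>")


def tag_signature(analysis):
    """Keep only the tags: 'wind<vblex><past>' -> '<vblex><past>'.

    Assumption: this is what a "generalized" form looks like.
    """
    return "".join(TAGS.findall(analysis))


def build_tables(counts, top_n):
    """Split (surface, analysis) counts into a small lexicalized table for
    the top_n most frequent surface forms and a generalized backoff table."""
    surface_totals = Counter()
    for (surface, _analysis), n in counts.items():
        surface_totals[surface] += n
    frequent = {s for s, _ in surface_totals.most_common(top_n)}

    lexicalized = {k: n for k, n in counts.items() if k[0] in frequent}

    # Generalized counts are pooled over all surface forms (a design choice).
    generalized = Counter()
    for (_surface, analysis), n in counts.items():
        generalized[tag_signature(analysis)] += n
    return frequent, lexicalized, generalized


def score(surface, analysis, frequent, lexicalized, generalized):
    """Use the per-surface count for frequent forms, else the backoff count."""
    if surface in frequent:
        return lexicalized.get((surface, analysis), 0)
    return generalized[tag_signature(analysis)]
```

With top_n set to a couple of thousand, the lexicalized table stays short
while every other surface form is still ranked by its tag sequence.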
------------------------------------------------------------------------------
_______________________________________________
Apertium-stuff mailing list
https://lists.sourceforge.net/lists/listinfo/apertium-stuff