Fran,

[snip]

Hmm, I'm not sure if this is the case... what would happen if you have, e.g.

^wound/wound<n><sg>/wind<vblex><past>/wound<vblex><pres>/wound<vblex><inf>$

and, from your corpus (or fractional counts or something), these counts:

wound    wound<n><sg>       100
wound    wind<vblex><past>   20
wound    wound<vblex><pres>  50
wound    wound<vblex><inf>    3

And your most frequent analysis is wound<n><sg>, but your CG has removed it and left

^wound/wind<vblex><past>/wound<vblex><pres>/wound<vblex><inf>$

Would it be good to know that the next most frequent analysis is wound<vblex><pres>?
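
To make the question concrete, here is a minimal sketch in Python of that lookup, assuming the counts are stored per (surface form, analysis) pair. The names COUNTS and best_remaining are made up for illustration and are not part of any Apertium tool.

```python
# Hypothetical count table, using the figures quoted above.
COUNTS = {
    "wound": {
        "wound<n><sg>": 100,
        "wind<vblex><past>": 20,
        "wound<vblex><pres>": 50,
        "wound<vblex><inf>": 3,
    }
}


def best_remaining(lexical_unit, counts=COUNTS):
    """Return the most frequent analysis still present in a ^surface/a1/a2$ unit.

    Assumes the unit has at least one analysis left after CG.
    """
    body = lexical_unit.strip().lstrip("^").rstrip("$")
    surface, *analyses = body.split("/")
    table = counts.get(surface, {})
    # Analyses never seen in the corpus count as 0.
    return max(analyses, key=lambda a: table.get(a, 0))


after_cg = "^wound/wind<vblex><past>/wound<vblex><pres>/wound<vblex><inf>$"
print(best_remaining(after_cg))  # wound<vblex><pres>
```

With only the single most frequent analysis stored per surface form, this fallback would not be possible once CG has removed that analysis.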

Very good point there! OK, this means you would have a very looooooong list covering all surface forms. I would keep per-form counts only for the most frequent surface forms (perhaps a couple of thousand would do nicely) and, for less frequent forms, use the "generalized" forms.
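
A rough sketch in Python of how that pruning could look. I am assuming here that "generalized" means counts keyed on the tags alone, with the lemma dropped; that is only one possible reading. All function names are hypothetical, and the counts for "winds" are made up purely to have a rare form in the example.

```python
import re
from collections import Counter


def tags_only(analysis):
    """Drop the lemma: 'wound<vblex><pres>' -> '<vblex><pres>' (assumed generalization)."""
    return "".join(re.findall(r"<[^>]+>", analysis))


def build_tables(observations, top_n):
    """observations: iterable of (surface, analysis, count) triples.

    Returns (per_surface, generalized): per-form counts for the top_n most
    frequent surface forms only, plus tag-only counts over all forms.
    """
    surface_totals = Counter()
    generalized = Counter()
    per_surface = {}
    for surface, analysis, count in observations:
        surface_totals[surface] += count
        generalized[tags_only(analysis)] += count
        per_surface.setdefault(surface, Counter())[analysis] += count
    keep = {s for s, _ in surface_totals.most_common(top_n)}
    return {s: c for s, c in per_surface.items() if s in keep}, generalized


def score(surface, analysis, per_surface, generalized):
    """Use the per-form count if the surface form was kept, else the tag-only count."""
    if surface in per_surface:
        return per_surface[surface][analysis]
    return generalized[tags_only(analysis)]


observations = [
    ("wound", "wound<n><sg>", 100),
    ("wound", "wind<vblex><past>", 20),
    ("wound", "wound<vblex><pres>", 50),
    ("wound", "wound<vblex><inf>", 3),
    ("winds", "wind<n><pl>", 2),                 # made-up rare form
    ("winds", "wind<vblex><pres><p3><sg>", 1),   # made-up rare form
]
per_surface, generalized = build_tables(observations, top_n=1)
print(score("wound", "wound<vblex><pres>", per_surface, generalized))  # 50
print(score("winds", "wind<n><pl>", per_surface, generalized))         # 2
```

The per-form table then stays bounded by top_n, while the generalized table is bounded by the number of distinct tag sequences.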
Mikel
------------------------------------------------------------------------------
_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff