On Tue, Nov 4, 2014 at 4:57 PM, Bernard Chardonneau <[email protected]> wrote:
> As we have monodices for a lot of languages, a simple thing would be
> to analyse the input text (or the beginning of it) in the different
> languages, and then to count the number of surface forms recognised in
> each language.

As it turns out, "Apertium APY uses an extremely inefficient and naive method of detecting language." refers almost exactly to the method you describe. APY now uses CLD2, the Compact Language Detector <https://code.google.com/p/cld2/>, which relies on a quadgram model. I believe Wei En is working on building models for more languages, starting with testing <https://svn.code.sf.net/p/apertium/svn/trunk/apertium-tools/lang-identify/>. I concur that improving the detection could be broken down into a few GCI tasks, though I'm not sure exactly how.
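For concreteness, the counting approach Bernard describes could be sketched roughly as below. This is only an illustration, not APY's actual code: the toy word sets stand in for Apertium monolingual dictionaries, and `detect_language` is a hypothetical name.

```python
def detect_language(text, lexicons):
    """Return the language whose lexicon recognises the most surface forms.

    lexicons: mapping of language code -> set of known surface forms,
    standing in here for Apertium monolingual dictionaries.
    """
    tokens = text.lower().split()
    # Count how many tokens each language's lexicon recognises.
    scores = {
        lang: sum(1 for t in tokens if t in lexicon)
        for lang, lexicon in lexicons.items()
    }
    return max(scores, key=scores.get)

# Toy lexicons for illustration only.
lexicons = {
    "eng": {"the", "cat", "sat", "on", "mat"},
    "spa": {"el", "gato", "en", "la", "alfombra"},
}
print(detect_language("the cat sat on the mat", lexicons))  # eng
```

The inefficiency is clear from the sketch: every candidate language's analyser must be run over the input, so the cost grows with the number of installed language pairs, whereas an n-gram model like CLD2's scores all languages in a single pass.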
------------------------------------------------------------------------------
_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
