On Tue, Nov 4, 2014 at 4:57 PM, Bernard Chardonneau <[email protected]>
wrote:

> As we have monodixes for a lot of languages, a simple thing would be
> to analyse the input text (or the beginning of it) in the different
> languages, and then to count the number of surface forms recognised in
> each language.
>

As it turns out, "Apertium APY uses an extremely inefficient and naive
method of detecting language" refers almost exactly to the method you
describe.
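For reference, that naive method amounts to something like the sketch below:
score each language by the fraction of input tokens its monolingual
dictionary recognises, then pick the best-scoring language. The toy word sets
here are hypothetical stand-ins for real Apertium monodixes, and the function
name is my own; this is just an illustration, not APY's actual code.

```python
# Naive detection sketch: score each language by the fraction of tokens
# its monolingual dictionary recognises. The word sets below are toy
# stand-ins for real Apertium monodixes.
def detect_language(text, lexicons):
    tokens = text.lower().split()
    if not tokens:
        return None
    scores = {
        lang: sum(1 for t in tokens if t in lexicon) / len(tokens)
        for lang, lexicon in lexicons.items()
    }
    return max(scores, key=scores.get)

lexicons = {
    "eng": {"the", "cat", "sat", "on", "mat"},
    "spa": {"el", "gato", "se", "en", "la", "alfombra"},
}
print(detect_language("the cat sat on the mat", lexicons))  # eng
```

The inefficiency is clear from the structure: every candidate language has to
analyse the whole input (or a prefix of it) before any comparison can happen.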

APY now uses CLD2, the Compact Language Detector
<https://code.google.com/p/cld2/>, which relies on a quadgram model. I
believe Wei En is working on building models for more languages, starting
with testing
<https://svn.code.sf.net/p/apertium/svn/trunk/apertium-tools/lang-identify/>.
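In case the quadgram approach is unfamiliar: the core idea is to score the
input's character 4-grams against per-language frequency tables rather than
running any morphological analysis. The sketch below is my own minimal
illustration of that idea (tiny training samples, simple floor smoothing),
not CLD2's actual model or API.

```python
from collections import Counter
import math

def quadgrams(text):
    # Character 4-grams over the raw string; the unit the model is built on.
    return [text[i:i + 4] for i in range(len(text) - 3)]

def train(samples):
    # Build a quadgram log-probability table per language.
    models = {}
    for lang, text in samples.items():
        counts = Counter(quadgrams(text))
        total = sum(counts.values())
        models[lang] = {g: math.log(c / total) for g, c in counts.items()}
    return models

def classify(text, models, floor=math.log(1e-6)):
    # Sum log-probabilities of the input's quadgrams under each model;
    # quadgrams unseen in training get a small floor probability.
    def score(model):
        return sum(model.get(g, floor) for g in quadgrams(text))
    return max(models, key=lambda lang: score(models[lang]))

samples = {
    "eng": "the quick brown fox jumps over the lazy dog and the rain",
    "spa": "el rapido zorro marron salta sobre el perro perezoso y la lluvia",
}
models = train(samples)
print(classify("the fox and the dog", models))  # eng
```

Since the tables are precomputed, classifying a text is a single pass over
its characters, which is what makes this so much cheaper than analysing the
input once per language.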

I concur that improving the detection could be broken down into a few GCI
tasks; I'm just not sure exactly how.
_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
