A couple of languages have been tested for coverage in CLD2. Refer to
http://wiki.apertium.org/wiki/Apertium-apy/Language_identification for more
details.
On 5 Nov, 2014 7:37 am, "Sushain Cherivirala" <[email protected]> wrote:

> On Tue, Nov 4, 2014 at 4:57 PM, Bernard Chardonneau <[email protected]>
> wrote:
>
>> As we have monodices for a lot of languages, a simple approach would be
>> to analyse the input text (or the beginning of it) in each of the
>> languages, and then to count the number of surface forms recognised in
>> each language.
>>
>
> As it turns out, "Apertium APY uses an extremely inefficient and naive
> method of detecting language." actually refers to the method you describe
> almost exactly.
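For reference, the dictionary-coverage idea quoted above can be sketched roughly as follows. This is a minimal sketch, not APY's former code. It assumes each monodix has already been reduced to a plain set of known surface forms. The toy LEXICONS and the helper names are hypothetical stand-ins; a real setup would run each language's compiled analyser over the text instead.

```python
import re
from typing import Dict, List, Set, Tuple

def tokenize(text: str) -> List[str]:
    # Crude word tokeniser: lowercase runs of word characters.
    # A real pipeline would use the analyser's own tokenisation.
    return re.findall(r"\w+", text.lower())

def coverage_scores(text: str, lexicons: Dict[str, Set[str]],
                    max_tokens: int = 200) -> Dict[str, float]:
    # Only look at the beginning of the input, as suggested above.
    tokens = tokenize(text)[:max_tokens]
    if not tokens:
        return {lang: 0.0 for lang in lexicons}
    # Fraction of tokens recognised as known surface forms, per language.
    return {lang: sum(tok in forms for tok in tokens) / len(tokens)
            for lang, forms in lexicons.items()}

def identify_by_coverage(text: str,
                         lexicons: Dict[str, Set[str]]) -> Tuple[str, float]:
    # Pick the language whose lexicon recognises the largest share of tokens.
    scores = coverage_scores(text, lexicons)
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical toy lexicons standing in for real monodix surface forms.
LEXICONS = {
    "eng": {"the", "cat", "is", "on", "mat"},
    "spa": {"el", "gato", "está", "en", "la", "alfombra"},
    "fra": {"le", "chat", "est", "sur", "tapis"},
}

print(identify_by_coverage("The cat is on the mat", LEXICONS))  # ('eng', 1.0)
```

Every candidate language needs its own lookup pass over the text (a full morphological analysis, in the real case), so the cost grows with the number of installed languages, which is presumably part of why the approach was called inefficient.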
>
> APY now uses CLD2, the Compact Language Detector
> <https://code.google.com/p/cld2/> that relies on a quadgram model. I
> believe Wei En is working on building models for more languages starting
> with testing
> <https://svn.code.sf.net/p/apertium/svn/trunk/apertium-tools/lang-identify/>.
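As a rough illustration of what "relies on a quadgram model" means: each language gets a profile of character 4-gram counts, and the input is assigned to whichever profile gives it the highest smoothed log-probability. This is a simplified stand-in, not CLD2's actual implementation (CLD2's real scoring tables are precomputed from large corpora and are considerably more elaborate). The class name and the one-sentence training samples below are hypothetical.

```python
import math
from collections import Counter
from typing import Dict

def quadgrams(text: str) -> Counter:
    # Normalise whitespace, lowercase, and pad with spaces so that
    # word starts and ends produce their own quadgrams.
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + 4] for i in range(len(padded) - 3))

class QuadgramModel:
    """Toy per-language character-quadgram model with add-one smoothing."""

    def __init__(self, training: Dict[str, str]):
        self.counts = {lang: quadgrams(s) for lang, s in training.items()}
        self.totals = {lang: sum(c.values()) for lang, c in self.counts.items()}
        vocab = set()
        for c in self.counts.values():
            vocab.update(c)
        # +1 reserves probability mass for quadgrams unseen in training.
        self.vocab_size = len(vocab) + 1

    def log_prob(self, text: str, lang: str) -> float:
        # Sum of smoothed log-probabilities of the input's quadgrams.
        counts, total = self.counts[lang], self.totals[lang]
        denom = total + self.vocab_size
        return sum(n * math.log((counts[g] + 1) / denom)
                   for g, n in quadgrams(text).items())

    def identify(self, text: str) -> str:
        return max(self.counts, key=lambda lang: self.log_prob(text, lang))

# Hypothetical one-sentence "training corpora"; real models need far more text.
model = QuadgramModel({
    "eng": "the cat sat on the mat and the dog sat on the log",
    "spa": "el gato se sienta en la alfombra y el perro en el tronco",
})
print(model.identify("the dog sat"))  # eng
print(model.identify("el perro"))     # spa
```

Unlike the dictionary approach, scoring needs only one pass over the input plus a table lookup per quadgram per language, and no analyser has to be loaded at all.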
>
> I concur that improving the detection could be broken down into a few GCI
> tasks. I'm not sure exactly how, though.
>
>
>
> ------------------------------------------------------------------------------
>
> _______________________________________________
> Apertium-stuff mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>
>
