A couple of languages have been tested for coverage in CLD2. Refer to http://wiki.apertium.org/wiki/Apertium-apy/Language_identification for more details.

On 5 Nov, 2014 7:37 am, "Sushain Cherivirala" <[email protected]> wrote:
> On Tue, Nov 4, 2014 at 4:57 PM, Bernard Chardonneau <[email protected]>
> wrote:
>
>> As we have monodixes for a lot of languages, a simple thing would be
>> to analyse the input text (or the beginning of it) in the different
>> languages, and then to count the number of surface forms recognised in
>> each language.
>>
>
> As it turns out, "Apertium APY uses an extremely inefficient and naive
> method of detecting language." actually refers to the method you describe
> almost exactly.
>
> APY now uses CLD2, the Compact Language Detector
> <https://code.google.com/p/cld2/>, which relies on a quadgram model. I
> believe Wei En is working on building models for more languages, starting
> with testing
> <https://svn.code.sf.net/p/apertium/svn/trunk/apertium-tools/lang-identify/>.
>
> I concur that improving the detection could be broken down into a few GCI
> tasks. I'm not sure exactly how, though.
>
>
>
> ------------------------------------------------------------------------------
>
> _______________________________________________
> Apertium-stuff mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>
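For illustration, the dictionary-counting approach Bernard describes (check which tokens each language's dictionary recognises, then pick the language with the most hits) might be sketched roughly as below. This is a minimal sketch under stated assumptions: the `DICTIONARIES` word sets are tiny hypothetical stand-ins for monodix surface forms, and the function names are made up for this example. A real setup would run each language's Apertium analyser instead of doing a set lookup, and this is not APY's actual code.

```python
import re
from collections import Counter

# Hypothetical toy stand-ins for the surface forms each language's monodix
# recognises; a real setup would query each language's morphological analyser.
DICTIONARIES = {
    "eng": {"the", "cat", "is", "on", "table", "a"},
    "fra": {"le", "chat", "est", "sur", "la", "table", "un"},
    "spa": {"el", "gato", "está", "en", "la", "mesa", "un"},
}


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def count_recognised(text, dictionaries, max_tokens=None):
    """Count, per language, how many tokens are known surface forms.

    If max_tokens is given, only the beginning of the input is analysed.
    """
    tokens = tokenize(text)
    if max_tokens is not None:
        tokens = tokens[:max_tokens]
    return Counter({lang: sum(tok in forms for tok in tokens)
                    for lang, forms in dictionaries.items()})


def guess_language(text, dictionaries=DICTIONARIES, max_tokens=None):
    """Return the language whose dictionary recognises the most tokens,
    or None if no dictionary recognises anything."""
    counts = count_recognised(text, dictionaries, max_tokens)
    lang, best = counts.most_common(1)[0]
    return lang if best > 0 else None


if __name__ == "__main__":
    print(guess_language("Le chat est sur la table"))
    print(guess_language("The cat is on the table"))
```

Every input has to be checked against every language's dictionary, so the work grows with the number of supported languages. That is presumably part of why the method was described as inefficient, compared with a single pass of CLD2's quadgram model.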
