> Date: Tue, 4 Nov 2014 19:40:28 +0800
> From: Wei En Ng <[email protected]>
> To: [email protected]
> Reply-To: [email protected]
> Subject: [Apertium-stuff] GCI Task Suggestion: Language Detection
>
> As per the ticket at https://sourceforge.net/p/apertium/tickets/2/
>
> "Apertium APY uses an extremely inefficient and naive method of detecting
> language. An improved approach should utilize an existing framework such as
> CLD to narrow language possibilities to a subset of all languages after
> which custom models are used to determine the language, if necessary."
>
> This could be made into multiple Google Code In tasks.
>

I don't know how language detection is done in the Apertium web interface,
but as far as I can tell it does not work. I don't know what you mean by
CLD either, but never mind.

Since we have monodices for a lot of languages, a simple approach would be
to analyse the input text (or the beginning of it) in each language, and
then count the number of surface forms recognised in each one.

A variant of this would test all the available languages together in one
step. To do that, we would start by running lt-expand on each available
monodix and then replace the right-hand part (lemma + attributes) with the
name of that dictionary's language. When a surface form has more than one
analysis in a given language, keep only one entry for it (the UNIX "uniq"
command, applied to sorted input, is made for that). Then merge the new
monodices for all the source languages into a single one.

Then we would just have to analyse the source text (or its first 50 words,
which may be enough) with this special monodix, and count the occurrences
of each language found to choose the best result.

--------------------------------
Bernard Chardonneau (France)

Phone : [33] 1 64 90 87 04 or [33] 9 72 36 32 90
GSM phone : [33] 6 49 95 13 95

Multilingual websites for my free softwares :
http://libremail.free.fr and http://libremail.tuxfamily.org
http://cyloop.tuxfamily.org (mainly translated with Apertium)

My general website (in french only)
http://bech.free.fr
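
The merged-dictionary idea described in the message can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Apertium APY works: it assumes each expanded dictionary line has the shape `surface:lemma<tags>`, it keeps the merged "monodix" as an in-memory map instead of a compiled dictionary, and the names `surface_forms`, `build_merged_index`, `detect_language` and the `SAMPLE` data are all hypothetical.

```python
from collections import Counter

def surface_forms(expanded_lines):
    """Return the distinct surface forms found in lt-expand-style lines.

    Assumption: each line looks like 'surface:lemma<tags>'. Using a set
    plays the role of 'uniq': one entry per surface form per language.
    """
    forms = set()
    for line in expanded_lines:
        surface, sep, _analysis = line.partition(":")
        if sep and surface:
            forms.add(surface.lower())
    return forms

def build_merged_index(expanded_by_lang):
    """Merge all languages into one map: surface form -> set of languages."""
    index = {}
    for lang, lines in expanded_by_lang.items():
        for form in surface_forms(lines):
            index.setdefault(form, set()).add(lang)
    return index

def detect_language(text, index, max_words=50):
    """Count, per language, how many of the first max_words are recognised."""
    counts = Counter()
    for word in text.lower().split()[:max_words]:
        word = word.strip(".,;:!?\"'()")
        for lang in index.get(word, ()):
            counts[lang] += 1
    return counts

# Hypothetical, hand-written sample standing in for real lt-expand output.
SAMPLE = {
    "fra": [
        "le:le<det><def><m><sg>",
        "chat:chat<n><m><sg>",
        "chat:chat<n><m><sg>",  # duplicate, collapsed by the set
        "est:être<vbser><pri><p3><sg>",
    ],
    "eng": [
        "the:the<det><def><sp>",
        "cat:cat<n><sg>",
        "chat:chat<n><sg>",
        "is:be<vbser><pri><p3><sg>",
    ],
}

index = build_merged_index(SAMPLE)
counts = detect_language("Le chat est noir.", index)
best = counts.most_common(1)[0][0] if counts else None
```

A word that exists in several languages (here "chat") counts once for each of them, which matches the idea of keeping one analysis per language in the combined dictionary. With real data, the per-language line lists would come from running lt-expand on each monodix.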
