> Date: Tue, 4 Nov 2014 19:40:28 +0800
> From: Wei En Ng <[email protected]>
> To: [email protected]
> Reply-To: [email protected]
> Subject: [Apertium-stuff] GCI Task Suggestion: Language Detection
>
> As per the ticket at https://sourceforge.net/p/apertium/tickets/2/
>
> "Apertium APY uses an extremely inefficient and naive method of detecting
> language. An improved approach should utilize an existing framework such as
> CLD to narrow language possibilities to a subset of all languages after
> which custom models are used to determine the language, if necessary."
>
> This could be made into multiple Google Code In tasks.
>

I don't know how language detection is done for the Apertium web
interface, but as far as I can tell it does not work.

I don't know what you mean by CLD. But never mind.

As we have monodices for a lot of languages, a simple approach would be
to analyse the input text (or the beginning of it) with each of the
different languages, and then to count the number of surface forms
recognised in each language.

A variant of this would make it possible to test all the available
languages together in one step.

To do that, we would start by running lt-expand on each available
monodix and then replace the right-hand part (lemma + attributes) with
the name of the language of that dictionary.
When a surface form has more than one analysis in the current language,
keep only one result for it (the UNIX "uniq" command is made for
that).

Then merge the new monodices for each source language into a single
one.
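
As a rough sketch of the relabel / uniq / merge steps, in Python rather
than with the lt-* tools themselves. It assumes each lt-expand output
line looks like "surface:analysis" and that the surface form is
everything before the first colon. The function names, the language
codes and the toy entries are only illustrative, not part of Apertium.

```python
from collections import defaultdict

def surface_forms(expanded_lines):
    """Yield the surface form of each 'surface:analysis' line (assumed format)."""
    for line in expanded_lines:
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank or unexpected lines
        surface, _, _ = line.partition(":")
        yield surface.lower()

def build_merged_table(expansions):
    """Map each surface form to the set of languages that contain it.

    Using a set drops repeated analyses of the same form within one
    language, which is the role "uniq" plays in the description above.
    """
    merged = defaultdict(set)
    for lang, lines in expansions.items():
        for form in surface_forms(lines):
            merged[form].add(lang)
    return dict(merged)

# Toy stand-in for the lt-expand output of two monodices.
expansions = {
    "fra": [
        "chat:chat<n><m><sg>",
        "le:le<det><def><m><sg>",
        "le:le<prn><pro><p3><m><sg>",
    ],
    "spa": [
        "gato:gato<n><m><sg>",
        "le:le<prn><pro><p3><mf><sg>",
    ],
}
table = build_merged_table(expansions)
```

A form shared by several languages (here "le") simply ends up labelled
with all of them.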

Then we would just have to analyse the source text (or its first 50
words, which may be enough) with this special monodix, and then count
the occurrences of each language found to choose the best result.
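
The counting step could look something like the following sketch. A
plain Python dict stands in for the special monodix, and a simple
regular expression stands in for real tokenisation. detect_language,
the toy table and the language codes are made up for the example.

```python
import re
from collections import Counter

def detect_language(text, table, max_words=50):
    """Return (best_language, votes) using only the first max_words words."""
    words = re.findall(r"\w+", text.lower())[:max_words]
    votes = Counter()
    for word in words:
        # A word known in several languages gives one vote to each.
        for lang in table.get(word, ()):
            votes[lang] += 1
    if not votes:
        return None, votes  # nothing recognised in any language
    return votes.most_common(1)[0][0], votes

# Toy merged table: surface form -> languages that recognise it.
table = {
    "le": {"fra", "spa"},
    "chat": {"fra"},
    "dort": {"fra"},
    "gato": {"spa"},
    "duerme": {"spa"},
}
best, votes = detect_language("Le chat dort", table)
```

With this toy table, "Le chat dort" gets three votes for "fra" and one
for "spa". Ties are not handled specially in this sketch.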



--------------------------------
Bernard Chardonneau (France)
Phone : [33] 1 64 90 87 04 or [33] 9 72 36 32 90
GSM phone : [33] 6 49 95 13 95

Multilingual websites for my free softwares :
http://libremail.free.fr and http://libremail.tuxfamily.org
http://cyloop.tuxfamily.org (mainly translated with Apertium)

My general website (in french only)
http://bech.free.fr

------------------------------------------------------------------------------
_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
