[Ankur-core] [announce] Anubadok 0.1 beta: First official release

Golam Mortuza Hossain Sun, 11 Jun 2006 22:21:47 -0700

Hi all,

As usual, it's available from
http://www.imsc.res.in/~golam/anubadok/


Alternatively, you may try Google("Anubadok") or even
Google("অনুবাদক") :-)

Here goes the release note...

It started a little more than a year ago. Given that
time-frame and my own constraints and limitations, it is
heartening to announce its first official release.


A few key features of the first official release:


  *     For the first time, the number of entries in Anubadok's
        English-to-Bengali dictionary has reached five figures.
        It currently stands at 10,128.


  *     Anubadok now fully supports the free "gposttl" tagger
        alongside the restricted "treetagger". It also
        includes some error-correcting code for known
        tagging errors of gposttl, so one can expect
        slightly higher translation accuracy with "gposttl"
        than with "treetagger".


  *     Anubadok now has an improved proper noun handling
        mechanism. For example, it can now recognise a pattern
        like "Bay of Bengal" as a single proper noun and will
        translate it as "bangopsagor" instead of "banglar
        upsagor". Although it can recognise such a pattern,
        the translation only proceeds this way if there is a
        corresponding entry ("bay.of.bengal") in its E2B
        dictionary. Otherwise, Anubadok will use a fall-back
        mechanism and translate the phrase as usual.
        Nevertheless, it will report the missing entry through
        "new_words.list".

  *     Documentation has been slightly improved, though it
        still needs more work.

  *     English sentence splitter for complex sentences:
        This has been lacking ever since Anubadok was born, but
        I have recently started working on it. Version 0.1
        itself has some code for it. This will be the
        main area of thrust for anubadok-0.2, which is
        available as the CVS version.


  *     Anubadok-0.1 does not have any dedicated program for
        Wikipedia translation, but one can use the "anubadok" or
        "english2bangla" scripts for generic translations.
        However, anubadok-0.2-cvs now includes "wiki_anubadok"
        for translation of wiki text. A script, "wikiget", is
        included in the package for fetching wiki articles in
        text format; you just need to give the title of an
        English article.
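
To illustrate the proper noun handling described in the list
above, here is a minimal Python sketch of the lookup-then-fall-back
idea. It is not Anubadok's actual code: the function names, the
in-memory dictionary and the stand-in for the usual translation
path are all assumptions made for this example. Only the dotted
key format ("bay.of.bengal"), the two romanised translations and
the idea of reporting missing entries come from the release note.

```python
# Hypothetical toy E2B dictionary. Only the "bay.of.bengal" key format
# and its romanised translation are taken from the release note.
E2B = {"bay.of.bengal": "bangopsagor"}


def phrase_key(words):
    """Join the words of a recognised proper-noun phrase into a dotted key."""
    return ".".join(word.lower() for word in words)


def usual_translation(words):
    """Stand-in for the normal translation path (hard-coded for this example)."""
    return "banglar upsagor"


def translate_proper_noun(words, dictionary, new_words, fallback):
    """Prefer a whole-phrase dictionary entry; otherwise fall back.

    Missing keys are appended to new_words, mimicking (as an assumption)
    what gets reported through "new_words.list".
    """
    key = phrase_key(words)
    if key in dictionary:
        return dictionary[key]
    new_words.append(key)       # remember the entry that should be added
    return fallback(words)      # translate the phrase the usual way


missing = []
phrase = ["Bay", "of", "Bengal"]

# With the phrase entry present, the single-noun translation is used.
print(translate_proper_noun(phrase, E2B, missing, usual_translation))

# Without the entry, the fall-back is used and the key is recorded.
print(translate_proper_noun(phrase, {}, missing, usual_translation))
print(missing)
```

The fall-back is passed in as a function so the sketch does not
have to guess how the usual translation is produced.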

-------------------------------------------------------------

Lastly, I thought of drawing up my own balance sheet of how
much I could and couldn't do in the last year, mainly using
the pre-release versions of Anubadok (it's also sort of
beating my own drum :-)). In the last few months, I have
translated more than six thousand PO strings in KDE. I am
sure I would have gone nowhere near that without using
it. So to conclude: though Anubadok still has a long way to
go, its current performance, given the amount of code
in it, is certainly encouraging for the future of Bengali
machine translation.

Cheers,
Golam
--
http://www.imsc.res.in/~golam/
