On Tue, 17 Jan 2017 19:48:31 +0300, mansur <[email protected]> wrote:
> Hello, Tommi!
>
> 1) Unfortunately I couldn't get this script to work, because I am not
> so good with Apertium and HFST commands and their syntax :)

OK, there was a bug in the script as well. I have now tested it with the
real thing, and an example is here: <http://paste2.org/7YjOayfI>.

(For the archives, the script I used here is:

    $ for lemma in абзый абруй абсолют абый ; do
          echo "$lemma"
          # Put a space after every character, append the noun tag and
          # "any continuation", then compile the result as a regexp.
          echo "$lemma" | sed -e 's/./\0 /g' | sed -e 's/$/ %<n%> ?*/' |
              hfst-regexp2fst -o temp.hfst
          # Compose the lemma pattern with the transducer.
          hfst-compose temp.hfst .deps/tat.RL.hfst -o gen.hfst
          # Print the strings, keeping the second tab-separated field.
          hfst-fst2strings gen.hfst | cut -f 2
      done

just in case paste2 disappears and someone finds this message through an
internet search or something.)

> 2) Terabytes of word-forms? Wow, that is quite much :)

It is indeed, and it can sometimes still be used as an argument against
the word-form list / database approach to morphology. Btw, the above
experiment generated 9 megabytes of word-forms from 4 noun lexemes. Maybe
they aren't what would generally be wanted as "all word-forms", but it is
likely that apertium-tat won't be much worse for the full lexicon in the
end.

-- 
Doktor Tommi A Pirinen, Computational Linguist,
<https://flammie.github.io/purplemonkeydishwasher/>,
Universität Hamburg, Hamburger Zentrum für Sprachkorpora <http://hzsk.de>.
CLARIN-D Entwickler.
President of ACL SIGUR SIG for Uralic languages <http://gtweb.uit.no/sigur/>.
I tend to follow inline-posting style in desktop e-mail messages.

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
