Bahodir Mansurov <[email protected]>
wrote:

> Hello,
>
> I've been trying to develop HFST and TWOL files for the Uzbek language
> by looking at how other similar languages (Tatar, Kazakh, etc.) have
> done it. Those language rules are very complex, at least for someone
> who doesn't know where to start reading. I usually look for a word and
> then go backwards deciphering the rule chain to make sense of it. The
> chain gets so long that I start forgetting the start of the rules. So
> copying and pasting existing solutions and modifying them didn't appeal
> to me. That's why I started adding simple rules first and then
> expanding them for each use case. You can see my progress at [1] and
> [2]. (My previous work using the DIX format got so out of hand that I
> gave up developing it.)
>
> As I keep adding or changing more and more rules to fit new use cases, I
> realize that I may be breaking old use cases. That's why I'd like to
> create test cases first and then change the rules and not be worried
> that I broke any previous work. Are there any such tools that you use?

My favourite method is running a corpus through and diffing:

<corpus.txt apertium -f html-noent fie-bar-dgen > output.1
edit *fie-bar.dix # hack hack hack
make -j
<corpus.txt apertium -f html-noent fie-bar-dgen > output.2
diff -u output.1 output.2 | dwdiff -c --diff-input

This gives a "big picture" view of what actually improves/degrades for
that language pair, and avoids the noise of changes that only affect
rare words/analyses.
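
If you repeat this often, the two steps can be wrapped in small shell
functions. This is only a sketch: the names TRANSLATE, snapshot and
changed_lines are made up for illustration, and TRANSLATE defaults to
plain `cat` so that the functions work even without a compiled pair.

```shell
#!/bin/sh
# Sketch of the snapshot-and-diff workflow; all names are illustrative.
# TRANSLATE is the pipeline under test, for example:
#   TRANSLATE="apertium -f html-noent fie-bar-dgen"
: "${TRANSLATE:=cat}"

# snapshot CORPUS OUT: run the corpus through the pipeline, save the output.
snapshot() {
    # $TRANSLATE is left unquoted on purpose so it splits into command + args.
    $TRANSLATE < "$1" > "$2"
}

# changed_lines OLD NEW: print how many lines of NEW differ from OLD
# (lines that plain diff marks with a leading ">").
changed_lines() {
    diff "$1" "$2" | grep -c '^>'
}
```

Typical use would be `snapshot corpus.txt output.1`, then edit and
rebuild, then `snapshot corpus.txt output.2` and
`changed_lines output.1 output.2` to see how much moved before reading
the full dwdiff output.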


You can use the same method for monolingual data, preferably on the
disambiguated output since those are the only analyses that end up
mattering anyway.
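
For the monolingual case, something along these lines might do. The
mode name "uzb-tagger" and the LANGDIR variable are assumptions on my
part (check modes.xml in your language directory for the real mode
names), and tag_corpus / analysis_changes are made-up helper names.

```shell
#!/bin/sh
# Sketch only: MODE and LANGDIR are assumed values, adjust to your setup.
: "${LANGDIR:=.}"           # directory holding the compiled language data
: "${MODE:=uzb-tagger}"     # hypothetical mode that outputs disambiguated analyses

# tag_corpus CORPUS OUT: save the disambiguated analyses of CORPUS in OUT.
tag_corpus() {
    apertium -d "$LANGDIR" "$MODE" < "$1" > "$2"
}

# analysis_changes OLD NEW: print only the removed (-) and added (+) lines
# of a unified diff, skipping the ---/+++ file headers and the @@ hunk markers.
analysis_changes() {
    diff -u "$1" "$2" | grep '^[-+][^-+]'
}
```

Run `tag_corpus corpus.txt tagged.1` before a change and
`tag_corpus corpus.txt tagged.2` after rebuilding;
`analysis_changes tagged.1 tagged.2` then lists just the lines whose
analysis moved.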

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
