On Sat, Mar 25, 2006 at 05:17:38PM +0100, Jordi Mallach wrote:
> Hi,
>
> I've spent some hours thinking about how to solve:
> #345242: aspell-ca: reports hyphenated and apostrophed words as
> mispellings.
>
> It's not trivial. Many versions ago, I asked Agustín about ideas to solve
> the big size of my generated dictionaries. He suggested that I could
> remove a few rules from my .aff file, and that indeed did generate a
> reasonably-sized dictionary. Unfortunately, the stuff that was removed
> from the resulting dictionary is quite annoying.
>
> I tried adding some of the rules again, but the dictionary still grows
> quite a bit. I've been discussing with my upstream dictionary
> maintainer, and he suggests I remove some rules from the aff file and
> then hack around the generated wordlist to make things work, although
> they suck a bit.
>
> The "100% correct" aspell dictionary is nearly 200 megabytes, as it
> includes a lot of variations for hyphenated and apostrophed words, which
> is mainly what was getting removed in the past.

Hi, Jordi

I think you need to use affix compression, but first upstream (or you)
need to fix the myspell affix file so that aspell accepts it. Some things
in it are not accepted by aspell; see my experiments with affix
compression in aspell-ca:

http://bugs.debian.org/311391

Since I filed that bug report against the source package ispellcat instead
of aspell-ca, it was probably missed; I never received any reply. I quote
the good news from it:

-------------------------------------------------------------------------
EPILOG) And the good news ;-)
=============================

And now, after the lo...oong report, the good news, building the catalan
dict unstripped but with affix compression produces a 3.7Mb hash file,
instead of the >100Mb file that was previously needed for the unstripped
version.
-------------------------------------------------------------------------

If upstream is too busy to deal with this now, I suggest you use the
aspell affix file from aspell6-ca (on the aspell site), which has been
fine-tuned by Kevin Atkinson to work with affix compression. I hope it
will also work well for myspell, but if not, keep two myspell-type affix
files, one for myspell and the other for aspell, as a temporary fix.

I am attaching a patch with what I think should work for using affix
compression once the myspell affix file is fixed (a couple of other
problems are also fixed in the patch). Note that it will not work for
aspell with the current affix file.

Salut,

-- 
Agustin

diff -u ispellcat-0.4/debian/ca.dat ispellcat-0.4/debian/ca.dat
--- ispellcat-0.4/debian/ca.dat
+++ ispellcat-0.4/debian/ca.dat
@@ -4,2 +4,5 @@
-special ' -*- · -*- - -*-
+special ' -*- · -* - -*- . --*
 soundslike generic
+affix ca
+affix-compress true
+repl-table ca_affix.dat
diff -u ispellcat-0.4/debian/changelog ispellcat-0.4/debian/changelog
--- ispellcat-0.4/debian/changelog
+++ ispellcat-0.4/debian/changelog
@@ -1,3 +1,11 @@
+ispellcat (0.4-6.1) unstable; urgency=low
+
+  * debian/ca.dat, debian/rules: Use affix compression
+  * debian/ca.dat: Allow . at end of words
+  * debian/rules: Make sure no cruft is left on purge
+
+ -- Agustin Martin Domingo <[EMAIL PROTECTED]>  Sun, 26 Mar 2006 23:53:04 +0200
+
 ispellcat (0.4-6) unstable; urgency=low
 
   * debian/control:
diff -u ispellcat-0.4/debian/rules ispellcat-0.4/debian/rules
--- ispellcat-0.4/debian/rules
+++ ispellcat-0.4/debian/rules
@@ -32,9 +32,7 @@
 #	cat catala.words.debian | \
 #	  aspell --local-data-dir=$(CURDIR) --lang=ca \
 #	  create master ./ca.rws
-	cp catala.words.debian ca.wl
-	prezip ca.wl
-	gzip ca.cwl
+	cat catalan-m.dic | fromdos | sed '1d' | prezip | gzip -c > ca.cwl.gz
 
 	echo "add ca.rws" > ca.multi
 	echo "add ca.multi" > catalan.alias
@@ -71,14 +69,14 @@
 	# aspell-ca stuff
 	install -m 644 ca.cwl.gz $(ADICT_DIR)/usr/share/aspell
 	install -m 644 debian/ca.dat $(ADICT_DIR)/usr/lib/aspell/ca.dat
-	touch $(ADICT_DIR)/usr/lib/aspell/ca.rws
-	touch $(ADICT_DIR)/usr/lib/aspell/ca.compat
+	install -m 644 catalan-m.aff $(ADICT_DIR)/usr/lib/aspell/ca_affix.dat
 	install -m 644 ca.multi $(ADICT_DIR)/usr/lib/aspell/ca.multi
 	install -m 644 catalan.alias $(ADICT_DIR)/usr/lib/aspell/catalan.alias
 	install -m 644 catala.alias $(ADICT_DIR)/usr/lib/aspell/catala.alias
 	install -m 644 català.alias $(ADICT_DIR)/usr/lib/aspell/català.alias
+	touch $(ADICT_DIR)/var/lib/aspell/ca.rws
 	touch $(ADICT_DIR)/var/lib/aspell/ca.compat
-
+
 #	install -m 644 ca_phonet.dat $(ADICT_DIR)/usr/lib/aspell/ca_phonet.dat
 
 	# myspell-ca stuff
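
The debian/rules line that builds ca.cwl.gz can be tried by hand on a toy
input. This is only a rough sketch and makes several assumptions:
sample.dic and its three entries (with their flags) are made up, tr -d '\r'
stands in for fromdos, and the prezip step is left out so that nothing
beyond standard tools is needed. It shows what sed '1d' is for: a
myspell .dic file starts with an entry count that must not end up in the
word list.

```shell
# Hypothetical three-entry myspell-style .dic: the first line is the
# entry count and lines end in CRLF. Words and flags are invented.
tmpdir=$(mktemp -d)
printf '3\r\ncasa/A\r\ngat/B\r\nllibre\r\n' > "$tmpdir/sample.dic"

# Same shape as the patched rule: strip CRs, drop the count line, compress.
# tr -d '\r' replaces fromdos here; prezip is omitted from this sketch.
tr -d '\r' < "$tmpdir/sample.dic" | sed '1d' | gzip -c > "$tmpdir/sample.wl.gz"

# Inspect the result: one word/flags entry per line, no count, no CRs.
result=$(gzip -dc "$tmpdir/sample.wl.gz")
printf '%s\n' "$result"
```

The word/flags entries are kept as they are; with affix-compress enabled,
the flags are what lets aspell store stems plus rules instead of every
expanded form.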