[Bug 167730] English dictionaries: future maintenance

bugzilla-daemon Tue, 05 Aug 2025 20:39:47 -0700

https://bugs.documentfoundation.org/show_bug.cgi?id=167730


--- Comment #11 from Marco A.G.Pinto <[email protected]> ---
(In reply to László Németh from comment #8)
> (In reply to Marco A.G.Pinto from comment #7)
> 
> Hi Marco,
> 
> AM/AF (Alias Morphology/Alias Flag vector) are only for replacing flag
> vectors and morphological description with an index in the dic file to
> compress the dictionary, see man (5) hunspell, and makealias:
> 
> $ makealias -h
> makealias: make alias compressed dic and aff files
> Usage: makealias [--minimize-diff old_file_without_file_extension] file.dic
> file.aff
> 
> > AM 1834
> > AM ts:0 #1
> > AM st:abatis ts:Ns #2
> 
> In the example above, "1" in the dic file means "ts:0", "2" means "st:abatis
> ts:Ns" etc. It's not possible to reorder AM lines without changing the
> indices in the .dic file, if we don't want to lose the information, which
> word has got the stem "abatis" in the .dic file. Fortunately we don't need
> AM/AF at all.
> 
> The working strategies to get back the lost functionality:
> 
> 1) using my original script attached to the OpenOffice.org issue, which
> extends the dictionaries with morphological description: real stems ("st:")
> and the other affixed forms ("am:" ~allomorphs) (and use the result directly
> or its smaller version compressed with makealias).
> 
> or
> 
> 2) add new words to the original .dic file with alias indices. The new words
> cannot contain flags, so you must create an "unmunched" version of the new
> words, listing all of their affixed forms. To create this word list, you
> can use Kevin Hendricks' original "unmunch", or my script "wordforms" (part
> of the Hunspell tools).
> 
> hunspell/src/tools$ ./wordforms 
> Usage: wordforms [-s | -p] dictionary.aff dictionary.dic word
> -s: print only suffixed forms
> -p: print only prefixed forms
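
For readers following the AM explanation quoted above, here is a minimal
sketch of how an alias index maps back to its morphological description, and
why reordering the AM lines changes what an index means. It is only an
illustration, not Hunspell's own code: the function names parse_am_table and
resolve are hypothetical, and it assumes the first AM line holds the count and
that a trailing "#N" is a comment, as in the excerpt.

```python
def parse_am_table(aff_text):
    """Collect AM alias lines in file order; list slot i-1 holds alias number i."""
    aliases = []
    seen_count = False
    for line in aff_text.splitlines():
        parts = line.split()
        if not parts or parts[0] != "AM":
            continue
        if not seen_count:
            # Assumption: the first AM line only states the number of aliases.
            seen_count = True
            continue
        fields = []
        for part in parts[1:]:
            if part.startswith("#"):
                break  # assumption: "#N" is a trailing comment, as in the excerpt
            fields.append(part)
        aliases.append(" ".join(fields))
    return aliases


def resolve(aliases, index):
    """Return the morphological description for a 1-based alias index."""
    return aliases[index - 1]


# Excerpt from the quoted comment (the real table has 1834 entries).
AFF_EXCERPT = """AM 1834
AM ts:0 #1
AM st:abatis ts:Ns #2
"""

aliases = parse_am_table(AFF_EXCERPT)
print(resolve(aliases, 1))
print(resolve(aliases, 2))

# Reordering the AM lines without renumbering the .dic file changes the
# meaning of every index: index 2 no longer carries the stem "abatis".
reordered = list(reversed(aliases))
print(resolve(reordered, 2))
```

With the excerpt above, index 2 resolves to "st:abatis ts:Ns" in the original
order and to "ts:0" after the reorder, which is the information loss described
in the quoted comment.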

Németh or any other developers,

I have applied all of the fixes suggested by GPT that I could, and I have also
installed the two Hunspell-related packages on my Ubuntu 24.04 VM.

I still get errors, even after reducing the .DIC to just two or three entries
for testing.

parsing line: #  Z --> S
parsed in 13 prefixes and 53 suffixes
.awk: line 1: improper use of next
cat: /home/marco-pinto/Desktop/nemeth/pos/part-of-speech.txt: No such file or directory
.cat: /home/marco-pinto/Desktop/nemeth/agid/infl.txt: No such file or directory
.awk: line 1: regular expression compile failed (syntax error ^* or ^+)
^*
.cat: /tmp/z.aff: No such file or directory
awk: line 1: improper use of next
.......
Verifying. Different words (if not 0, check /tmp/diff.log): 0
Alias compression...
52 
0/201,204 
0th/205,203 
1/201,202 
1st/205 
1th/203,300 
2/201,204 
2nd/205 
2th/203,300 
3/201,204 
3rd/205 
3th/203,300 
4/201,204 
4th/205,203 
5/201,204 
5th/205,203 
6/201,204 
6th/205,203 
7/201,204 
7th/205,203 
8/201,204 
8th/205,203 
9/201,204 
9th/205,203 
10s/205,203 
20s/205,203 
30s/205,203 
40s/205,203 
50s/205,203 
60s/205,203 
70s/205,203 
80s/205,203 
90s/205,203 
100s/205,203 
200s/205,203 
300s/205,203 
400s/205,203 
500s/205,203 
600s/205,203 
700s/205,203 
800s/205,203 
900s/205,203 
1000s/205,203 
2000s/205,203 
'10s 
'20s 
'30s 
'40s 
'50s 
'60s 
'70s 
'80s 
'90s 
.marco-pinto@marco-pinto-VirtualBox:~/Desktop/nemeth$

-- 
You are receiving this mail because:
You are the assignee for the bug.
