On Sun, Dec 09, 2007 at 09:21:44PM +0100, Agustin Martin wrote:
> What about
> 
>          #       This generates the wcatalan wordlist.
>          debian/strip_mwl | ispell -d $(CURDIR)/catala.debian -e | \
>                  tr -s ' ' '\n' | sort -u > catala.words.debian
> 
> using sort with the --unique (-u) option.


> You can test with 
> 
> $ sort -u /usr/share/dict/catala > catala.tmp
> 
> sizes: 
> 
> 6519080 catala.tmp
> 7450965 /usr/share/dict/catala
> 
> $ grep -n embalsameu catala.tmp
> 221517:embalsameu
> 
> $ grep -n embalsameu  /usr/share/dict/catala
> 264507:embalsameu
> 264520:embalsameu

Checked with Marc's file. Besides locale-dependent sorting, the only
difference is

$ diff -u catala.marc.resorted catala.orig.re-u-sorted
--- catala.marc.resorted    2007-12-10 12:44:21.000000000 +0100
+++ catala.orig.re-u-sorted  2007-12-10 12:43:29.000000000 +0100
@@ -1,3 +1,4 @@
+179620
 1a
 1r
 2a

so 'sort -u' seems to work well.
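
For anyone wanting to repeat the check, here is a minimal sketch of how
it could be scripted. The helper names count_dups and same_words are
made up for illustration and are not part of the package; LC_ALL=C is
my assumption for taking locale-dependent sorting out of the picture.

```shell
#!/bin/sh
# Hypothetical helper: print how many distinct lines occur more than once.
# LC_ALL=C forces byte-order sorting, so the result is locale-independent.
count_dups() {
    LC_ALL=C sort "$1" | LC_ALL=C uniq -d | wc -l | tr -d ' '
}

# Hypothetical helper: succeed if two word lists contain the same set of
# words once ordering and duplicates are normalised with 'sort -u'.
same_words() {
    a=$(mktemp) && b=$(mktemp) || return 2
    LC_ALL=C sort -u "$1" > "$a"
    LC_ALL=C sort -u "$2" > "$b"
    cmp -s "$a" "$b"
    rc=$?
    rm -f "$a" "$b"
    return $rc
}
```

Something like 'count_dups /usr/share/dict/catala' should then print 0
once the list is built with 'sort -u', and 'same_words old new' tells
whether two builds differ in anything other than order and duplicates.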

-- 
Agustin


