On Mon, 21 May 2007, Matthew Woehlke wrote:
> I thought about that, but /maximum/ efficiency is only achievable
> doing everything in one go. Anyway I think 'countitems' would still be
> a big improvement; I would do that as 'sort --unique-with-count'
> (preferably aliased 'sort -U') since IMO this is a missing feature of
> 'sort -u'.

You don't really want to do the first sort at all - it's just a
convenient way of creating the buckets. The relative order of the
buckets is unimportant, but that's what sort spends a long time
calculating.

A fundamentally more efficient approach would be something like:

  perl -lne '$bucket{$_}++; END { foreach $key (keys %bucket) { print "$bucket{$key} $key" } }' | \
    sort -n

The trailing "sort" could be done inside perl, but it doesn't help the
(algorithmic) efficiency, and we're not playing perl golf...
Cheers,
Phil
_______________________________________________
Bug-coreutils mailing list
http://lists.gnu.org/mailman/listinfo/bug-coreutils