John W. Krahn writes:
> Bryan Harris wrote:
> >
> > >> Sometimes perl isn't quite the right tool for the job...
> > >>
> > >> % man sort
> > >> % man uniq
> > >
> > > If you code it correctly (unlike the program at the URL above) then a
> > > perl version will be more efficient and faster than using sort and uniq.
> >
> > Please explain...
> >
> > That's the last conclusion I thought anyone would be able to reach.
>
> How about a little demo. The times posted are the fastest from ten runs
> of the same programs.
>
> $ perl -le'print int(rand(10_000)+50_000) for 1 .. 1_000_000' > random.txt
> $ time sort random.txt | uniq > sorted.shell
>
> real 0m38.799s
> user 0m34.880s
> sys 0m2.920s
> $ time sort -u random.txt > sorted.shell
>
> real 0m23.452s
> user 0m22.520s
> sys 0m0.720s
> $ time perl -lne'$h{$_}=()}{print for sort keys%h' random.txt > sorted.perl
>
> real 0m18.450s
> user 0m17.880s
> sys 0m0.450s
> $ diff -s sorted.shell sorted.perl
> Files sorted.shell and sorted.perl are identical
>
>
> The "sort | uniq" version has to run two processes and pass the whole
> file through the pipe from one process to the next. The "sort -u"
> version has to sort the whole file first and then outputs only the
> unique values. The perl version uses a hash to store the unique values
> first and then outputs the sorted values. Depending on the number of
> duplicate values, the perl version will usually be faster as it has to
> sort a smaller list.

But how do they compare when the hash is too big to fit in main
memory? If the hash starts swapping, you lose! I do not know,
however, whether using a database-backed hash would be faster or slower
than the sort -u approach. It would make for an interesting test.
Try sorting a file with over 3E8 unique integers, or maybe just a file
with 256-byte records and enough unique records that they do not fit
in memory.
> John
> --
> use Perl;
> program
> fulfillment