David Storrs <[EMAIL PROTECTED]> writes:
<snip>
> Here are two versions that do work:
>
> #!/usr/bin/perl
>
> use Benchmark;
>
> my @array = qw( bar baz foo);
>
> timethese(1_000_000,
> {
> 'short_circuit' => \&short_circuit,
> 'keys_compare' => \&keys_compare,
> }
> );
>
> sub short_circuit {
> # return 1 if dupe, 0 if unique
>
> my %seen;
> my $is_dupe = 0;
> for(@array) {
> $seen{$_}++;
> if ($seen{$_} > 1) {
> $is_dupe++;
> last;
> }
> }
> return $is_dupe;
> }
>
> sub keys_compare {
> # return 1 if dupe, 0 if unique
>
> my %seen;
> $seen{$_} = 1 for(@array) ;
>
> if (scalar keys %seen != scalar @array) {
> return 1;
> } else {
> return 0;
> }
> }
> __END__
>
> Out of curiousity, I benchmarked them a couple times. Here are the
> results (note that I'm only testing with 4 elements; the timings could
> change greatly with large datasets):
>
> The executive summary is that short_circuit has a significantly better
> best case, a worse worst case, and considerably more variability than
> keys_compare. So, you're probably better off using keys_compare.
keys_compare might be slightly better for small sets, but for large
sets, short_circuit is the way to go.
I initialized my @array like so:
my @array;
for (1 .. 5000) {
push @array, int((rand) * 10000);
}
And I only ran 10,000 iterations, since I wanted it to finish before I
got bored with the whole thing.
Results:
Benchmark: timing 10000 iterations of grep_compare, keys_compare, short_circuit...
grep_compare: 140 wallclock secs (136.80 usr + 0.37 sys = 137.17 CPU) @ 72.90/s
(n=10000)
keys_compare: 127 wallclock secs (126.35 usr + 0.21 sys = 126.56 CPU) @ 79.01/s
(n=10000)
short_circuit: 5 wallclock secs ( 4.67 usr + 0.01 sys = 4.68 CPU) @ 2136.75/s
(n=10000)
As you can see, grep_compare performs somewhat worse than
keys_compare, and short_circuit cleans up.
Of course, short_circuit won't perform quite as well as the other two
when dealing with arrays that are usually unique, but as far as I can
tell, short_circuit's worst case should be about the same as the
normal case for the other two algorithms.
-RN
--
Robin Norwood
Red Hat, Inc.
"The Sage does nothing, yet nothing remains undone."
-Lao Tzu, Te Tao Ching
--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]