On Fri, Aug 19, 2016 at 2:22 PM Chas. Owens <[email protected]> wrote:
> Truth. If you are checking whether lots of things exist, a hashset might be a
> better way to go:
>
> my %hashset = map { ($_ => undef) } (3,1,4,2,9,0);
>
> my $found = exists $hashset{4} || 0;
> my $not_found = exists $hashset{10} || 0;
>
> By setting the values of the hash to undef, you take up less space than
> setting them to any other value.
>
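As an aside, the same set can also be built with a hash slice instead of map. This is only a minimal sketch of that idiom; the variable name @list is mine and is not part of the quoted code:

```perl
use strict;
use warnings;

my @list = (3, 1, 4, 2, 9, 0);

# Assigning an empty list to a hash slice creates every key
# with an undef value, just like the map version above.
my %hashset;
@hashset{@list} = ();

my $found     = exists $hashset{4}  ? 1 : 0;
my $not_found = exists $hashset{10} ? 1 : 0;

print "found: $found, not_found: $not_found\n";
```

Because the values are undef, membership has to be tested with exists, not by looking at the value.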
Here is the result of a benchmark.
cached_hashset: 1
hashset: 1
any: 1
grep: 1

100 items

                     Rate hashset   grep    any cached_hashset
hashset           13953/s      --   -93%   -95%          -100%
grep             212031/s   1420%     --   -25%           -99%
any              283002/s   1928%    33%     --           -99%
cached_hashset 33211602/s 237921% 15564% 11635%             --

1000 items

                     Rate  hashset    grep    any cached_hashset
hashset            1496/s       --    -93%   -95%          -100%
grep              21900/s    1364%      --   -26%          -100%
any               29658/s    1882%     35%     --          -100%
cached_hashset 27036512/s 1807036% 123354% 91061%             --

10000 items

                     Rate   hashset     grep      any cached_hashset
hashset             108/s        --     -95%     -96%          -100%
grep               2197/s     1941%       --     -21%          -100%
any                2796/s     2497%      27%       --          -100%
cached_hashset 36489558/s 33894645% 1660653% 1304955%             --

From this, you can see that any is the best choice if you are going to
search the list once, but a hashset that is built once and reused is the
best choice (by an insane margin) if you are going to search the list many
times. Rebuilding the hashset for every search is the slowest option of
all. Here is the code that generated the benchmark:

#!/usr/bin/perl

use strict;
use warnings;

use Benchmark;
use List::Util qw/any/;

my @a     = (1);
my %cache = (1 => undef);    # built once, outside the timed code

my %subs = (
    grep => sub {
        return scalar grep { $_ == 1 } @a;
    },
    any => sub {
        return any { $_ == 1 } @a;
    },
    hashset => sub {
        # rebuilds the hashset on every call
        my %hashset = map { ($_ => undef) } @a;
        return exists $hashset{1};
    },
    cached_hashset => sub {
        return exists $cache{1};
    },
);

# sanity check: every sub should find 1
for my $sub (keys %subs) {
    print "$sub: ", $subs{$sub}(), "\n";
}

for my $n (100, 1_000, 10_000) {
    # 1 ends up last, so grep and any must scan the whole list
    @a = reverse 1 .. $n;
    print "\n$n items\n\n";
    Benchmark::cmpthese -2, \%subs;
}
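
In real code, the cached case usually means building the hashset once and then reusing it for every search. Here is a rough sketch of that pattern; make_lookup is a hypothetical helper name I made up for illustration, not something from List::Util or any other module:

```perl
use strict;
use warnings;

# make_lookup is a hypothetical helper: it pays the cost of building
# the hashset once and returns a closure that only does hash lookups.
sub make_lookup {
    my %set;
    @set{@_} = ();    # every key gets an undef value
    return sub { exists $set{ $_[0] } ? 1 : 0 };
}

my $in_list = make_lookup(reverse 1 .. 10_000);

my @queries = (1, 5_000, 10_000, 10_001, 0);
my @results = map { $in_list->($_) } @queries;

print "@results\n";    # prints "1 1 1 0 0"
```

The one-time build costs about as much as a single call to the hashset sub in the benchmark, so this only pays off when the same list is searched more than a handful of times.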