On Thu, Jun 10, 2010 at 10:26, Rob Dixon wrote:
snip
> It is possible that index() is faster than regular expressions, but I would
> write the code below.
snip
It looks like, at least on Perl 5.12.0, iterated_regex is faster than
index for finding non-overlapping substrings when the substring is
small, but index starts to win as the substring grows: it is already
ahead at 40 characters ("abab" x 10). The real winner is split, though.
It beats everything except the largest substrings ("abab" x 1000 and
up), where index overtakes it. split is basically the same algorithm as
iterated_regex, but the loop is implemented in C instead of Perl.
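The split trick can be seen in isolation. This is a minimal sketch,
assuming a reasonably modern Perl where split in scalar context just
returns the number of fields; count_with_split is a hypothetical helper
name used only for illustration. It also shows why a negative LIMIT
matters: by default split strips trailing empty fields, so a string that
ends with the substring gets under-counted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (illustration only): count non-overlapping
# occurrences of $substring in $string. N matches cut the string
# into N + 1 fields, so the count is the field count minus one.
sub count_with_split {
    my ($string, $substring) = @_;
    # LIMIT -1 keeps trailing empty fields that split would otherwise strip
    my $fields = split /\Q$substring/, $string, -1;
    # splitting the empty string always yields zero fields
    return $fields ? $fields - 1 : 0;
}

# Without a LIMIT, "abab" x 10 splits into eleven empty fields, all of
# which are stripped as trailing empties, leaving a field count of 0.
my $without_limit = split /abab/, "abab" x 10;

for my $string ("ababababab", "abab" x 10, "xyz", "") {
    print "'$string' => ", count_with_split($string, "abab"), "\n";
}
print "without LIMIT: $without_limit fields\n";
```

With the -1 LIMIT, "abab" x 10 counts as 10 matches; the same split
without a LIMIT reports 0 fields.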

I would probably still go with an inline pure_regex-style
implementation. I don't see the value of wrapping this in a function
unless profiling shows a bottleneck in that section of the code and I
need the extra performance of something that can't easily be inlined.
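For comparison, the inline pure_regex form is a one-liner built on the
"countof" idiom: a list assignment in scalar context evaluates to the
number of elements on its right-hand side. The strings below are just
the ones the benchmark script starts with.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string    = "ababababab";
my $substring = "abab";

# List-context m//g returns every non-overlapping match; assigning that
# list to the empty list () in scalar context yields the match count.
my $count = () = $string =~ /\Q$substring/g;

print "$count\n";
```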
Perl version 5.012000
index => 2
iterated_regex => 2
pure_regex => 2
split => 2
                    Rate pure_regex index iterated_regex split
pure_regex      608548/s         --   -6%           -46%  -74%
index           649176/s         7%    --           -42%  -73%
iterated_regex 1124391/s        85%   73%             --  -52%
split          2360644/s       288%  264%           110%    --

                    Rate pure_regex index iterated_regex split
pure_regex      613304/s         --   -6%           -44%  -75%
index           649177/s         6%    --           -41%  -73%
iterated_regex 1101602/s        80%   70%             --  -54%
split          2406041/s       292%  271%           118%    --

                    Rate pure_regex index iterated_regex split
pure_regex      590160/s         --   -9%           -46%  -75%
index           651987/s        10%    --           -40%  -73%
iterated_regex 1092266/s        85%   68%             --  -55%
split          2406041/s       308%  269%           120%    --

                    Rate pure_regex index iterated_regex split
pure_regex      601510/s         --   -7%           -47%  -75%
index           649176/s         8%    --           -42%  -73%
iterated_regex 1124391/s        87%   73%             --  -54%
split          2429401/s       304%  274%           116%    --

                    Rate pure_regex index iterated_regex split
pure_regex      590160/s         --   -8%           -48%  -76%
index           643109/s         9%    --           -43%  -74%
iterated_regex 1137401/s        93%   77%             --  -54%
split          2477508/s       320%  285%           118%    --

substring of 10 sets
                 Rate pure_regex iterated_regex index split
pure_regex      128/s         --           -47%  -56%  -91%
iterated_regex  241/s        88%             --  -17%  -84%
index           290/s       126%            20%    --  -80%
split          1478/s      1050%           513%  409%    --

substring of 100 sets
                 Rate iterated_regex pure_regex index split
iterated_regex  522/s             --       -35%  -64%  -74%
pure_regex      807/s            55%         --  -44%  -60%
index          1437/s           175%        78%    --  -28%
split          1999/s           283%       148%   39%    --

substring of 1000 sets
                 Rate iterated_regex pure_regex split index
iterated_regex  599/s             --       -62%  -71%  -75%
pure_regex     1569/s           162%         --  -24%  -36%
split          2055/s           243%        31%    --  -16%
index          2443/s           308%        56%   19%    --

substring of 10000 sets
                 Rate iterated_regex pure_regex split index
iterated_regex  570/s             --       -56%  -68%  -77%
pure_regex     1309/s           130%         --  -26%  -48%
split          1761/s           209%        34%    --  -30%
index          2510/s           340%        92%   43%    --
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark;
print "Perl version $]\n";
my $string = "ababababab";
my $substring = "abab";
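my %subs = (
    # a list assignment in scalar context yields the number of matches
    pure_regex => sub {
        return scalar( () = $string =~ /\Q$substring/g );
    },
    # scalar-context m//g steps through the matches one at a time
    iterated_regex => sub {
        my $n = 0;
        $n++ while $string =~ /\Q$substring/g;
        return $n;
    },
    # jump from match to match, resuming just past the previous one
    index => sub {
        my $offset = 0;
        my $result = index $string, $substring, $offset;
        my $length = length $substring;
        my $n      = 0;
        while ($result != -1) {
            $n++;
            $offset = $result + $length;
            $result = index $string, $substring, $offset;
        }
        return $n;
    },
    # N matches cut the string into N + 1 fields. The -1 LIMIT keeps
    # trailing empty fields, which split strips by default; without it,
    # strings ending in $substring (e.g. "abab" x 10) are under-counted.
    split => sub {
        my $c = split /\Q$substring/, $string, -1;
        # handle the empty string case
        return $c ? $c - 1 : 0;
    },
);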
for my $sub (sort keys %subs) {
    print "$sub => ", $subs{$sub}->(), "\n";
}
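# Assign to $string rather than writing "for $string (...)": foreach
# temporarily rebinds the name $string to each list element, but the
# subs in %subs closed over the original variable and would keep seeing
# "ababababab" on every pass.
for my $test_string ("abab", "abab" x 10, ("x" x 1000) . "abab",
    "x" x 10_000, "ab" x 10_000)
{
    $string = $test_string;
    Benchmark::cmpthese -1, \%subs;
}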
$string = "abab" x 100_000;
for my $n (10, 100, 1_000, 10_000) {
    print "\nsubstring of $n sets\n";
    $substring = "abab" x $n;
    Benchmark::cmpthese -1, \%subs;
}
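Since the script prints each approach's answer only once, for the
initial "ababababab", it is worth confirming that all four agree on the
other test strings before trusting their timings. This is a minimal
sketch, assuming the approaches are rewritten to take their inputs as
arguments instead of closing over shared variables; the %count table and
@cases list are illustrative names, not part of the original script, and
the split variant passes a LIMIT of -1 so strings that end with the
substring are counted correctly.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative rewrite: each counter takes ($string, $substring).
my %count = (
    pure_regex => sub {
        my ($string, $substring) = @_;
        return scalar( () = $string =~ /\Q$substring/g );
    },
    iterated_regex => sub {
        my ($string, $substring) = @_;
        my $n = 0;
        $n++ while $string =~ /\Q$substring/g;
        return $n;
    },
    index => sub {
        my ($string, $substring) = @_;
        my ($n, $offset) = (0, 0);
        # resume each search just past the previous match
        while ((my $pos = index($string, $substring, $offset)) != -1) {
            $n++;
            $offset = $pos + length($substring);
        }
        return $n;
    },
    split => sub {
        my ($string, $substring) = @_;
        # -1 keeps trailing empty fields so trailing matches are counted
        my $fields = split /\Q$substring/, $string, -1;
        return $fields ? $fields - 1 : 0;
    },
);

# [string, substring, expected number of non-overlapping matches]
my @cases = (
    [ "ababababab",          "abab", 2 ],
    [ "abab",                "abab", 1 ],
    [ "abab" x 10,           "abab", 10 ],
    [ ("x" x 1000) . "abab", "abab", 1 ],
    [ "x" x 10_000,          "abab", 0 ],
    [ "ab" x 10_000,         "abab", 5_000 ],
);

my $failures = 0;
for my $case (@cases) {
    my ($string, $substring, $expected) = @$case;
    for my $name (sort keys %count) {
        my $got = $count{$name}->($string, $substring);
        next if $got == $expected;
        $failures++;
        printf "%s: got %d, expected %d (string length %d)\n",
            $name, $got, $expected, length $string;
    }
}
print $failures ? "$failures mismatches\n" : "all approaches agree\n";
```

Passing the inputs as arguments also keeps each counter independent of
whatever the surrounding loop does to shared variables.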
--
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.