On May 12, Larry Wissink said:
>We have a backup server that is missing records from the production
>server for a particular table. We know that it should have sequential
>records and that it is missing some records. We want to get a sense of
>the number of records missing. So, we know the problem started around
>the beginning of March at id 70,000,000 (rounded for convenience).
>Currently we are at 79,000,000. So, I dumped to a file all the ids
>between 70,000,000 and 79,000,000 (commas inserted here for
>readability). I need to figure out what numbers are missing. The way
>that seemed easiest to me was to create two arrays. One with every
>number between 70 and 79 million, the other with every number in our
>dump file. Then compare them as illustrated in the Perl Cookbook using
>a hash.
>
>But, when I try to scale that to 9 million records, it doesn't work.
>This is probably because it is trying to do something like what db
>people call a cartesian join (every record against every record).

Well, don't do that! ;)  When you have a super-set and a sub-set, and both
are sorted in the same order, you only need to go through each of them ONCE.

    my @superset = (1 .. 10);
    my @subset   = (1, 2, 4, 7, 8, 9);
    my @missing  = ();
    my $idx = 0;

    # $idx is our position in @subset; it only advances on a match
    for (@superset) {
      push @missing, $_
        unless $subset[$idx] == $_ and ++$idx;
    }

That's just a bit of shorthand for:

    for (@superset) {
      if ($subset[$idx] == $_) { $idx++ }
      else { push @missing, $_ }
    }

Anyway, that populates @missing with the missing elements, in linear time.
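
For the 9-million-id case you probably don't want either list in memory at all. Here is a minimal sketch of the same single pass, assuming the ids arrive in ascending order. The names find_missing and $next_id are made up for illustration: $next_id is any code ref that hands back the next id, or undef when there are no more. Unlike the loops above, it also checks for running off the end of the subset, so it stays quiet under "use warnings".

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original example): walks the
# range $lo .. $hi once, pulling ids in ascending order from $next_id,
# a code ref that returns the next id, or undef when the input runs out.
sub find_missing {
    my ($lo, $hi, $next_id) = @_;
    my @missing;
    my $have = $next_id->();

    for my $want ($lo .. $hi) {
        # Skip anything below the id we are looking for
        # (duplicates, or ids that fall before the range).
        $have = $next_id->() while defined $have and $have < $want;

        if (defined $have and $have == $want) {
            $have = $next_id->();      # found it, move on
        }
        else {
            push @missing, $want;      # gap in the sequence
        }
    }
    return \@missing;
}

# Small in-memory demonstration with the same data as above.
my @subset = (1, 2, 4, 7, 8, 9);
my $i = 0;
my $missing = find_missing(1, 10, sub { $i < @subset ? $subset[$i++] : undef });
print "@$missing\n";    # prints "3 5 6 10"
```

To run it against the dump file, $next_id could read one line per call from a filehandle (chomp it, and return undef at end of file), assuming the dump holds one id per line, already sorted. If all you need is the count, increment a counter instead of pushing onto @missing.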
--
Jeff "japhy" Pinyan [EMAIL PROTECTED] http://www.pobox.com/~japhy/
RPI Acacia brother #734 http://www.perlmonks.org/ http://www.cpan.org/
CPAN ID: PINYAN [Need a programmer? If you like my work, let me know.]
<stu> what does y/// stand for? <tenderpuss> why, yansliterate of course.