Hi,

I have a script that finds duplicate rows in a file. The file has
13 million records, and no more than 5% of them are duplicates.
To find the duplicates I am using the following function:
use strict;

my %keys;
my $dup = 0;

while (<FH>) {
    if ( find_duplicates() ) {
        $dup++;
    }
}

# Returns 1 if the record is a duplicate,
# 0 if it is not.
sub find_duplicates {
    # the key is the 10 characters starting at offset 10
    my $key = substr( $_, 10, 10 );
    if ( exists $keys{$key} ) {
        $keys{$key}++;
        return 1;    # duplicate row
    }
    else {
        $keys{$key}++;
        return 0;    # not a duplicate
    }
}
---------------------------------------------
Here I am storing all 13 million keys in the hash, and I think that
is why I am running out of memory. How can I avoid this?
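
One idea I have been looking at is tying the hash to an on-disk DBM
with the DB_File module, so the keys live on disk instead of in RAM.
Would something like this work? A rough, untested sketch ('dups.db'
is just an example filename):

use strict;
use DB_File;
use Fcntl;

# Tie %keys to a Berkeley DB hash on disk so the 13 million keys
# are not all held in memory ('dups.db' is an example filename).
my %keys;
tie %keys, 'DB_File', 'dups.db', O_RDWR|O_CREAT, 0644, $DB_HASH
    or die "Cannot tie dups.db: $!";

my $dup = 0;
while (<FH>) {
    my $key = substr( $_, 10, 10 );
    # post-increment returns the old count: false (undef) the first
    # time a key is seen, true on every later occurrence
    $dup++ if $keys{$key}++;
}

untie %keys;
print "found $dup duplicate rows\n";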
Thanks,
-Madhu