On May 10, Hughes, Andrew said:
>I am using it as a mailing list. However, over the last half year, it has
>gotten pretty big, and has some duplicates in it. For reporting sake, I
>would like to delete the duplicate records based on email addresses. If you
>sign up three times, I only want to keep your first record in there.
The simplest course of action is to use a hash keyed on the email address,
so that a record is written out only the first time its address is seen.
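    # Read the original file and write the de-duplicated copy to a new one.
    open my $orig, '<', $db       or die "can't read $db: $!";
    open my $new,  '>', "$db.new" or die "can't write $db.new: $!";

    my %seen;   # email address => number of times seen so far
    while (<$orig>) {
        # The email address is taken to be the fourth '|'-separated field.
        my $email = (split /\|/)[3];
        # Print a record only the first time its address appears.
        print {$new} $_ if !$seen{$email}++;
    }

    close $orig;
    close $new or die "can't close $db.new: $!";

    # Replace the original file with the de-duplicated one.
    rename "$db.new" => $db or die "can't rename $db.new to $db: $!";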
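
One thing the loop above does not handle is the same address typed with
different capitalization or stray whitespace. Here is a rough sketch of how
that might be folded in. The dedupe_by_email() helper, its field-index
argument, and the sample records are made up for illustration and are not
part of your data; it also assumes that comparing addresses
case-insensitively is acceptable for a mailing list.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns only the lines
# whose email field has not been seen before. Addresses are compared
# case-insensitively and with surrounding whitespace ignored.
sub dedupe_by_email {
    my ($field, @lines) = @_;
    my %seen;
    my @kept;
    for my $line (@lines) {
        (my $record = $line) =~ s/\r?\n\z//;    # work on a copy without the line ending
        my $email = (split /\|/, $record)[$field];
        $email = '' unless defined $email;       # record too short to have that field
        $email =~ s/^\s+|\s+$//g;                # trim leading/trailing whitespace
        if ($email eq '') {                      # assumption: keep records with no address
            push @kept, $line;
            next;
        }
        push @kept, $line unless $seen{ lc $email }++;
    }
    return @kept;
}

# Made-up sample records; the address is the fourth '|'-separated field.
my @sample = (
    "1|Ann|Lee|ann\@example.com\n",
    "2|Bob|Ray|bob\@example.com\n",
    "3|Ann|Lee|Ann\@Example.com \n",
);
print dedupe_by_email(3, @sample);   # prints records 1 and 2 only
```

Keeping the original line untouched and normalizing only the hash key means
the surviving records are written back exactly as they were entered.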
--
Jeff "japhy" Pinyan [EMAIL PROTECTED] http://www.pobox.com/~japhy/
RPI Acacia brother #734 http://www.perlmonks.org/ http://www.cpan.org/
** Look for "Regular Expressions in Perl" published by Manning, in 2002 **
<stu> what does y/// stand for? <tenderpuss> why, yansliterate of course.
[ I'm looking for programming work. If you like my work, let me know. ]