On Mon, Jun 23, 2003 at 10:43:07AM +0200 Denham Eva wrote:
> I am very much a novice at perl and probably bitten off more than I can chew
> here.
> I have a file, which is a dump of a database - so it is a fixed file format.
> The problem is that I am struggling to manipulate it correctly. I have been
> trying for two days now to get a program to work. The idea is to remove the
> duplicate records, ie a record begins with Name and ends with Values End.
> The program that I have thus far, is pathetic in the sense I have opened
> three files, the file below, a data file for cleaned data, and a file for
> capturing the usernames already processed. But I have got stuck on how to
> compare and work through the file line for line and then only to capture the
> lines that are not duplicated.
Keeping a couple of files around is not necessarily pathetic. I don't
think you need a file for the processed usernames, though. Reading the
original file and writing the processed data to a second one is a very
common pattern.
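To illustrate why the third file is unnecessary: a hash kept in memory can play the role of the "already processed" list. This is only a minimal sketch; the sub name unique_names and the sample names are made up for illustration, and it assumes the list of names fits in memory.

```perl
use strict;
use warnings;

# Hypothetical helper: return the names in their original order,
# keeping only the first occurrence of each one.
sub unique_names {
    my %seen;    # name => number of times encountered so far
    return grep { !$seen{$_}++ } @_;
}

# Made-up sample data, not taken from the real dump.
my @names = unique_names(qw(system alice system bob alice));
print "@names\n";
```

The hash lookup replaces re-reading a file of usernames for every record, so the input only has to be scanned once.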
> Here is the file format....
>
> <File Begins>
> #DB dumped
> #DB version 8.0
> #SW version 2.6(1.10)
> #---------------------------------------------------------------------------
> --
> Name : system
> Some stuff here...
> many lines....
> Of different format...
> such as line below...
> User Count : 0
> ##--- User End
> Lots of text here...
> Until...
> We get line below...
> ##--- Values End
> #---------------------------------------------------------------------------
> --
So, "#-----..." is essentially the record separator? A fixed separator
is good because it makes processing rather easy. It might be handy to
set the input record separator ($/) to this value, so that each read
returns one whole record:
#! /usr/bin/perl -w
use strict;
local $/ =
"#-----------------------------------------------------------------------------\n";
open IN, '<', "old_database" or die "Cannot open old_database: $!";
open OUT, '>', "new_database" or die "Cannot open new_database: $!";
# keep track of what records have already been seen
my %records_seen;
# this is the 'header', that is: what is before the first record
print OUT scalar <IN>;
while (<IN>) {
    if (/Name\s+:\s+(\S+)/) {
        #            ^^^
        # $1 is record name
        next if $records_seen{ $1 }++;
        print OUT $_;
    }
}
print OUT "#End Of Dump\n";
close IN;
close OUT;
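To see how $/ splits the input into records without touching real files, the same logic can be tried on a string through an in-memory filehandle (Perl 5.8 or later). This is a sketch under assumptions: the sub name dedupe_dump, the short "#---\n" separator and the sample records are invented for the demonstration, and anchoring the match at the start of a line assumes that "Name" always begins its line in the real dump.

```perl
use strict;
use warnings;

# Hypothetical helper: read $dump record by record, split on $separator,
# and return the text with repeated "Name : ..." records dropped.
sub dedupe_dump {
    my ($dump, $separator) = @_;

    local $/ = $separator;    # one read now returns one whole record
    open my $in, '<', \$dump or die "Cannot open in-memory file: $!";

    # Everything before the first separator is the header.
    my $header = <$in>;
    my $out = defined $header ? $header : '';
    my %records_seen;

    while (my $record = <$in>) {
        # Skip chunks that carry no record name at all.
        next unless $record =~ /^Name\s+:\s+(\S+)/m;
        # Skip records whose name has been seen before.
        next if $records_seen{$1}++;
        $out .= $record;
    }
    close $in;
    return $out;
}

# Made-up miniature dump with a shortened separator.
my $sep  = "#---\n";
my $dump = "#DB dumped\n$sep"
         . "Name : system\nUser Count : 0\n$sep"
         . "Name : alice\nUser Count : 1\n$sep"
         . "Name : system\nUser Count : 0\n$sep";

print dedupe_dump($dump, $sep);
```

With this sample the first "system" record and the "alice" record are kept, and the second "system" record is dropped. Note that the first occurrence wins; if the later copy of a record is the one worth keeping, the records would have to be collected first and written out afterwards.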
Tassilo
--
$_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
$_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval