Don Elliott wrote:
>
> Hi,
>
> I'm having some trouble trying to easily remove lines from a data file using
> a regular expression. I can do it by reading the file in
> a line at a time then deciding whether to chuck it or write it out. My data
> looks something like this -
>
> ENQ:SIMS RE:ELLIOTT,DONALD
> ELLIOTT,DONALD,LAWRENCE
> - DOB 1963SEP30 SEX:M
> 223 OREGAN CR SCORE:27
> BUSINESS: 306-975-8315
> RELATED EVENTS
> GO0024158 1997APR18 COMPLAINANT CRIMINAL ACTIVIT
> <<TAG:REPORT:GO 1997 0024158>>
> GO0006897 1987FEB26 REG OWNER SEIZED VEHICLES
> <<TAG:REPORT:GO 1987 0006897>>
> AC0040436 2002MAY21 REG OWNER FAIL TO ST/REMAI
> <<TAG:REPORT:AC 2002 0040436>>
> AC0000072 1994JAN04 DRIVER IN NON FAT INJ ACC
> <<TAG:REPORT:AC 1994 0000072>>
> ----------------------------------------------------------------------
> <<TAG:REPORT:DATA Complicated multi-line
> tags are possible. This really complicates
> my parsing >>
> MORE MATCHING PERSONS ON FILE
>
> What I need to do is remove all of the 'tags' from the file.
> My best attempt so far has been
>
> $file_with_no_tags =~ s/<<TAG:.+>>//sig;
>
> which removes everything from the first '<<TAG:' to the last '>>'
>
> Is there a better way? (Actually, any way that works would be better.)
>
> At a different point in my program I need to collect all of the tags.
> This is the code I use for that -
>
> my %tag_hash;
> my @lines = split /\n/,$src;
> my ($in_tag, $long_tag);
> $in_tag = 'FALSE';
> foreach my $line (@lines) {
>     if ($line =~ /<<TAG.+>>/ims) { # tag is contained in one line
>         my ($label,$tagname,$tagval) = split /:/,$line,3;
>         chop $tagval; #remove trailing >
>         chop $tagval; #remove trailing >
>         $tag_hash{$tagname} = $tagval;
>     }
>     elsif ($line =~ /<<TAG/i) { # start of a multi-line tag
>         $in_tag = 'TRUE';
>         $long_tag = $line;
>     }
>     elsif ($in_tag eq 'TRUE' and $line =~ />>/i) { # end of a multi-line tag
>         $in_tag = 'FALSE';
>         $long_tag = "$long_tag\n$line";
>         my ($label,$tagname,$tagval) = split /:/,$long_tag,3;
>         chop $tagval; #remove trailing >
>         chop $tagval; #remove trailing >
>         $tag_hash{$tagname} = $tagval;
>     }
>     elsif ($in_tag eq 'TRUE') { # middle of a multi-line tag
>         $long_tag = "$long_tag\n$line";
>     }
> }
>
> This strikes me as being a little long for something this simple in Perl.
>
> Can anyone point me in a better/shorter/more easily understood direction?
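
If you want shorter, then this should do what you want. The non-greedy
.+? stops at the first '>>' after each '<<TAG:', and /s lets . match
newlines, so multi-line tags are captured as well: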

my %tag_hash;
for my $tag ( $src =~ /<<TAG:(.+?)>>/isg ) {
    my ( $tagname, $tagval ) = split /:/, $tag, 2;
    $tag_hash{$tagname} = $tagval;
}

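For the removal half of the question, the same non-greedy quantifier
should fix the substitution you tried. A minimal sketch, assuming the
whole file has already been read into $file_with_no_tags as in your
attempt; the sample data is shortened from your listing, and the
trailing \n? is my own addition so that no empty lines are left behind:

```perl
use strict;
use warnings;

# Sample data (hypothetical, shortened from the listing in the question).
my $file_with_no_tags = join "\n",
    'GO0024158 1997APR18 COMPLAINANT CRIMINAL ACTIVIT',
    '<<TAG:REPORT:GO 1997 0024158>>',
    'AC0040436 2002MAY21 REG OWNER FAIL TO ST/REMAI',
    '<<TAG:REPORT:DATA Complicated multi-line',
    'tags are possible. This really complicates',
    'my parsing >>',
    'MORE MATCHING PERSONS ON FILE',
    '';

# .+? stops at the first '>>' after each '<<TAG:' instead of the last one;
# /s lets '.' cross newlines so multi-line tags are removed in one piece.
# The optional \n? also removes the line break that follows a tag.
$file_with_no_tags =~ s/<<TAG:.+?>>\n?//sg;

print $file_with_no_tags;
```

With .+ the match runs on to the last '>>' in the file, which is why
your version removed everything in between. If you would rather keep
the line breaks where the tags were, drop the \n? from the pattern.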
John
--
use Perl;
program
fulfillment