Re: Greedy Regular Expression

John W. Krahn Fri, 26 Jul 2002 16:17:02 -0700

Don Elliott wrote:
> 
> Hi,
> 
> I'm having some trouble trying to easily remove lines from a data file using
> a regular expression. I can do it by reading the file in
> a line at a time then deciding whether to chuck it or write it out. My data
> looks something like this -
> 
> ENQ:SIMS RE:ELLIOTT,DONALD
> ELLIOTT,DONALD,LAWRENCE
>  - DOB 1963SEP30 SEX:M
>    223 OREGAN CR             SCORE:27
>   BUSINESS: 306-975-8315
>    RELATED EVENTS
> GO0024158 1997APR18 COMPLAINANT CRIMINAL ACTIVIT
> <<TAG:REPORT:GO 1997 0024158>>
> GO0006897 1987FEB26 REG OWNER   SEIZED VEHICLES
> <<TAG:REPORT:GO 1987 0006897>>
> AC0040436 2002MAY21 REG OWNER   FAIL TO ST/REMAI
> <<TAG:REPORT:AC 2002 0040436>>
> AC0000072 1994JAN04 DRIVER IN   NON FAT INJ ACC
> <<TAG:REPORT:AC 1994 0000072>>
> ----------------------------------------------------------------------
> <<TAG:REPORT:DATA Complicated multi-line
> tags are possible. This really complicates
> my parsing >>
> MORE MATCHING PERSONS ON FILE
> 
> What I need to do is remove all of the 'tags' from the file. My best attempt
> so far has been:
> 
> $file_with_no_tags =~ s/<<TAG:.+>>//sig;
> 
> which removes everything from the first '<<TAG:' to the last '>>'.
> 
> Is there a better way? (Actually, any way that works would be better.)
> 
> At a different point in my program I need to collect all of the tags.
> This is the code I use for that:
> 
>     my %tag_hash;
>     my @lines = split /\n/,$src;
>     my ($in_tag, $long_tag);
>     $in_tag = 'FALSE';
>     foreach my $line (@lines) {
>         if ($line =~ /<<TAG.+>>/ims) {                       # tag is contained in one line
>             my ($label,$tagname,$tagval) = split /:/,$line,3;
>             chop $tagval;    #remove trailing >
>             chop $tagval;    #remove trailing >
>             $tag_hash{$tagname} = $tagval;
>         }
>         elsif ($line =~ /<<TAG/i) {                            # start of a multi-line tag
>             $in_tag = 'TRUE';
>             $long_tag = $line;
>         }
>         elsif ($in_tag eq 'TRUE' and $line =~ />>/i) {   # end of a multi-line tag
>             $in_tag = 'FALSE';
>             $long_tag = "$long_tag\n$line";
>             my ($label,$tagname,$tagval) = split /:/,$long_tag,3;
>             chop $tagval;    #remove trailing >
>             chop $tagval;    #remove trailing >
>             $tag_hash{$tagname} = $tagval;
>         }
>         elsif ($in_tag eq 'TRUE') {                            # middle of a multi-line tag
>             $long_tag = "$long_tag\n$line";
>         }
>     }
> 
> This strikes me as being a little long to do something this simple in perl.
> 
> Can anyone point me in a better/shorter/more easily understood direction?



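If you want shorter, this should do what you want. The non-greedy .+?
stops at the first '>>' after each '<<TAG:' instead of the last one in
the file, and /s lets it span multi-line tags: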

my %tag_hash;
for my $tag ( $src =~ /<<TAG:(.+?)>>/isg ) {
    my ( $tagname, $tagval ) = split /:/, $tag, 2;
    $tag_hash{$tagname} = $tagval;
    }
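
For the removal half of the question, the same non-greedy quantifier
should be enough. A minimal sketch, assuming the whole file has already
been read into $file_with_no_tags as in the original attempt; the sample
text and the optional \n? that swallows the line break left behind by
each tag are illustrative additions, not part of the original program:

```perl
use strict;
use warnings;

# Sample text standing in for the real file contents (illustrative only).
my $file_with_no_tags = join "\n",
    'GO0024158 1997APR18 COMPLAINANT CRIMINAL ACTIVIT',
    '<<TAG:REPORT:GO 1997 0024158>>',
    'GO0006897 1987FEB26 REG OWNER   SEIZED VEHICLES',
    '<<TAG:REPORT:DATA Complicated multi-line',
    'tags are possible. This really complicates',
    'my parsing >>',
    'MORE MATCHING PERSONS ON FILE',
    '';

# .+? is non-greedy, so each match ends at the nearest '>>' rather than
# the last one in the file; /s lets '.' cross the newlines inside a
# multi-line tag. The trailing \n? (an assumption about the desired
# output) also removes the line break after a tag, so no empty lines
# are left behind.
$file_with_no_tags =~ s/<<TAG:.+?>>\n?//sig;

print $file_with_no_tags;
```

Drop the \n? if the line structure of the original file has to be
preserved exactly.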


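One thing to watch with either version: in the sample data every tag is
named REPORT, so a plain hash keeps only the last value seen. If all of
them are needed, a sketch along these lines collects a list per tag name
(the sample $src and the hash-of-arrays layout are assumptions, not
something the original program is known to require):

```perl
use strict;
use warnings;

# Sample input standing in for the real $src (illustrative only).
my $src = join "\n",
    '<<TAG:REPORT:GO 1997 0024158>>',
    '<<TAG:REPORT:AC 2002 0040436>>',
    '<<TAG:REPORT:DATA Complicated multi-line',
    'my parsing >>',
    '';

my %tag_hash;
for my $tag ( $src =~ /<<TAG:(.+?)>>/isg ) {
    my ( $tagname, $tagval ) = split /:/, $tag, 2;
    # Append instead of assign, so repeated tag names are all kept.
    push @{ $tag_hash{$tagname} }, $tagval;
}

print scalar @{ $tag_hash{REPORT} }, " REPORT tags found\n";
```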

John
-- 
use Perl;
program
fulfillment
