RE: techniques for handling large text files

Bakken, Luke Mon, 29 Dec 2003 08:27:00 -0800

> How's abouts this (pls excuse any syntax errs) .....
> 
> my $out1 = "$file1.txt";
> my $out2 = "$file2.txt";
> 
> open(INPUT, '<', $filename) or die "error: $filename cannot be opened: $!\n";
> open(OUT1, '>', $out1) or die "error: $out1 cannot be opened for writing: $!\n";
> open(OUT2, '>', $out2) or die "error: $out2 cannot be opened for writing: $!\n";
> 
> my $regex_split_space='\s+';
> my $regex_marker='^marker';
> my $regex_header='^header';
> 
> while (<INPUT>) {    # read file line by line
>     next if m/$regex_header/;    # skip if header line
>     next if m/$regex_marker/;    # skip if marker line
>     if (/^\S+\s+1\s+/) {    # true if '1' is sandwiched by 1st & 2nd '\s+'
>         print OUT1;
>     }
>     elsif (/^\S+\s+2\s+/) {    # true if '2' is sandwiched by 1st & 2nd '\s+'
>         print OUT2;
>     }
>     else {
>         die "saw something other than a 1 or 2 line\n";
>     }
> }
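
To try this routing logic without the real input files, here is a minimal sketch that reads from and writes to in-memory filehandles. The sub name route_lines, the sample lines, and the hash-of-handles dispatch (used here in place of the nested if/else) are illustrative assumptions. The line format, where the second whitespace-separated field is 1 or 2, is inferred from the comments in the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: send each data line to one of two output handles
# based on its second whitespace-separated field, skipping header and
# marker lines. The line format is assumed, not taken from real data.
sub route_lines {
    my ($in, $out1, $out2) = @_;
    my %out_for = (1 => $out1, 2 => $out2);    # field value => handle
    while (my $line = <$in>) {
        next if $line =~ /^header/;            # skip header lines
        next if $line =~ /^marker/;            # skip marker lines
        my ($type) = $line =~ /^\S+\s+(\S+)\s/;    # capture 2nd field
        my $fh = defined $type ? $out_for{$type} : undef;
        die "saw something other than a 1 or 2 line: $line" unless $fh;
        print {$fh} $line;
    }
}

# Usage with in-memory files so the sketch runs without real data.
my $input = "header x y\nmarker\nabc 1 foo\ndef 2 bar\nghi 1 baz\n";
my ($buf1, $buf2) = ('', '');
open my $in,   '<', \$input or die "cannot open input: $!";
open my $out1, '>', \$buf1  or die "cannot open buffer 1: $!";
open my $out2, '>', \$buf2  or die "cannot open buffer 2: $!";
route_lines($in, $out1, $out2);
close $out1;
close $out2;
print "OUT1:\n$buf1", "OUT2:\n$buf2";
```

A hash lookup keeps the loop flat if more line types ever show up: you add a handle to %out_for instead of another elsif.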


The last suggestion I would make here, since you're concerned about raw
speed, is to use substr whenever possible instead of regular
expressions. For instance, instead of using

my $regex_marker = '^marker'; # whitespace around = please!
...
next if m/$regex_marker/;

it would most likely be faster to use

my $mark_str = 'marker';
...
next if substr($_, 0, length $mark_str) eq $mark_str;

Note that this will work only when you're looking for a fixed string in
a fixed place.
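
Whether substr actually wins depends on your Perl version and your data, because Perl's regex engine has its own optimizations for anchored literal patterns. It is worth measuring before you change anything. Below is a minimal sketch using cmpthese from the core Benchmark module. The starts_with helper and the sample lines are hypothetical. The sketch first checks that the anchored regex and the substr test agree on every sample line, and then times both.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $mark_str     = 'marker';
my $regex_marker = '^marker';

# Hypothetical helper: true when $line begins with the fixed string
# $prefix, compared with substr/eq instead of the regex engine.
sub starts_with {
    my ($line, $prefix) = @_;
    return substr($line, 0, length $prefix) eq $prefix;
}

# Hypothetical sample lines; real timings depend on your own data.
my @lines = ("marker 42\n", "abc 1 foo\n", "header x\n", "mark\n");

# Confirm both tests classify every sample line the same way.
for my $line (@lines) {
    my $by_regex  = $line =~ m/$regex_marker/ ? 1 : 0;
    my $by_substr = starts_with($line, $mark_str) ? 1 : 0;
    die "mismatch on '$line'" unless $by_regex == $by_substr;
}

# Time both; -1 runs each candidate for at least 1 CPU second.
cmpthese(-1, {
    regex  => sub { my $n = grep { m/$regex_marker/ } @lines },
    substr => sub { my $n = grep { substr($_, 0, length $mark_str) eq $mark_str } @lines },
});
```

cmpthese prints a table of rates and relative percentages. Run it on a sample of your real lines, because a small synthetic list like this one may not reflect the large files you are processing.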

Luke
