Re: Perl Script runs to slow

John W. Krahn Mon, 30 Jun 2008 17:25:37 -0700

Cheez wrote:

Howdy,


Hello,

scripting with perl is a hobby and not a vocation so i
apologize in advance for rough looking code.

I have a very large list of 16-letter words called
"hashsequence16.txt".  This file is 203MB in size.

I have a large list of data called "newrawdata.txt".  This file is
95MB.

For each 16-letter word, I am looping through "newrawdata.txt" to 1)
find a match and 2) take the full line of rawdata.txt and
associate that with the 16-letter word.

Using a filesize line-counter and timing how long it takes to process
my data lets me know that I have 9534 hours to see if I can find an
alternative solution.  It's pretty brute force but I don't know if
there is another way to do it.

Any comments or guidance would be greatly appreciated.

Thanks,
Dan
==========================================


use warnings;
use strict;

print "**fisher**";

$flatfile = "newrawdata.txt";


my $flatfile = 'newrawdata.txt';

# 95MB in size

$datafile = "hashsequence16.txt";


my $datafile = 'hashsequence16.txt';

# 203MB in size


my $seqparsed = 'fishersearch.txt';

my $filesize = -s "hashsequence16.txt";

You already have the string "hashsequence16.txt" stored in the variable $datafile, so why not use that instead:


my $filesize = -s $datafile;

# for use in processing time calculation

open(FILE, "$flatfile") || die "Can't open '$flatfile': $!\n";
open(FILE2, "$datafile") || die "Can't open '$flatfile': $!\n";


perldoc -q "What.s wrong with always quoting ..vars."

Your error message for $datafile says it couldn't open $flatfile!

open (SEQFILE, ">fishersearch.txt") || die "Can't open '$seqparsed': $!\n";

Modern Perl idiom is to use a lexical filehandle, three-argument open and the lower-precedence 'or' operator:


open my $FILE,    '<', $flatfile  or die "Can't open '$flatfile': $!\n";
open my $FILE2,   '<', $datafile  or die "Can't open '$datafile': $!\n";
open my $SEQFILE, '>', $seqparsed or die "Can't open '$seqparsed': $!\n";

@preparse = <FILE>;


Since you are going to be removing the newlines anyway:

chomp( my @preparse = <FILE> );

@hashdata = <FILE2>;


It looks like you don't really need to store this whole file in memory.

close(FILE);
close(FILE2);


for my $list1 (@hashdata) {


You could probably just read through this file normally:

while ( my $list1 = <FILE2> ) {

# iterating through hash16 data

    $finish++;


And if you use a while loop you can use $. to get the current line number.
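
As a minimal sketch of the $. idea, the following reads from an in-memory string so it runs without the real data files. The sample rows and the names $sample, $fh, $row and @seen are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample rows standing in for hashsequence16.txt.
my $sample = "AAAA\t3\nCCCC\t5\nGGGG\t7\n";

# Opening a reference to a scalar gives an in-memory filehandle.
open my $fh, '<', \$sample or die "Can't open in-memory file: $!\n";

my @seen;
while ( my $row = <$fh> ) {
    chomp $row;
    # $. holds the current line number of the last filehandle read,
    # so no separate counter variable is needed.
    push @seen, join ': ', $., $row;
}
close $fh;

print "$_\n" for @seen;
```

A test such as `$. % 10 == 0` inside the loop would then replace the hand-maintained $finish counter.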

    if ($finish ==10 ) {
# line counter

        $marker = $marker + $finish;

        $finish =0;

        $left = $filesize - $marker;

$marker is based on the line number and $filesize is based on the number of bytes in the file, so this calculation makes no sense. Perhaps you want this instead:


# outside the loop declare $left
my $left = $filesize;

        # then here in the loop

        $left -= length $list1;
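
A small sketch of that byte-based countdown, assuming made-up sample data in place of the real file. Here $filesize comes from the length of the sample string; in the real script it would come from -s $datafile.

```perl
use strict;
use warnings;

# Hypothetical sample rows; each row is 7 bytes including its newline.
my $sample   = "AAAA\t3\nCCCC\t5\nGGGG\t7\n";
my $filesize = length $sample;

open my $fh, '<', \$sample or die "Can't open in-memory file: $!\n";

# Declared outside the loop: bytes not yet read.
my $left = $filesize;
my @progress;

while ( my $list1 = <$fh> ) {
    # length counts the trailing newline too, so $left reaches 0
    # exactly when the last line has been read.
    $left -= length $list1;
    push @progress, "$left/$filesize";
}
close $fh;

print "$_\n" for @progress;
```

This only works if the subtraction happens before the newline is removed from the line. It also assumes the file is read without an encoding layer, so that length counts bytes rather than decoded characters.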

        printf "$left\/$filesize\n";

printf() treats its first argument as a format string, so that should be either:


        printf "%s/%s\n", $left, $filesize;

Or just:

        print "$left/$filesize\n";
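
To illustrate why the distinction matters, here is a small sketch using sprintf, which follows the same format rules as printf. The string in $text is made up. The two numbers in the original script never contain a percent sign, so the original line happens to work, but the habit bites as soon as the data does.

```perl
use strict;
use warnings;

# Hypothetical data that happens to contain percent signs.
my $text = 'progress: 50%% of 21';

# Used as the format string: each %% is collapsed to a single %.
my $as_format = sprintf $text;

# Passed as an argument to a %s conversion: left untouched.
my $as_data = sprintf '%s', $text;

print "$as_format\n$as_data\n";
```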

# this prints every 17 seconds
                        }

    ($line, $freq) = split(/\t/, $list1);


You never use $freq anywhere so just:

      my ( $line ) = split /\t/, $list1;

    for my $rawdata (@preparse) {
# iterating through rawdata

        $rawdata=~ s/\n//;

        if ($rawdata =~ m/$line/) {
# matching hash16 word with rawdata line

            my $first_pos = index  $rawdata,$line;


You could combine the last two statements:

        if ( ( my $first_pos = index $rawdata, $line ) >= 0 ) {

            print SEQFILE "$first_pos\t$rawdata\n";
# printing to info to new file

                                }
                        }
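
A short sketch of the combined index test, with one made-up word and a few made-up raw data lines. Note that index looks for the literal string, whereas m/$line/ treats $line as a regular expression; for plain letter sequences the two agree.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for one word and a few rawdata lines.
my $line     = 'GATTACA';
my @preparse = ( 'xxGATTACAxx', 'no match here', 'GATTACA at start' );

my @hits;
for my $rawdata (@preparse) {
    # index returns -1 when the substring is absent and the
    # zero-based offset of the first occurrence otherwise.
    if ( ( my $first_pos = index $rawdata, $line ) >= 0 ) {
        push @hits, "$first_pos\t$rawdata";
    }
}

print "$_\n" for @hits;
```

The `>= 0` comparison is needed because a match at the very start of the line returns 0, which is false in a plain boolean test.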

    print SEQFILE "PROCESS\t$line\n";
# printing hash16 word and "process"

}
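
Putting the suggestions above together, the loop might look roughly like the sketch below. search_words is a hypothetical helper name, and the demo data is made up. The sketch takes filehandles so it can be tried on in-memory strings, and it chomps each word line and skips empty ones, which the original script did not do. It still compares every word against every raw data line, so it tidies the code and avoids holding the word list in memory but does not change the overall amount of work.

```perl
use strict;
use warnings;

# Hypothetical helper: reads the raw data into memory once, streams
# the word list line by line, and writes "position<TAB>line" for each
# hit followed by "PROCESS<TAB>word" for each word.
sub search_words {
    my ( $raw_fh, $words_fh, $out_fh ) = @_;

    # Remove newlines once, up front, instead of inside the inner loop.
    chomp( my @preparse = <$raw_fh> );

    while ( my $list1 = <$words_fh> ) {
        chomp $list1;
        my ($line) = split /\t/, $list1;    # the frequency column is unused
        next unless defined $line && length $line;

        for my $rawdata (@preparse) {
            if ( ( my $first_pos = index $rawdata, $line ) >= 0 ) {
                print {$out_fh} "$first_pos\t$rawdata\n";
            }
        }
        print {$out_fh} "PROCESS\t$line\n";
    }
    return;
}

# Demo on made-up data using in-memory filehandles.
my $raw   = "xxABCDxx\nnothing\nABCD\n";
my $words = "ABCD\t3\nZZZZ\t1\n";
my $out   = '';

open my $raw_fh,   '<', \$raw   or die "Can't open raw data: $!\n";
open my $words_fh, '<', \$words or die "Can't open word list: $!\n";
open my $out_fh,   '>', \$out   or die "Can't open output: $!\n";

search_words( $raw_fh, $words_fh, $out_fh );
close $out_fh;

print $out;
```

With the real files, the three handles would instead be opened on $flatfile, $datafile and $seqparsed as shown earlier.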



John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order.                            -- Larry Wall

