On Wednesday 15 December 2004 15:34, Benjamin Jeeves wrote:
> Hi all
Hi Benjamin,
>
> I'm writing a program in Perl to md5sum about 500,000 files. These files
> are text files of different sizes, the biggest being about 500KB.
> My code is below:
[..code snipped..]
>
> The thing is that it is taking about 3 to 4 hours to complete. Would that
> be about right for this number of files?
It's hard to estimate how long it should take. I regularly md5sum about
25,000 files, and it takes something like 2 minutes. Given those numbers, I'd
expect your 500,000 files to take around 40 minutes.
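That figure is a plain linear extrapolation, assuming the run time grows in proportion to the number of files and that the file sizes on both machines are comparable:

```latex
t \approx \frac{500\,000}{25\,000} \times 2\ \text{min} = 20 \times 2\ \text{min} = 40\ \text{min}
```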
> Is there any way to speed things up? If so, an example or a pointer in
> the right direction would be good. If I do not md5sum the files, it
> prints to the screen in about 2 minutes.
Off the top of my head, I don't know what could be improved in your code.
Maybe it would help not to instantiate a new MD5 object on each iteration of
your for loop?
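A minimal sketch of that suggestion, assuming the loop can share one object: Digest::MD5's hexdigest() returns the digest and resets the object, so a single instance can be reused for every file. md5_files is a hypothetical helper name. Creating the object is cheap compared to reading the file data, so the saving may well be small; it's worth measuring before counting on it.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Temp qw(tempfile);

# Hypothetical helper: hash a list of files with ONE Digest::MD5 object.
sub md5_files {
    my @paths = @_;
    my $md5 = Digest::MD5->new;    # created once, outside the loop
    my %sum;
    for my $path (@paths) {
        open(my $fh, '<', $path) or die "cannot open $path: $!";
        binmode($fh);
        $md5->addfile($fh);
        # hexdigest() returns the digest and resets the object,
        # so it is ready for the next file
        $sum{$path} = $md5->hexdigest;
        close($fh);
    }
    return \%sum;
}

# Usage: two small temporary files with known contents.
my @files;
for my $content ("hello\n", "world\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    binmode($fh);
    print $fh $content;
    close($fh);
    push @files, $name;
}
my $sums = md5_files(@files);
print "$_ $sums->{$_}\n" for @files;
```

If the object were not reset between files, the second digest would cover both files' contents; comparing against md5_hex() of each string on its own shows that it is.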
I'm attaching the code I'm using; maybe you want to run it to test how long
it takes to scan your files.
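If you'd rather measure than guess, a sketch along these lines times the hashing of a sample directory and extrapolates to 500,000 files. time_md5_scan is a hypothetical name, the temporary directory only stands in for a real sample of your data, and the extrapolation assumes your files are similar in size to the sample. Comparing the result with your 2-minute listing-only run should show whether the time really goes into reading and hashing.

```perl
use strict;
use warnings;
use Digest::MD5;
use File::Find;
use File::Spec;
use File::Temp qw(tempdir);
use Time::HiRes qw(time);

# Hypothetical helper: MD5 every plain file under $dir with one reused
# Digest::MD5 object; returns (number of files, elapsed seconds).
sub time_md5_scan {
    my ($dir) = @_;
    my $md5   = Digest::MD5->new;
    my $count = 0;
    my $start = time();
    find({
        no_chdir => 1,    # keep $_ equal to the full path
        wanted   => sub {
            return unless -f $_;
            open(my $fh, '<', $_) or return;    # skip unreadable files
            binmode($fh);
            $md5->addfile($fh);
            $md5->hexdigest;    # finish this file and reset the object
            close($fh);
            $count++;
        },
    }, $dir);
    return ($count, time() - $start);
}

# Usage: a throwaway directory with three small files stands in for a
# sample of the real data.
my $dir = tempdir(CLEANUP => 1);
for my $i (1 .. 3) {
    my $path = File::Spec->catfile($dir, "file$i.txt");
    open(my $out, '>', $path) or die "cannot create $path: $!";
    print $out "line $i\n" x 1000;
    close($out);
}
my ($count, $elapsed) = time_md5_scan($dir);
printf "%d files in %.3f s\n", $count, $elapsed;
if ($count > 0) {
    # linear extrapolation to 500,000 files of similar size
    printf "estimate for 500000 files: %.1f min\n",
        $elapsed / $count * 500_000 / 60;
}
```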
HTH,
Philipp
use strict;
use warnings;
use Cwd qw(getcwd);
use Digest::MD5;
use Fcntl;
use File::Find;
use FileHandle;
# =========================================================================
# SCAN_DIRECTORY
# =========================================================================
# scan all files of a directory and write the result (filenames + MD5)
# into a specified file
# param $directory  directory that should be scanned
# param $scan_file  file into which to write
# param (optional) $filter  regexp; files whose path matches it are skipped
my $digest;
my $out_file;
my $directory;
my $filter;
my $base_file;
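# callback for File::Find, called once for every directory entry
sub process {
    # $_ is the entry's name relative to the current directory (File::Find
    # chdir()s into each directory by default); $File::Find::name is the
    # full path, used for filtering and for the output
    return if (! -f $_);
    if ($filter) {
        return if ($File::Find::name =~ /$filter/);
    }
    # TODO jar support
    #if ($File::Find::name =~ /\.jar$/) {
    #    $base_file = $File::Find::name;
    #}
    open(my $fh, '<', $_)
        or die "cannot open file $File::Find::name : $!";
    binmode($fh);
    # the same Digest::MD5 object is reused for every file;
    # hexdigest() returns the digest and resets the object
    $digest->addfile($fh);
    my $file_name = substr($File::Find::name, length($directory) + 1);
    print $out_file $file_name . ";" . $digest->hexdigest . "\n";
    close($fh);
}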
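sub scanDirectory {
    # a single Digest::MD5 object is shared by all process() calls
    $digest = Digest::MD5->new;
    ($directory, my $scan_file, $filter) = @_;
    # O_TRUNC discards the contents of an older result file; a relative
    # $scan_file is resolved against the caller's working directory
    $out_file = FileHandle->new;
    sysopen($out_file, $scan_file, O_CREAT | O_WRONLY | O_TRUNC)
        or die "could not open file $scan_file : $!";
    # find() changes into each directory itself, so no chdir() is needed here
    find(\&process, $directory);
    close($out_file) or die "could not close file $scan_file : $!";
}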
sub testScanDirectory {
    scanDirectory('d:/temp/tools', 'd:/tools.txt');
    # same scan, but skipping files whose path ends in .exe
    scanDirectory('d:/temp/tools', 'd:/tools_filtered.txt', '\.exe$');
}