I'm working on a script that uses File::Find to count the number of news
posts in a semi-extensive hierarchy. As some may know, news posts are
commonly stored in numerically named files, one file per posting.
The following script tries to plow through a hierarchy, returning the
directory name and file count for each directory.
There is a little more to it: longish path names are stripped down for
printing, but only when they match a certain recognizable pattern. That
way the script can be used against even a single directory, with no
stripping happening unless the path matches that pattern.
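The stripping rule on its own can be sketched roughly like this. The
helper name strip_prefix is made up for illustration and is not part of
the script below; it falls back to the unchanged path whenever nothing
follows the News/agent/nntp/ prefix.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the script below): drop everything up to
# and including "News/agent/nntp/"; return any other path unchanged.
sub strip_prefix {
    my ($dir) = @_;
    return $dir =~ m{.*News/agent/nntp/(.*)} ? $1 : $dir;
}

# A path under the news spool is shortened ...
print strip_prefix('/home/gnusu/News/agent/nntp/nntp.perl.org/perl/perl6/users'), "\n";
# ... while a path that does not match the pattern is printed as-is.
print strip_prefix('./dir1/dir2'), "\n";
```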
Cutting to the chase, it seems to do the job I wanted quite well and
pretty fast too. I checked file counts for specific directories several
times in different iterations of this script, and I am satisfied that it
is returning accurate results.
To simplify the script, I've cut a number of lines of code that took a
directory name as an argument and checked that it is a directory on the
file system.
Now to the question:
The script seems to fail in a certain way when used against a small
hierarchy devised for testing.
However, I do not see any wrong output when it is used against a real
news hierarchy.
------- ------- ---=--- ------- -------
script:
[...]
use strict;
use warnings;
use File::Find;

my $startdir = '/home/gnusu/News/agent/nntp';
# my $startdir = './dir1';
my $oacnt = 0;
my $dcnt = 0;
my $mcnt = 0;
my (%data,@out,$ffd,$stripped);
my @tst;
my $gpcnt = 0;

find sub {
    return unless /^\d+$/;
    $oacnt++;
    $mcnt++;
    ## Only push after the count has been collected
    if ($dcnt && $ffd ne $File::Find::dir) {
        push @out, sprintf"%-55s %6d", $stripped, $mcnt;
    }
    ## Get every uniq directory name
    if ($data{$File::Find::dir}++ == 0) {
        $mcnt = 0;
        $dcnt++;
        ## shorten up the path names
        $ffd = $File::Find::dir;
        if ($ffd =~ /.*News\/agent\/nntp/) {
            ($stripped) = $ffd =~ m/.*News\/agent\/nntp\/(.*)/;
        } else {
            $stripped = $ffd;
        }
    }
}, $startdir;

## only push after the count is done. No more directory names
## means the count cannot be added inside find()
push @out, sprintf"%-55s %6d", $stripped, $mcnt;
## one count seems to end up missing so adding it here
$mcnt += 1;

my $gcnt = 0;
for (sort @out) {
    $gcnt++;
    printf "%2d: %s\n", $gcnt, $_;
}
print "\n<$oacnt> posts in <$gcnt> directories overall\n";
------- ------- ---=--- ------- -------
Using the script on a real news hierarchy, it seems to return accurate
results. Here are a few lines of output against stored news at:
/home/gnusu/News/agent/nntp
Showing only three lines of output from a list of 44 directories:
the first, middle and last lines.
1: enews.newsguy.com/alt/solaris/x86 18629
[...]
22: news.gmane.org/gmane/comp/terminal-emulators/tmux/user 8324
[...]
44: nntp.perl.org/perl/perl6/users 3967
The counts are accurate.
------- ------- ---=--- ------- -------
Now the problem test directories: 3 nested directories with 1 numeric
file in each.
It looks like:
ls -R ./dir1
./dir1:
111 dir2
./dir1/dir2:
222 dir3
./dir1/dir2/dir3:
333
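For anyone who wants to reproduce this, the test tree can be recreated
with something like the following. It is only a sketch: it assumes it is
run from the directory that should contain dir1, and it creates empty
files, since only the file names matter to the script.

```perl
use strict;
use warnings;
use File::Path qw(make_path);

# One numerically named file per level, matching the listing above.
my %file_in = (
    'dir1'           => '111',
    'dir1/dir2'      => '222',
    'dir1/dir2/dir3' => '333',
);

# Create the three nested directories in one call.
make_path('dir1/dir2/dir3');

# Drop an empty file into each level.
for my $dir (sort keys %file_in) {
    my $path = "$dir/$file_in{$dir}";
    open my $fh, '>', $path or die "Cannot create $path: $!";
    close $fh;
}
```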
script output on those three:
(shortened space between dir name and count to prevent mail wrapping)
1: ./dir1 1
2: ./dir1/dir2 1
3: ./dir1/dir2/dir3 0
Notice the last directory shows a count of zero.
Why is that, and how to prevent it?
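For comparison, here is a minimal sketch of a different way to collect
the same per-directory counts: tally into a hash keyed on
$File::Find::dir instead of detecting directory changes by hand. It is
not the script above. The helper name count_posts and the extra -f test
(which skips directories that happen to have all-digit names) are
assumptions added for illustration.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return a hash ref mapping each directory to the
# number of all-digit file names found directly inside it.
sub count_posts {
    my ($startdir) = @_;
    my %count;
    find(sub {
        # Keep only plain files whose name is all digits.
        return unless /^\d+$/ && -f $_;
        $count{$File::Find::dir}++;
    }, $startdir);
    return \%count;
}

my $startdir = './dir1';
if (-d $startdir) {
    my $count = count_posts($startdir);
    my $n = 0;
    for my $dir (sort keys %$count) {
        printf "%2d: %-55s %6d\n", ++$n, $dir, $count->{$dir};
    }
}
```

Because each file increments the entry for its own directory, there is
no separate push needed for the last directory after find() returns.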