Several parts of BackupPC, including BackupPC_dump, BackupPC_trashClean and BackupPC_nightly, spend a lot of time traversing large trees of files.
As many people have observed, over time BackupPC's pooling results
in directories with files that are widely dispersed across the disk.
This makes disk seeks the performance bottleneck.
Currently BackupPC processes all the files in a directory by
reading the directory and processing each file in the order
returned by the directory read.
Simon Strack from Monash U noted a couple of years ago that disk
seeks can be reduced significantly by sorting the directory read
results by inode. If the inode number is closely correlated with the
disk position (ie: block), then the files are processed in an order
that reduces disk seeks.
Perl's built-in directory reading functions just return a file name.
The Perl module IO::Dirent additionally returns the inode and file
type, which avoids a stat() on each file just to get that information.
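
For anyone who wants to see the idea in isolation, here is a rough sketch in Python rather than Perl (this is not BackupPC code and not the attached scripts). Python's os.scandir() exposes the same directory-entry fields: name, inode and type. The function names entries_by_inode, walk_by_inode and verify are made up for the example, and the sketch assumes a POSIX system where the inode comes straight from the directory read.

```python
import os
import stat


def entries_by_inode(path):
    """Return the directory entries of `path` sorted by inode number.

    On POSIX systems os.scandir() takes the name and inode from the
    directory read itself, so no per-file stat() is needed to sort.
    """
    with os.scandir(path) as it:
        return sorted(it, key=lambda e: e.inode())


def walk_by_inode(top, visit):
    """Traverse the tree under `top`, processing each directory's
    entries in inode order and calling visit(path, st) on every
    non-directory entry."""
    for entry in entries_by_inode(top):
        if entry.is_dir(follow_symlinks=False):
            walk_by_inode(entry.path, visit)
        else:
            visit(entry.path, os.lstat(entry.path))


def verify(path):
    """Check that inode and directory type from the directory read
    agree with what lstat() reports for every entry in `path`."""
    for entry in entries_by_inode(path):
        st = os.lstat(entry.path)
        if entry.inode() != st.st_ino:
            return False
        if entry.is_dir(follow_symlinks=False) != stat.S_ISDIR(st.st_mode):
            return False
    return True
```

Whether this ordering actually helps depends on how closely inode numbers track disk position on a given file system, which is exactly what I would like to find out.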
I'm interested in exploring whether IO::Dirent works with different
operating systems and file systems and, if so, whether traversing those
file systems in the inode order returned by IO::Dirent provides any benefit.
I am asking for some volunteers to do the following:
- install IO::Dirent from CPAN.

- unpack the attached tar file in a directory.

- make sure IO::Dirent works (ie: returns correct type and inode
  information) on the file system you will test by running the
  inodeVerify script:

      su backuppc
      mkdir TOPDIR/temp
      cd TOPDIR/temp
      inodeVerify

  It should print "IO::Dirent is ok".
  You can remove the temp directory.

- run the inodeTest benchmark on a large directory tree
  (eg: /data/BackupPC/cpool or /data/BackupPC/cpool/0 or
  /data/BackupPC/cpool/[0-7]). You need a tree large enough
  to render caching unimportant. For example, to do the
  entire pool:

      su backuppc
      inodeTest TOPDIR/cpool

  or one of these (1/16 of the pool, 1/4 of the pool or 1/2 of
  the pool respectively):

      inodeTest TOPDIR/cpool/0
      inodeTest TOPDIR/cpool/[0-3]
      inodeTest TOPDIR/cpool/[0-7]

  The benchmark traverses the tree and stats each file,
  first without inode sorting, and then with inode sorting.
  The pair of tests is repeated 3 times, and the first pair
  is ignored to reduce the measurement error due to caching,
  which tends to benefit the second and subsequent runs.
  If the run time on the last 4-5 runs is much shorter than
  the first then caching is dominating and you need to re-run
  with a larger tree.

  The ratio of the elapsed time for the two non-sorted
  runs to that for the two sorted runs is printed.

  You should make sure the load from other usage on the file
  system is low, or at least relatively constant, during the
  test; otherwise the results won't be meaningful.
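
To make the measurement concrete, the procedure described above can be sketched roughly as follows. This is a Python illustration only, not the inodeTest script itself; the names stat_tree and benchmark are hypothetical, and the details (3 pairs, first pair dropped, ratio of unsorted to sorted time) simply follow the description above.

```python
import os
import time


def stat_tree(top, sort_by_inode):
    """lstat() every non-directory entry under `top` and return how
    many were seen. Entries are optionally sorted by inode first."""
    count = 0
    with os.scandir(top) as it:
        entries = list(it)
    if sort_by_inode:
        entries.sort(key=lambda e: e.inode())
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            count += stat_tree(entry.path, sort_by_inode)
        else:
            os.lstat(entry.path)
            count += 1
    return count


def benchmark(top, pairs=3):
    """Run (unsorted, sorted) pairs of traversals, drop the first
    pair, and return total unsorted time / total sorted time."""
    unsorted_t, sorted_t = [], []
    for _ in range(pairs):
        for use_sort, times in ((False, unsorted_t), (True, sorted_t)):
            start = time.perf_counter()
            stat_tree(top, use_sort)
            times.append(time.perf_counter() - start)
    # Ignore the first pair: later runs tend to benefit from caching.
    return sum(unsorted_t[1:]) / sum(sorted_t[1:])
```

A ratio well above 1 would suggest inode sorting helps on that file system; a ratio near 1 on a small tree most likely just means everything was served from cache.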
I'd like to get the following info from you: the output from the
two scripts, the OS, the file system type and the RAID or LVM setup.
Please email the info to me off list and I will summarize.
Craig
inode.tgz
Description: GNU Unix tar archive
