[BackupPC-users] Does ordering file access by inode improve performance?

Craig Barratt Mon, 09 Apr 2007 05:13:35 -0700

Various parts of BackupPC spend a lot of time traversing large
trees of files, including BackupPC_dump, BackupPC_trashClean
and BackupPC_nightly.


As many people have observed, over time BackupPC's pooling results
in directories with files that are widely dispersed across the disk.
This makes disk seeks the performance bottleneck.

Currently BackupPC processes all the files in a directory by
reading the directory and processing each file in the order
returned by the directory read.

Simon Strack from Monash University noted a couple of years ago that disk
seeks can be reduced significantly by sorting the directory read
results by inode.  If the inode number is closely correlated with the
disk position (i.e., block), then the files are processed in an order
that reduces disk seeks.

Perl's built-in directory reading functions return only the file name.
The Perl module IO::Dirent additionally returns the inode and file
type, which avoids a stat() on each file.
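
To make the idea concrete, here is a minimal sketch in Python rather than Perl.
It is not BackupPC code and does not use IO::Dirent; Python's os.scandir() is
used only as an analogue, since it also exposes the inode and type from the
directory read.  The function name entries_by_inode is hypothetical.

```python
import os


def entries_by_inode(path):
    """Return (inode, name, is_dir) tuples for `path`, sorted by inode number."""
    entries = []
    with os.scandir(path) as it:
        for entry in it:
            # On most POSIX file systems the inode and type come from the
            # directory read itself, so no per-file stat() is usually needed.
            # If the file system does not report a type, Python falls back
            # to a stat() call inside is_dir().
            entries.append((entry.inode(), entry.name,
                            entry.is_dir(follow_symlinks=False)))
    # Tuples sort on their first element, i.e. the inode number.
    entries.sort()
    return entries
```

Processing the returned list front to back visits the files in inode order,
which only helps if inode number tracks disk position on the file system in
question.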

I'm interested in exploring whether IO::Dirent works with different
operating systems and file systems and, if so, whether traversing those
file systems by sorting inodes returned by IO::Dirent provides any benefit.
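
The kind of check involved in "works" can be sketched as follows.  This is an
assumed illustration in Python, not the attached inodeVerify script: it
compares the inode and type reported by the directory read against lstat()
and returns the names that disagree.  The function name dirent_mismatches is
hypothetical.

```python
import os
import stat


def dirent_mismatches(path):
    """Return names in `path` whose directory-entry inode or type differs from lstat()."""
    bad = []
    with os.scandir(path) as it:
        for entry in it:
            st = os.lstat(entry.path)
            same_inode = entry.inode() == st.st_ino
            # Compare the three common types; other types (fifo, socket,
            # device) are treated as matching when all three are false.
            same_type = (
                entry.is_dir(follow_symlinks=False) == stat.S_ISDIR(st.st_mode)
                and entry.is_file(follow_symlinks=False) == stat.S_ISREG(st.st_mode)
                and entry.is_symlink() == stat.S_ISLNK(st.st_mode)
            )
            if not (same_inode and same_type):
                bad.append(entry.name)
    return bad
```

An empty result on a directory containing files, subdirectories and symlinks
suggests the directory read can be trusted on that file system.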

I am asking for some volunteers to do the following:

 - install IO::Dirent from CPAN.

 - unpack the attached tar file in a directory

 - make sure IO::Dirent works (ie: returns correct type and inode
   information) on the file system you will test by running the
   inodeVerify script:

        su backuppc
        mkdir TOPDIR/temp
        cd TOPDIR/temp
        inodeVerify

   It should print "IO::Dirent is ok".

   You can remove the temp directory.

 - run the inodeTest benchmark on a large directory tree
   (e.g., /data/BackupPC/cpool or /data/BackupPC/cpool/0 or
   /data/BackupPC/cpool/[0-7]).  You need a tree large enough
   that caching becomes unimportant.  For example, to test the
   entire pool:

        su backuppc
        inodeTest TOPDIR/cpool

   or one of these (1/16 of the pool, 1/4 of the pool or 1/2 of
   the pool respectively):

        inodeTest TOPDIR/cpool/0
        inodeTest TOPDIR/cpool/[0-3]
        inodeTest TOPDIR/cpool/[0-7]

   The benchmark traverses the tree and stats each file,
   first without inode sorting, and then with inode sorting.

   The pair of tests is repeated 3 times, and the first pair
   is ignored to reduce the measurement error due to caching,
   which tends to benefit the second and subsequent runs.

   If the run times of the last four or five runs are much shorter
   than that of the first run, then caching is dominating and you
   need to re-run with a larger tree.

   The ratio of elapsed time taken for the two non-sorted
   runs to the two sorted runs is printed.

   You should make sure the load from other usage on the file
   system is low, or at least relatively constant, during the
   test - otherwise the results won't be meaningful.
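
The measurement described above can be sketched roughly as follows.  This is
an assumed Python illustration, not the attached inodeTest script: the
function names walk_and_stat and benchmark are hypothetical, and the traversal
order of subdirectories is a guess.  It runs unsorted/sorted pairs, ignores
the first pair, and returns the ratio of unsorted to sorted elapsed time.

```python
import os
import time


def walk_and_stat(top, sort_by_inode):
    """lstat() every entry under `top`; return the number of entries visited."""
    count = 0
    pending = [top]
    while pending:
        path = pending.pop()
        with os.scandir(path) as it:
            entries = list(it)
        if sort_by_inode:
            entries.sort(key=lambda e: e.inode())
        for entry in entries:
            os.lstat(entry.path)
            count += 1
            if entry.is_dir(follow_symlinks=False):
                pending.append(entry.path)
    return count


def benchmark(top, pairs=3):
    """Time unsorted/sorted pairs, drop the first pair, return the time ratio."""
    unsorted_times, sorted_times = [], []
    for _ in range(pairs):
        for flag, bucket in ((False, unsorted_times), (True, sorted_times)):
            start = time.perf_counter()
            walk_and_stat(top, flag)
            bucket.append(time.perf_counter() - start)
    # A ratio above 1.0 means the inode-sorted traversal was faster.
    return sum(unsorted_times[1:]) / sum(sorted_times[1:])
```

On a small tree the whole traversal is served from cache and the ratio says
little; the numbers only become meaningful on a tree large enough to force
real disk seeks.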

I'd like to get the following info from you: the output from the
two scripts, the OS, the file system type and the RAID or LVM setup.

Please email info to me off list and I will summarize.

Craig

inode.tgz
Description: GNU Unix tar archive
