Hi Tom,

> I want to ask this general question: how does your shop deal with the
> general problem of small files in filesystems on (beowulf) compute
> clusters?

We have this workload in spades.

As others have mentioned, good user education is the key. We use inode
quotas on Lustre (typically 150k to 1M inodes per user) as a safety net,
to catch code that wants to generate billions of small files before it
poisons the filesystem.

We encourage people to hash files into nested directories to limit the
number of files in any single directory. (Depressingly enough, we spent
the first part of this week tidying up an episode of
millions-of-files-in-a-directory, caused by our batch queueing system of
all things...)

You should also ensure that small files are not striped across multiple
OSTs; that really hurts performance. We set our filesystem default to
"don't stripe".

With all of that in place, we find that small-file performance is
reasonable (i.e. fast enough that people are not actively complaining).

Cheers,

Guy

--
Dr. Guy Coates, Informatics Systems Group
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK
Tel: +44 (0)1223 834244 x 6925
Fax: +44 (0)1223 496802

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
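
For illustration, the "hash files into nested directories" advice above might look something like the following minimal Python sketch. The function names `hashed_path` and `write_small_file`, the two-level layout, and the choice of SHA-256 are all assumptions made for the example, not something described in the message; any stable hash and any reasonable fan-out would serve.

```python
import hashlib
from pathlib import Path


def hashed_path(root, filename, levels=2, width=2):
    """Map a file name to root/<aa>/<bb>/filename.

    The subdirectory names are taken from the leading hex digits of a
    hash of the file name (hypothetical layout, chosen for illustration).
    """
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    # Slice the first levels*width hex digits into one chunk per level.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return Path(root).joinpath(*parts, filename)


def write_small_file(root, filename, data):
    """Write bytes to the hashed location, creating directories as needed."""
    path = hashed_path(root, filename)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path
```

With two levels of two hex digits each there are 256 * 256 = 65,536 leaf directories, so a million files spread roughly evenly works out to about 15 files per directory. Because the path is derived from the file name alone, a reader can recompute it without any lookup table.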