On Tue, Mar 06, 2007 at 11:09:18AM -0500, Mark Hahn wrote:
> >I would contend that writing to different sections of a file *must* be
> >supported by any file system deployed on a cluster. How else would
> >you get good performance from MPI-IO?
>
> who uses MPI-IO? straight question - I don't believe any of our 1500 users
> do.

Excellent question. Direct users? Probably not very many. We do find
that straight-up MPI-IO isn't a good fit for a lot of scientific
applications. The convenience factor you mentioned is indeed
important. MPI-IO thinks of data as a "stream of bytes", while
applications think in terms of "multidimensional typed data" (a slice
of upper atmosphere). Libraries like Parallel-HDF5 and
Parallel-NetCDF bridge the gap and provide a convenient, familiar API.
The app is still using MPI-IO, just not directly.

> NFS certainly does as well. you just have to know the constraints.
> are you saying you can never get pathological or incorrect results from
> parallel operations on the same file on any of those FS's?

You observe correctly that file systems offer a set of rules on what
to expect from I/O patterns. These consistency semantics are not set
in stone: MPI-IO consistency semantics are more relaxed than POSIX,
yet generally sufficient for parallel scientific applications. We
would consider it a serious bug in PVFS if simultaneous
non-overlapping writes corrupted data. If the only file system I had
access to were NFS, I'd do one file per process as well.

> starting with the question: "do you have a good reason to be writing in
> parallel to the same file?". I'm not saying the answer is never yes.
>
> I guess I tend to value portability by obscurity-avoidance. not if it makes
> life utter hell, of course, but...

One file per processor falls down on systems like BGL (where even a
small run is 1024 processes, and 128k is not unheard of).

One file per process also robs the higher layers of the I/O software
stack of an opportunity to optimize access patterns. All processes
reading a column out of a row-major array is noncontiguous (and
generally slow) in file-per-processor, but can be contiguous in
single-file after applying data shipping or two-phase collective
buffering optimizations.

Jeff touched on the data management issues of file-per-processor.

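To make the column-read point a little more concrete, here is a small
Python sketch. It is only a simulation under stated assumptions: the
"file" is an in-memory byte string holding a 4x4 row-major array of
8-byte doubles, and the helper names (column_offsets, contiguous_runs,
file_domains, two_phase_read) are made up for illustration. It is not
how any particular MPI-IO implementation is written; it just shows the
idea of a few large contiguous reads followed by an in-memory exchange.

```python
import struct

ITEM = 8  # bytes per double (assumption for this sketch)


def column_offsets(rows, cols, col):
    """Byte offsets of one column of a row-major array in a single file."""
    return [(r * cols + col) * ITEM for r in range(rows)]


def contiguous_runs(offsets):
    """Count how many separate contiguous requests the offsets need."""
    return 1 + sum(1 for a, b in zip(offsets, offsets[1:]) if b != a + ITEM)


def file_domains(nelem, naggr):
    """Split the file into contiguous byte ranges, one per aggregator."""
    chunk = -(-nelem // naggr)  # ceiling division, in elements
    return [(s * ITEM, min(s + chunk, nelem) * ITEM)
            for s in range(0, nelem, chunk)]


def two_phase_read(filebytes, rows, cols, naggr):
    """Simulate two-phase collective buffering for a column-wise read."""
    # Phase 1: each aggregator issues ONE contiguous read of its domain.
    domains = file_domains(rows * cols, naggr)
    buffers = [filebytes[lo:hi] for lo, hi in domains]
    # Phase 2: redistribute in memory; element k belongs to column k % cols.
    columns = [[] for _ in range(cols)]
    for (lo, _), buf in zip(domains, buffers):
        for i in range(0, len(buf), ITEM):
            index = (lo + i) // ITEM
            value = struct.unpack_from("<d", buf, i)[0]
            columns[index % cols].append(value)
    return columns


# A 4x4 array holding 0.0 .. 15.0, laid out row-major in one "file".
rows, cols = 4, 4
filebytes = struct.pack("<16d", *[float(i) for i in range(rows * cols)])

# Direct access: every process needs one small strided request per row.
direct_requests = contiguous_runs(column_offsets(rows, cols, 1))

# Two-phase: four aggregators read four contiguous ranges, then exchange.
columns = two_phase_read(filebytes, rows, cols, naggr=4)
print(direct_requests, columns[1])
```

In this toy case each process reading its column directly needs four
small strided requests, while the two-phase version touches the file
with four large contiguous reads in total and does the shuffling in
memory. With file-per-process there is no shared file for the library
to reorganize accesses over in this way.
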
If file-per-processor really is the most portable and convenient way
to work on data, well, I can't argue with that. On NFS, that's
probably the only way to get correct results. The single-file
approach, however, has significant benefits on the modern parallel
file systems available today.

As I hope you could tell, this kind of discussion is a lot of fun for
me. Thanks!

==rob

--
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                 B29D F333 664A 4280 315B
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf