writing to different sections of a file is probably wrong on any
networked FS, since there will inherently be obscure interactions
with the size and alignment of the writes vs the client pagecache.
I'm rather surprised to see that sentiment on a mailing list for high
performance clusters :>
smiley noted, but I would suggest that HPC is not about convenience first -
simply having each node write to a separate file eliminates any such issue,
and is hardly an egregious complication to the code.
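The file-per-node pattern mentioned above is easy to express; a minimal sketch in Python (the rank argument and the part.NNNNN naming scheme are assumptions for illustration, not anything from the thread):

```python
import os

def write_node_output(rank, data, outdir="out"):
    """Each node/rank writes its own file, so no shared-file
    consistency questions arise -- at the cost of a merge or
    post-processing step later."""
    os.makedirs(outdir, exist_ok=True)
    path = os.path.join(outdir, f"part.{rank:05d}")
    with open(path, "wb") as f:
        f.write(data)
    return path

# e.g. on rank 3 of an N-node job:
# write_node_output(3, results_bytes)
```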
I would contend that writing to different sections of a file *must* be
supported by any file system deployed on a cluster. How else would
you get good performance from MPI-IO?
who uses MPI-IO? straight question - I don't believe any of our 1500 users do.
PVFS, GPFS, and Lustre all support simultaneous writes to different
sections of a file.
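Mechanically, "writing to different sections of a file" means each writer targets its own disjoint byte range; here is a rough sketch of that using POSIX pwrite from Python (the fixed 64-byte record per rank is an assumption for the sketch -- real MPI-IO goes through calls like MPI_File_write_at and the filesystem's own coherency machinery):

```python
import os

RECORD = 64  # assumed fixed record size per rank

def write_region(fd, rank, payload):
    """Write this rank's record at its own offset. pwrite takes an
    explicit offset, so concurrent writers never seek over each
    other, and the byte ranges never overlap."""
    assert len(payload) <= RECORD
    os.pwrite(fd, payload.ljust(RECORD, b"\0"), rank * RECORD)

fd = os.open("shared.dat", os.O_RDWR | os.O_CREAT, 0o644)
for r, msg in enumerate([b"rank0", b"rank1", b"rank2"]):
    write_region(fd, r, msg)
os.close(fd)
```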
NFS certainly does as well. you just have to know the constraints.
are you saying you can never get pathological or incorrect results from
parallel operations on the same file on any of those FS's?
in my experience, people who expect it to "just work" have an
incredibly naive model of how a network FS works (i.e., that write()
produces an RPC direct to the server)
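The pagecache interaction alluded to above comes down to write granularity: a client caches whole pages, so two clients touching byte ranges that land in the same page can clobber each other's data even though the ranges themselves are disjoint. One common workaround is to round each writer's region up to a whole number of pages; a sketch (querying the page size at runtime rather than assuming 4096):

```python
import os

PAGE = os.sysconf("SC_PAGE_SIZE")  # typically 4096 on Linux

def aligned_region(rank, nbytes):
    """Round each rank's region up to a whole number of pages so
    no two ranks ever share a client-cached page."""
    span = -(-nbytes // PAGE) * PAGE  # ceil to a page multiple
    return rank * span, span          # (offset, length)

# e.g. with 100-byte records, rank 1's region starts one full
# page in, not at byte 100.
```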
I agree that the POSIX API and consistency semantics make it difficult
to achieve high I/O rates for common scientific workloads, and that
NFS is probably not the best solution for those truly parallel workloads.
Fortunately, there are good alternatives out there.
starting with the question: "do you have a good reason to be writing in
parallel to the same file?". I'm not saying the answer is never yes.
I guess I tend to value portability through obscurity-avoidance. not if it
makes life utter hell, of course, but...
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf