Mark Hahn wrote:
writing to different sections of a file is probably wrong on any
networked FS, since there will inherently be obscure interactions
with the size and alignment of the writes vs client pagecache,

I'm rather surprised to see that sentiment on a mailing list for high
performance clusters :>

smiley noted, but I would suggest that HPC is not about convenience
first - simply having each node write to a separate file eliminates
any such issue, and is hardly an egregious complication to the code.

Actually, this can greatly complicate the code. If I run a CFD case on n
processes and each one writes its part of the solution to a separate file,
then how do I read those n files when I restart on 1.5*n processes? I would
have to write code that takes the n files and writes out either a single
file or 1.5*n files. To me this is a wasteful use of cycles when something
like MPI-IO is so much better and lets me stick with a single file.
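
To make the restart point concrete, here is a rough sketch in plain Python
(not MPI-IO itself). It assumes the solution is stored as one flat array of
fixed-size values in a single file, which is my simplification; real CFD
formats are more involved. The helper names block_range and byte_range are
made up for this example. Each process computes its own slice from the total
size and the current process count, so the file does not care how many
processes wrote it.

```python
def block_range(total, nprocs, rank):
    """Return (start, count) of the contiguous block owned by `rank`.

    Hypothetical helper: splits `total` elements as evenly as possible
    over `nprocs` processes, giving the first ranks one extra element
    when the division is not exact.
    """
    base, extra = divmod(total, nprocs)
    start = rank * base + min(rank, extra)
    count = base + (1 if rank < extra else 0)
    return start, count


def byte_range(total, nprocs, rank, itemsize=8):
    """Same block expressed as (byte offset, byte length) in the file.

    itemsize=8 assumes one float64 per element (an assumption).
    """
    start, count = block_range(total, nprocs, rank)
    return start * itemsize, count * itemsize


if __name__ == "__main__":
    total = 1000  # elements in the single solution file
    for nprocs in (4, 6):  # written on n = 4, restarted on 1.5*n = 6
        print(nprocs, [block_range(total, nprocs, r) for r in range(nprocs)])
```

With n separate files, the 6-process restart would first need a merge or
re-split step. With one file, each of the 6 processes simply seeks to its
own offset.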

I don't want to speak for the entire CFD community, but I haven't
seen anyone write out n files. That approach proved to be a huge pain
many years ago.

Other disciplines may have other opinions of course.

I would contend that writing to different sections of a file *must* be
supported by any file system deployed on a cluster.  How else would
you get good performance from MPI-IO?

who uses MPI-IO? straight question - I don't believe any of our 1500 users do.

I do. I also know that some ISVs are moving rapidly to use MPI-IO.

in my experience, people who expect it to "just work" have an
incredibly naive model of how a network FS works (ie, write()
produces an RPC direct to the server)

I agree that the POSIX API and consistency semantics make it difficult
to achieve high I/O rates for common scientific workloads, and that
NFS is probably not the best solution for those truly parallel workloads.

Fortunately, there are good alternatives out there.

starting with the question: "do you have a good reason to be writing in parallel to the same file?". I'm not saying the answer is never yes.

As Rob mentioned, writing in parallel to the same file gets you good
performance. I think this is a fundamental underpinning of parallel I/O.
You can do this with or without MPI-IO. MPI-IO just makes it easier,
standard, and portable.

Of course you would not have different processes writing to the same region
of a file. But if each process can write to its own distinct region of the
file without worrying about another process stepping on it, then why not
write in parallel? It's easy to do using MPI-IO. Take a look at the
tutorials on MPI-IO around the web and give them a try.
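
For anyone who wants to see the "without MPI-IO" version of the idea, here
is a minimal sketch using only the Python standard library. Several caveats:
threads stand in for MPI ranks, it relies on os.pwrite (Unix only), the
names write_block, parallel_write, and read_all are invented for this
example, and it has only the semantics of whatever file system the file
lives on. It says nothing about how a given networked FS handles client
caching. The point is just that each writer gets an explicit offset into one
shared file and never touches another writer's region.

```python
import os
import struct
import tempfile
from concurrent.futures import ThreadPoolExecutor

ITEM = struct.Struct("<d")  # assumption: one little-endian float64 per value


def write_block(path, start, values):
    """Write `values` at element offset `start`, leaving other regions alone."""
    data = memoryview(b"".join(ITEM.pack(v) for v in values))
    offset = start * ITEM.size
    fd = os.open(path, os.O_WRONLY)
    try:
        # pwrite takes an explicit byte offset, so writers do not share a
        # file position. It may write fewer bytes than asked, hence the loop.
        while data:
            written = os.pwrite(fd, data, offset)
            data = data[written:]
            offset += written
    finally:
        os.close(fd)


def parallel_write(path, blocks):
    """blocks: list of (start, values) whose element ranges do not overlap."""
    total = max(start + len(values) for start, values in blocks)
    with open(path, "wb") as f:
        f.truncate(total * ITEM.size)  # pre-size the single shared file
    with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
        # list() waits for all writers and re-raises any worker exception.
        list(pool.map(lambda b: write_block(path, *b), blocks))


def read_all(path):
    """Read the whole file back as a list of floats."""
    with open(path, "rb") as f:
        raw = f.read()
    return [value for (value,) in ITEM.iter_unpack(raw)]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "solution.bin")
    parallel_write(path, [(0, [0.0, 1.0]), (2, [2.0, 3.0, 4.0]), (5, [5.0])])
    print(read_all(path))
```

MPI-IO gives you the same pattern (each rank writing at its own offset in a
shared file) in a standard, portable form, plus collective operations that
this sketch does not attempt.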

Jeff

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
