amjad ali wrote:
> Hello all,
>
> In an MPI parallel code, which of the following two is the better way?
>
> 1) Read the input data from the input data files only in the master process
>    and then broadcast it to the other processes.
>
> 2) All the processes read the input data directly from the input data files
>    (no need for a broadcast from the master process). Is it possible?

Both are possible; which is better depends on the details, of course. How big are the input files? Does each client need all of them, or just its own fraction? If the clients read the input files themselves, are the files local to the clients or read from a shared file system? What does your network look like?

Keep in mind that when you say broadcast, many (not all) MPI implementations do not do a true network-layer broadcast, and that in most situations network uplinks are distinct from the downlinks (except for the ACKs).

If all clients need all input files, you can achieve good performance either with a BitTorrent-style approach (send 1/N of the file to each of N clients, then have them re-share it) or even with just a simple chain: Head -> node A -> node B -> node C. This works better than you might think, since node A can start uploading as soon as it has received the first data, and the upload bandwidth doesn't compete with the download bandwidth (well, usually not much).

For the typical case where each client needs only its own fraction, an MPI broadcast of 1 GB because 8 nodes each need 128 MB wouldn't be worth it. Instead, just send 128 MB to each client with MPI_Send.

In general I see a higher percentage of peak bandwidth with MPI than I do with NFS, but NFS can be tuned to reach a reasonably high fraction of wire speed as well.

Keep in mind that it's not hard to become disk limited on the head node. You might want to look at how you are reading the files, and at the disk bandwidth available, before you go optimizing the network layer.
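
To put rough numbers on the options above, here is a back-of-the-envelope model in Python. It is only a sketch under stated assumptions: the 100 MiB/s link speed, the 1 MiB chunk size and the function names are made up for illustration and are not measurements or part of any MPI API. It assumes full-duplex links where upload does not compete with download, and it ignores latency, disk speed and protocol overhead.

```python
import math

MIB = 2**20

def sequential_full_copies(size_bytes, n_clients, bandwidth):
    """Head sends the whole file to each client in turn over its one uplink."""
    return n_clients * size_bytes / bandwidth

def pipelined_chain(size_bytes, n_clients, bandwidth, chunk_bytes):
    """Head -> A -> B -> ...; every node forwards a chunk once it has it."""
    n_chunks = math.ceil(size_bytes / chunk_bytes)
    # The last chunk reaches the first node after n_chunks steps and then
    # needs n_clients - 1 more hops to reach the end of the chain.
    steps = n_chunks + n_clients - 1
    return steps * chunk_bytes / bandwidth

def send_slices(size_bytes, n_clients, bandwidth):
    """Each client needs only 1/N of the data; the head sends each slice once."""
    slice_bytes = size_bytes / n_clients
    return n_clients * slice_bytes / bandwidth

if __name__ == "__main__":
    size = 1024 * MIB        # 1 GiB of input data
    clients = 8
    bw = 100 * MIB           # assumed 100 MiB/s per link, per direction
    chunk = 1 * MIB          # assumed pipeline chunk size

    print(f"full copy to each client in turn: {sequential_full_copies(size, clients, bw):.2f} s")
    print(f"pipelined chain:                  {pipelined_chain(size, clients, bw, chunk):.2f} s")
    print(f"one 128 MiB slice per client:     {send_slices(size, clients, bw):.2f} s")
    # prints 81.92 s, 10.31 s and 10.24 s respectively
```

Under these assumptions the chain delivers the full file to all 8 clients in barely more time than a single copy takes, which is the point of the Head -> node A -> node B idea. If each client only needs its own slice, sending the slices directly costs one file's worth of head-node uplink time. On a real cluster the head node's disk may well be the limit before any of this matters.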