Hi Bogdan,

Bogdan Costescu wrote:
>
> Have you considered using a parallel file system ?

We looked a bit into a few, but would love to get input from anyone on that. What we found so far was not really convincing, e.g. GlusterFS was not really stable at that time, Lustre was too easy to crash (at least at that time), ...

> There have been many talks of improving performance by paying attention
> to the data locality on this very list. Are you not able to move the
> code to where the data is or move the data to where the code is ?

In principle this *should* be possible. However, this particular user (and maybe many in the future) would then need to circumvent the batch system, and it's usually quite a hassle to set this up correctly beforehand.

>
> F.e. using a simple TCP connection (nc, rsh, rsync or even http) to
> transfer the file to the local disk before using it is probably more
> efficient than the way you use NFS is you deal with small files (as they
> have to be written to some local storage). The setup and tear-down costs
> of the NFS connection (automounter, mount, unmount) simply doesn't exist
> in this case; the transfer of data on the wire happens the same way. Or
> you could even get around the limitation of storing it locally by using
> a ramdisk to temporarily store the files (if you have the free
> memory...) - from what I understand they are read then used immediately
> and not needed again in a short time frame so it makes no sense to store
> them for longer, a perfect application for a tmpfs.

The interesting bit is: even with the data on a remote disk, the overhead is not really that much more. The files are typically less than 100k in size, and even doing an rsync or nc|tar from one box to another is REALLY slow with that many small files.

tmpfs et al.: The jobs usually read the data once directly from the NFS share and process it; they do not go back to this file again (well, at least not this process).
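
For anyone who wants to experiment with the "bundle the small files and stage them locally" idea from the quoted suggestion, here is a minimal Python sketch. It is only an illustration, not what we run: the function names are made up, the network hop (nc, rsh, ...) is left out, and the scratch directory is assumed to be something like a tmpfs mount.

```python
import io
import os
import tarfile


def pack_small_files(src_dir):
    """Bundle every regular file in src_dir into one in-memory tar stream.

    Sending one stream avoids a per-file round trip; src_dir is a
    hypothetical directory holding the many small input files.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in sorted(os.listdir(src_dir)):
            path = os.path.join(src_dir, name)
            if os.path.isfile(path):
                tar.add(path, arcname=name)
    return buf.getvalue()


def unpack_to_scratch(blob, scratch_dir):
    """Unpack the tar stream into scratch_dir (assumed: a tmpfs mount).

    Only regular files are written, and only under their base name,
    so nothing can land outside scratch_dir.
    """
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            data = tar.extractfile(member).read()
            target = os.path.join(scratch_dir, os.path.basename(member.name))
            with open(target, "wb") as out:
                out.write(data)
    return sorted(os.listdir(scratch_dir))
```

The bytes returned by pack_small_files() would be what goes over the wire; whether this beats plain NFS reads for files under 100k is exactly the open question, so it would need measuring on the real data.
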

So I do think NFS would not be that bad. It won't be optimal, but it's usually the easiest for the user to use and quite generic in its approach. Of course one could devise other and much better schemes, but you always have to find a good compromise between usability and the man-power needed to tailor a specific scheme.

Thanks!

Carsten
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf