On Mon, Dec 11, 2006 at 05:53:58PM -0800, Bill Broadley wrote:
> Lustre:
>  * client server
>  * scales extremely well, seems popular on the largest of clusters.
>  * Can survive hardware failures assuming more than 1 block server is
>    connected to each set of disks
>  * unix only.
>  * relatively complex.
>
> PVFS2:
>  * Client server
>  * scales well
>  * can not survive a block server death.
>  * unix only
>  * relatively simple.
>  * designed for use within a cluster.
Hi Bill

As a member of the PVFS project I just wanted to comment on your
description of our file system.

I would say that PVFS is every bit as fault tolerant as Lustre.  The
redundancy models for the two file systems are pretty similar: both
rely on shared storage and high-availability software to continue
operating in the face of disk failure.  Where Lustre has done a much
better job than we have is in documenting the HA process.  This is one
of our (PVFS) areas of focus in the near term.

We may not have documented the process in enough detail, but one can
definitely set up PVFS servers with links to shared storage and make
use of things like IP takeover to deliver resiliency in the face of
disk failure, and we have had this ability for several years now (PVFS
users can check out 'pvfs2-ha.pdf' in our source for a starting point).

> So the end result (from my skewed perspective) is:
>  * Lustre and PVFS2 are popular in clusters for sharing files in larger
>    clusters where more than single file server worth of bandwidth is
>    required.  Both I believe scale well with bandwidth but only allow
>    for a single metadata server so will ultimately scale only as far
>    as single machine for metadata intensive workloads (such as lock
>    intensive, directory intensive, or file creation/deletion
>    intensive workloads).  Granted this also allows for exotic
>    hardware solutions (like solid state storage) if you really need
>    the performance.

PVFS v2 has offered multiple metadata servers for some time now, and
our metadata operations scale well with the number of metadata servers.
You are absolutely correct that PVFS metadata performance is dependent
on hardware, but you need not get as exotic as solid state to see high
metadata rates.  The OSC PVFS deployment has servers with RAID and fast
disks, and can deliver quite high metadata rates.

Another point I'd like to make about PVFS is how well suited it is for
MPI-IO applications.  The ROMIO MPI-IO implementation (the basis for
many MPI-IO implementations) contains a highly efficient PVFS driver.
This driver speaks directly to the PVFS servers, bypassing the kernel.
It also contains optimizations for collective metadata operations and
noncontiguous I/O.  Applications making use of MPI-IO, or of
higher-level libraries built on top of MPI-IO such as parallel-netcdf
or (when configured correctly) HDF5, are likely to see quite good
performance when running on PVFS.

> Hopefully others will expand and correct the above.

Happy to do so!

==rob

--
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                 B29D F333 664A 4280 315B
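
To make the MPI-IO point above a bit more concrete, here is a minimal
sketch of a collective MPI-IO write as an application might issue it
against a PVFS volume through ROMIO.  The "pvfs2:" prefix asks ROMIO to
use its PVFS driver directly rather than going through a kernel mount;
the path /mnt/pvfs2/testfile is purely illustrative, and error checking
is omitted for brevity.

#include <mpi.h>

#define COUNT 1024

int main(int argc, char **argv)
{
    MPI_File fh;
    MPI_Offset offset;
    int buf[COUNT];
    int rank, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* fill a per-rank buffer with something recognizable */
    for (i = 0; i < COUNT; i++)
        buf[i] = rank;

    /* the "pvfs2:" prefix and the path are illustrative; ROMIO uses the
     * prefix to select its PVFS driver and talk to the servers directly */
    MPI_File_open(MPI_COMM_WORLD, "pvfs2:/mnt/pvfs2/testfile",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);

    /* each rank writes its own contiguous block; the _all (collective)
     * variant is where ROMIO's collective I/O optimizations apply */
    offset = (MPI_Offset)rank * COUNT * sizeof(int);
    MPI_File_write_at_all(fh, offset, buf, COUNT, MPI_INT,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}

Built with mpicc and launched with mpiexec, each rank writes its own
contiguous block of the file; higher-level libraries such as
parallel-netcdf or HDF5 issue similar MPI-IO calls under the covers.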