Thanks for the thoughts, Evgeniy! Well... I was thinking of developing something like that (nothing actually very useful, just a little "something" to get me more comfortable with fs and net development):
1) Inside a gigabit LAN there will be, let's say, 10 machines that are meant to be used as filesystem nodes. Those machines run a daemon in userspace ("dfsd") and dedicate one or more partitions of their physical HD(s) to the "filesystem cluster". So, let's suppose that on every node we have a /dev/hdb5 with 20GB unused, dedicated to the cluster ("/usr/bin/dfsd -p /dev/hdb5"). This is to keep things simple (since we can have raw access to the partition), but we could use files on the local filesystem too.

2) On the master machine, the DFS kernel module (which declares a block device like /dev/dfs1) uses broadcast packets (something like DHCP) to retrieve the list of active nodes on the LAN. So, with 10 machines with 20GB each, we have 200GB of distributed storage over the network. To keep things simple, let's say that they are addressed in a serial fashion (requests for 0-20GB go to node1, 20-40GB to node2, etc.) -- see the sketch after this list. The module is responsible for keeping a pool of TCP connections with the nodes' daemons, for sending, receiving and parsing the data, and so on. At this point, no security measures are taken (encryption, etc.). I think that at this stage we should already be able to create a reiserfs fs on the device and get it running (even if far slower than a local disk).

The second part of the project, which would involve more serious stuff, could be:

3) Redundancy - giving up as little storage capacity as possible, but being able to safely continue working if one of the nodes is down. Actually I don't have any clue on how to achieve this without drastically diminishing the storage capacity, but probably there is some clever way out there =]

4) No masters - each node can access the filesystem (the block device) as if it were an NFS mountpoint (this could be useful somehow to actual clusters, where you could share not only the processors but also the HDs of the nodes as a single huge / mountpoint). In this model, there would be no userspace stuff at all.
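To make 2) a bit more concrete, here is a rough sketch of how the serial addressing could work, assuming a fixed 20GB per node (NODE_SIZE, dfs_map and the struct names are just made-up placeholders, and it ignores requests that cross a node boundary, which a real block driver would have to split):

/*
 * Sketch of the serial (concatenated) addressing: every node exports the
 * same amount of space, so a linear offset on /dev/dfs1 maps to exactly
 * one node plus a local offset inside that node's partition.
 */
#include <stdint.h>
#include <stdio.h>

#define NODE_SIZE   (20ULL * 1024 * 1024 * 1024)  /* 20GB per node  */
#define NR_NODES    10                            /* 200GB in total */

struct dfs_target {
    unsigned int node;       /* which node's dfsd gets the request   */
    uint64_t     local_off;  /* offset inside that node's partition  */
};

/* Map a byte offset on the virtual block device to (node, local offset). */
static struct dfs_target dfs_map(uint64_t offset)
{
    struct dfs_target t;

    t.node      = offset / NODE_SIZE;  /* 0-20GB -> node 0, 20-40GB -> node 1, ... */
    t.local_off = offset % NODE_SIZE;
    return t;
}

int main(void)
{
    /* 25GB into the device lands 5GB into node 1's partition. */
    struct dfs_target t = dfs_map(25ULL * 1024 * 1024 * 1024);

    printf("node %u, local offset %llu\n",
           t.node, (unsigned long long)t.local_off);
    return 0;
}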
What do you think?

Daniel

On 6/30/06, Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> On Fri, Jun 30, 2006 at 03:32:28AM -0400, Daniel Bonekeeper ([EMAIL PROTECTED]) wrote:
> > Let's suppose that I'm writing an experimental distributed filesystem
> > that needs to open TCP sockets to other machines on the LAN, keep a
> > pool of connections and always be aware of new data arriving (like a
> > userspace select()). What's the best approach to implement this? Is
> > it better to keep all the TCP socket stuff in userspace and use an
> > interface like netlink to talk with it? Or, since we're talking about
> > a filesystem (where performance is a must), is it better to keep it in
> > kernel mode?
>
> It depends on your design. NFS uses in-kernel sockets, but userspace can
> easily fill a 1Gbit link too. An FS must eliminate as much copying as
> possible, but without deep digging into the socket code you will get a
> copy both in kernelspace (one copy from the socket queue into your buffer)
> and in userspace (the same copy, but using the slower copy_to_user();
> depending on the size of each copy it can make a noticeable difference).
> With a kernel socket you get your data into the fs/vfs cache already,
> while with userspace you must copy it back into the kernel using the slow
> copy_from_user(). But if the data is supposed to be somehow (heavily)
> processed before reaching the harddrive (for example compressed or
> encrypted), the cost of that processing can fully hide the cost of the
> copy itself, so userspace is much preferable in that situation due to its
> much more convenient development process.
>
> --
>       Evgeniy Polyakov
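If I end up keeping the sockets in userspace as you suggest for the heavy-processing case, I imagine the receive side of dfsd's connection pool looking roughly like the select() loop below. This is only a sketch under assumptions: the sockets are already connected, the pool size is fixed, and there is no reconnection or protocol framing (poll_nodes and NR_NODES are made-up names):

/*
 * Minimal userspace sketch: watch a pool of already-connected TCP sockets
 * with select() and drain whatever arrives.  Purely illustrative -- no
 * reconnection, no request framing/parsing, fixed-size pool.
 */
#include <sys/types.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>
#include <stdio.h>

#define NR_NODES 10

void poll_nodes(int socks[NR_NODES])
{
    char buf[4096];
    fd_set rfds;
    int i, maxfd;

    for (;;) {
        FD_ZERO(&rfds);
        maxfd = -1;
        for (i = 0; i < NR_NODES; i++) {
            FD_SET(socks[i], &rfds);
            if (socks[i] > maxfd)
                maxfd = socks[i];
        }

        /* Sleep until at least one node has data for us. */
        if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0) {
            perror("select");
            return;
        }

        for (i = 0; i < NR_NODES; i++) {
            if (FD_ISSET(socks[i], &rfds)) {
                ssize_t n = recv(socks[i], buf, sizeof(buf), 0);
                if (n <= 0)
                    return;  /* node went away: failover would go here */
                /* parse/dispatch the n received bytes from node i here */
                printf("node %d: %zd bytes\n", i, n);
            }
        }
    }
}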
--
What this world needs is a good five-dollar plasma weapon.