Thanks for the thoughts, Evgeniy! Well... I was thinking of developing something like that (nothing actually very useful, just a little "something" to get me more comfortable with fs and net development):
1) Inside a gigabit LAN there will be, let's say, 10 machines that are meant to be used as filesystem nodes. Those machines run a daemon in userspace ("dfsd") and dedicate one or more partitions of their physical HD(s) to the "filesystem cluster". So, let's suppose that on every node we have a /dev/hdb5 with 20GB unused, dedicated to the cluster ("/usr/bin/dfsd -p /dev/hdb5"). This is to keep things simple (since we can have raw access to the partition), but we could use files on the local filesystem too.

2) On the master machine, the DFS kernel module (which declares a block device like /dev/dfs1) uses broadcast packets (something like DHCP) to retrieve the list of active nodes on the LAN. So, with 10 machines with 20GB each, we have 200GB of distributed storage over the network. To keep things simple, let's say that they are addressed in a serial fashion (requests for 0-20GB go to node1, 20-40GB to node2, etc.) -- see the sketch after this list. The module is responsible for keeping a pool of TCP connections with the nodes' daemons, for sending, receiving and parsing the data, and so on. At this point, no security measures are taken (encryption, etc.). I think that at this stage we should already be able to create a reiserfs fs on the device and get it running (even if far slower than a local disk).

The second part of the project, which would involve more serious stuff, could be:

3) Redundancy - giving up as little storage capacity as possible, but being able to safely continue working if one of the nodes is down. Actually I don't have any clue on how to achieve this without drastically diminishing the storage capacity, but probably there is some clever way out there =]

4) No masters - each node can access the filesystem (the block device) as if it were an NFS mountpoint (this could be useful somehow to actual clusters, where you could share not only the processors but also the HDs of the nodes as a single huge / mountpoint). In this model, there would be no userspace stuff at all.
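To make 2) a bit more concrete, here is a rough sketch of how the serial addressing could work, assuming a fixed 20GB per node (NODE_SIZE, dfs_map and the struct names are just made-up placeholders, and it ignores requests that cross a node boundary, which a real block driver would have to split):

/*
 * Sketch of the serial (concatenated) addressing: every node exports the
 * same amount of space, so a linear offset on /dev/dfs1 maps to exactly
 * one node plus a local offset inside that node's partition.
 */
#include <stdint.h>
#include <stdio.h>

#define NODE_SIZE   (20ULL * 1024 * 1024 * 1024)  /* 20GB per node  */
#define NR_NODES    10                            /* 200GB in total */

struct dfs_target {
    unsigned int node;       /* which node's dfsd gets the request   */
    uint64_t     local_off;  /* offset inside that node's partition  */
};

/* Map a byte offset on the virtual block device to (node, local offset). */
static struct dfs_target dfs_map(uint64_t offset)
{
    struct dfs_target t;

    t.node      = offset / NODE_SIZE;  /* 0-20GB -> node 0, 20-40GB -> node 1, ... */
    t.local_off = offset % NODE_SIZE;
    return t;
}

int main(void)
{
    /* 25GB into the device lands 5GB into node 1's partition. */
    struct dfs_target t = dfs_map(25ULL * 1024 * 1024 * 1024);

    printf("node %u, local offset %llu\n",
           t.node, (unsigned long long)t.local_off);
    return 0;
}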
What do you think?

Daniel

On 6/30/06, Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> On Fri, Jun 30, 2006 at 03:32:28AM -0400, Daniel Bonekeeper ([EMAIL PROTECTED]) wrote:
> > Let's suppose that I'm writing an experimental distributed filesystem
> > that needs to open TCP sockets to other machines on the LAN, keep a
> > pool of connections and always be aware of new data arriving (like a
> > userspace select()). What's the best approach to implement this? Is
> > it better to keep all the TCP socket stuff in userspace and use an
> > interface like netlink to talk with it? Or, since we're talking about
> > a filesystem (where performance is a must), is it better to keep it in
> > kernel mode?
>
> It depends on your design. NFS uses in-kernel sockets, but userspace can
> easily fill a 1Gbit link too. An FS must eliminate as much copying as
> possible, but without deep digging into the socket code you will get a
> copy both in kernelspace (one copy from the socket queue into your buffer)
> and in userspace (the same copy, but using the slower copy_to_user();
> depending on the size of each copy it can make a noticeable difference).
> With a kernel socket you get your data into the fs/vfs cache already,
> while with userspace you must copy it back into the kernel using the slow
> copy_from_user(). But if the data is supposed to be somehow (heavily)
> processed before reaching the harddrive (for example compressed or
> encrypted), the cost of that processing can fully hide the cost of the
> copy itself, so userspace is much preferable in that situation due to its
> much more convenient development process.
>
> --
>       Evgeniy Polyakov
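If I end up keeping the sockets in userspace as you suggest for the heavy-processing case, I imagine the receive side of dfsd's connection pool looking roughly like the select() loop below. This is only a sketch under assumptions: the sockets are already connected, the pool size is fixed, and there is no reconnection or protocol framing (poll_nodes and NR_NODES are made-up names):

/*
 * Minimal userspace sketch: watch a pool of already-connected TCP sockets
 * with select() and drain whatever arrives.  Purely illustrative -- no
 * reconnection, no request framing/parsing, fixed-size pool.
 */
#include <sys/types.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>
#include <stdio.h>

#define NR_NODES 10

void poll_nodes(int socks[NR_NODES])
{
    char buf[4096];
    fd_set rfds;
    int i, maxfd;

    for (;;) {
        FD_ZERO(&rfds);
        maxfd = -1;
        for (i = 0; i < NR_NODES; i++) {
            FD_SET(socks[i], &rfds);
            if (socks[i] > maxfd)
                maxfd = socks[i];
        }

        /* Sleep until at least one node has data for us. */
        if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0) {
            perror("select");
            return;
        }

        for (i = 0; i < NR_NODES; i++) {
            if (FD_ISSET(socks[i], &rfds)) {
                ssize_t n = recv(socks[i], buf, sizeof(buf), 0);
                if (n <= 0)
                    return;  /* node went away: failover would go here */
                /* parse/dispatch the n received bytes from node i here */
                printf("node %d: %zd bytes\n", i, n);
            }
        }
    }
}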
--
What this world needs is a good five-dollar plasma weapon.