Hi,

I've written a document to lay out what I'd like to do to make an AF_RXRPC
network protocol for Linux, with the intent of replacing the current net/rxrpc/
in the kernel and also providing an RxRPC transport for userspace.

Can you look it over and see what you think?

Also, is it feasible to add a new socket type?  I'm thinking of adding
SOCK_RPC to be used rather than SOCK_DGRAM, SOCK_STREAM or whatever, since the
usage model doesn't fit the ones that currently exist.

Thanks!
David


                           =========================
                           AF_RXRPC NETWORK PROTOCOL
                           =========================

RxRPC is in essence a two-part protocol.  There is a session layer, which
provides reliable virtual connections and implements a real network protocol of
its own whilst using UDP over IPv4 or IPv6 as its transport layer, and there is
a presentation layer, which renders structured data to binary blobs and back
again using XDR (as does SunRPC):

                +-------------+
                | Application |
                +-------------+
                |     XDR     |         Presentation
                +-------------+
                |    RxRPC    |         Session
                +-------------+
                |     UDP     |         Transport
                +-------------+

(Very OSI, I know, and probably wrong).


AF_RXRPC would provide:

 (1) Part of an RxRPC facility for both kernel and userspace applications by
     making the session part of it a Linux network protocol (AF_RXRPC).

 (2) A two-phase protocol.  The client transmits a blob and then receives a
     blob, and the server receives a blob and then transmits a blob.

 (3) Retention of the reusable bits of the transport system set up for one call
     to speed up subsequent calls.

 (4) A secure protocol, using the Linux kernel's key retention facility to
     manage security on the client end.  The server end must of necessity be
     more active in security negotiations.

AF_RXRPC would not provide XDR marshalling facilities.  That would be left to
the application.
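
Since marshalling would be left to the application, a client would have to
build the request blob itself.  A minimal sketch, assuming the usual XDR rules
(32-bit big-endian integers; strings as a length word followed by the bytes,
zero-padded to a multiple of four).  The helper names are made up for
illustration and are not part of the proposal:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: append a 32-bit unsigned integer in XDR form
 * (big-endian).  Returns the new offset, or 0 if it does not fit. */
static size_t xdr_put_u32(uint8_t *buf, size_t size, size_t off, uint32_t v)
{
	assert(buf != NULL);
	if (off > size || size - off < 4)
		return 0;
	buf[off + 0] = (uint8_t)(v >> 24);
	buf[off + 1] = (uint8_t)(v >> 16);
	buf[off + 2] = (uint8_t)(v >> 8);
	buf[off + 3] = (uint8_t)v;
	return off + 4;
}

/* Hypothetical helper: append a string as a length word followed by the
 * bytes, zero-padded to a multiple of four.  Returns the new offset, or 0
 * if it does not fit. */
static size_t xdr_put_string(uint8_t *buf, size_t size, size_t off,
			     const char *s)
{
	size_t len = strlen(s);
	size_t padded = (len + 3) & ~(size_t)3;

	if (len > UINT32_MAX)
		return 0;
	off = xdr_put_u32(buf, size, off, (uint32_t)len);
	if (off == 0 || size - off < padded)
		return 0;
	memcpy(buf + off, s, len);
	memset(buf + off + len, 0, padded - len);
	return off + padded;
}
```

A request would start with xdr_put_u32() of the service operation ID, followed
by the arguments in whatever order the service defines.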


Sockets of the AF_RXRPC family would be:

 (1) created as type SOCK_RPC;

 (2) provided with a protocol of the type of underlying transport they're going
     to use - currently only PF_INET and PF_INET6 are supported.


The Andrew File System (AFS) is an example of an application that uses this and
that has both kernel (filesystem) and userspace (utility) components.


=====================
PROTOCOL DRIVER MODEL
=====================

An overview of the RxRPC protocol:

 (*) RxRPC sits on top of another networking protocol (UDP is the only option
     currently), and uses this to provide network transport.  UDP ports, for
     example, provide transport endpoints.

 (*) RxRPC supports multiple virtual "connections" from any given transport
     endpoint, thus allowing the endpoints to be shared, even to the same
     remote endpoint.

 (*) Each connection goes to a particular "service".  A connection may not go
     to multiple services.  A service may be considered the RxRPC equivalent of
     a port number.

 (*) Client-originating packets are marked, thus a transport endpoint can be
     shared between client and server connections (connections have a
     direction).

 (*) Up to about a billion connections may be supported concurrently between
     one local transport endpoint and one service on one remote endpoint.  An
     RxRPC connection is described by seven numbers:

        Local address   }
        Local port      } Transport (UDP) address
        Remote address  }
        Remote port     }
        Direction
        Connection ID
        Service ID

 (*) Each RxRPC operation is a "call".  A connection may make up to four
     billion calls, but only up to four calls may be in progress on a
     connection at any one time.

 (*) Calls are two-phase and asymmetric: the client sends its request data,
     which the service receives; then the service sends the reply data which
     the client receives.

 (*) The data are of indefinite size; the end of a phase is marked with a flag
     in the packet.

 (*) The first four bytes of the request data are the service operation ID.

 (*) Security is handled on a per-connection basis.  The connection is
     initiated by the first data packet on it arriving.  If security is
     requested, the server then issues a "challenge" and then the client
     replies with a "response".  If the response is successful, the security is
     set for the lifetime of that connection, and all subsequent calls made
     upon it use that same security.
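
The figures above (about a billion connections, four concurrent calls per
connection) are consistent with a 32-bit ID on the wire whose top 30 bits
number the connection and whose bottom 2 bits select one of four call
channels.  The sketch below assumes that split and an IPv4 transport; the
struct, macro and function names are illustrative, not a proposed ABI:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CHANNEL_BITS	2
#define SKETCH_CHANNEL_MASK	((1u << SKETCH_CHANNEL_BITS) - 1)
#define SKETCH_CID_MASK		(~SKETCH_CHANNEL_MASK)

/* The seven numbers that describe a connection (IPv4 transport only). */
struct sketch_conn_key {
	uint32_t local_addr;
	uint16_t local_port;
	uint32_t remote_addr;
	uint16_t remote_port;
	bool     client_initiated;	/* direction */
	uint32_t conn_id;		/* channel bits cleared */
	uint16_t service_id;
};

/* Combine a connection ID and a channel number into one 32-bit value. */
static uint32_t sketch_make_cid(uint32_t conn_id, unsigned int channel)
{
	assert(channel <= SKETCH_CHANNEL_MASK);
	return (conn_id & SKETCH_CID_MASK) | channel;
}

static uint32_t sketch_cid_connection(uint32_t cid)
{
	return cid & SKETCH_CID_MASK;
}

static unsigned int sketch_cid_channel(uint32_t cid)
{
	return cid & SKETCH_CHANNEL_MASK;
}

/* Two packets belong to the same connection only if all seven match. */
static bool sketch_conn_key_equal(const struct sketch_conn_key *a,
				  const struct sketch_conn_key *b)
{
	return a->local_addr == b->local_addr &&
	       a->local_port == b->local_port &&
	       a->remote_addr == b->remote_addr &&
	       a->remote_port == b->remote_port &&
	       a->client_initiated == b->client_initiated &&
	       a->conn_id == b->conn_id &&
	       a->service_id == b->service_id;
}
```

Because the direction is part of the key, a client connection and a server
connection with otherwise identical numbers stay distinct on a shared UDP
port.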


About the AF_RXRPC driver:

 (*) The AF_RXRPC protocol would transparently use internal sockets of the
     transport protocol to represent transport endpoints.

 (*) AF_RXRPC sockets map onto RxRPC calls, not RxRPC connections.  RxRPC
     connections would also be handled transparently.

 (*) Additional parallel client connections would be initiated to support extra
     concurrent calls, up to a limit [tunable].

 (*) Each connection would be retained for a certain amount of time [tunable]
     after the last call currently using it has completed, in case a new call
     is made that could use it.

 (*) Each internal UDP socket would be retained for a certain amount of time
     [tunable] after the last connection using it is discarded, in case a new
     connection is made that could use it.

 (*) A client-side connection could only be shared between calls if they have
     the same key struct describing their security (and assuming the calls
     would otherwise share the connection).  Non-secured calls would also be
     able to share connections with each other.

 (*) ACK'ing would be handled by the protocol driver automatically, including
     ping replying.

 (*) SO_KEEPALIVE would automatically ping the other side.
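
The reuse rules above could be sketched as a lookup over a small connection
table: a new call takes a free channel on a connection with the same peer,
service and key, otherwise a new parallel connection is opened until the limit
is hit.  Everything here (the names, the table size, the integer stand-ins for
the peer address and the key struct) is illustrative only, and the expiry
timers are left out:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_CONNS	4	/* stand-in for the [tunable] limit */
#define SKETCH_CALLS_PER_CONN	4	/* from the protocol overview */

struct sketch_conn {
	int          in_use;
	uint32_t     peer;		/* stand-in for remote address + port */
	uint16_t     service_id;
	int          key;		/* stand-in for the key; 0 = unsecured */
	unsigned int active_calls;
};

/* Find or create a connection for a new call.  Returns NULL if every
 * matching connection is full and no further connection may be made. */
static struct sketch_conn *sketch_get_conn(struct sketch_conn *tab,
					   uint32_t peer, uint16_t service_id,
					   int key)
{
	struct sketch_conn *unused = NULL;
	size_t i;

	for (i = 0; i < SKETCH_MAX_CONNS; i++) {
		struct sketch_conn *c = &tab[i];

		if (!c->in_use) {
			if (!unused)
				unused = c;
			continue;
		}
		if (c->peer == peer && c->service_id == service_id &&
		    c->key == key &&
		    c->active_calls < SKETCH_CALLS_PER_CONN) {
			c->active_calls++;
			return c;
		}
	}

	if (!unused)
		return NULL;
	unused->in_use = 1;
	unused->peer = peer;
	unused->service_id = service_id;
	unused->key = key;
	unused->active_calls = 1;
	return unused;
}

/* A call has completed.  The connection stays in the table so that a later
 * call can reuse it. */
static void sketch_put_conn(struct sketch_conn *c)
{
	assert(c->active_calls > 0);
	c->active_calls--;
}
```

Unsecured calls all carry key 0 here, so they share connections with each
other but never with secured calls.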


Interaction with the user of the RxRPC socket:

 (*) In the client, sending a request would be achieved with one or more
     sendmsgs, followed by the reply received with one or more recvmsgs.

 (*) Once the client has received the last bit of the reply with recvmsg, the
     socket would be again available to send a new call with sendmsg.

 (*) In the server, receiving a request would be achieved with one or more
     recvmsgs, followed by the reply transmitted with one or more sendmsgs.

     (*) The server could invoke a final recvmsg to pick up the success or
         failure of the reply reception.

     (*) The server could ACK the receipt of the request phase by doing a
         sendmsg() with a special control message if the request is going to
         take a long time to process.  Normally the first packet of the reply
         suffices to ACK the entire request.

 (*) Switching from sendmsg() to recvmsg() or vice versa would shift the state
     of the RPC operation, giving a final ACK on that phase of the protocol.

 (*) select() and poll() would show a socket as being writable if sendmsg() can
     be used to send a request or a reply, and readable if recvmsg() can be
     used to receive a request or a reply.  It would not be both readable and
     writable simultaneously.

 (*) The control data part of the msghdr struct would be used for a number of
     things:

     (*) Sending or receiving errors (aborts).

     (*) Sending ping requests and receiving ping replies.

     (*) Sending debug requests and receiving debug replies.

 (*) The server would have to assist in the setting up of security.  The server
     sends a challenge packet to the client and receives a response packet.
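
One way to picture the sendmsg()/recvmsg() hand-over and the poll() rule on a
client socket is as a small state machine.  This is only one reading of the
points above, with made-up state and event names; it uses the standard POLLIN
and POLLOUT bits from <poll.h>, ignores whether reply data has actually
arrived, and does not touch a real socket:

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

enum sketch_state {
	SKETCH_IDLE,		/* ready to start a new call */
	SKETCH_SENDING,		/* request phase in progress */
	SKETCH_RECEIVING,	/* reply phase in progress */
	SKETCH_INVALID,		/* the operation is not allowed here */
};

enum sketch_event {
	SKETCH_EV_SENDMSG,	/* sendmsg() with request data */
	SKETCH_EV_RECVMSG,	/* recvmsg() returning part of the reply */
	SKETCH_EV_RECVMSG_LAST,	/* recvmsg() returning the end of the reply */
};

static enum sketch_state sketch_next(enum sketch_state s, enum sketch_event ev)
{
	switch (s) {
	case SKETCH_IDLE:
		return ev == SKETCH_EV_SENDMSG ? SKETCH_SENDING : SKETCH_INVALID;
	case SKETCH_SENDING:
		/* Switching to recvmsg() ends the request phase. */
		if (ev == SKETCH_EV_SENDMSG)
			return SKETCH_SENDING;
		return ev == SKETCH_EV_RECVMSG ? SKETCH_RECEIVING : SKETCH_IDLE;
	case SKETCH_RECEIVING:
		if (ev == SKETCH_EV_RECVMSG)
			return SKETCH_RECEIVING;
		return ev == SKETCH_EV_RECVMSG_LAST ? SKETCH_IDLE : SKETCH_INVALID;
	default:
		return SKETCH_INVALID;
	}
}

/* Event bits that poll() would report in each state. */
static short sketch_poll_mask(enum sketch_state s)
{
	switch (s) {
	case SKETCH_IDLE:
	case SKETCH_SENDING:
		return POLLOUT;
	case SKETCH_RECEIVING:
		return POLLIN;
	default:
		return 0;
	}
}

/* Replay a sequence of events from the idle state, checking that the socket
 * is never reported as readable and writable at once. */
static enum sketch_state sketch_run(const enum sketch_event *ev, size_t n)
{
	enum sketch_state s = SKETCH_IDLE;
	size_t i;

	for (i = 0; i < n && s != SKETCH_INVALID; i++) {
		s = sketch_next(s, ev[i]);
		assert(sketch_poll_mask(s) != (POLLIN | POLLOUT));
	}
	return s;
}
```

A server-side call socket would run the same machine with the roles of
sendmsg() and recvmsg() swapped.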


====================
EXAMPLE CLIENT USAGE
====================

A client would issue an operation by:

 (1) An RxRPC socket would be set up by:

        client = socket(AF_RXRPC, SOCK_RPC, PF_INET);

     Where the third parameter indicates the address type of the transport
     socket used - usually IPv4.

 (2) A local address could optionally be bound:

        struct sockaddr_rxrpc srx = {
                .srx_family     = AF_RXRPC,
                .srx_service    = 0,  /* we're a client */
                .transport_type = SOCK_DGRAM,   /* type of transport socket */
                .transport.sin_family   = AF_INET,
                .transport.sin_port     = htons(7000), /* AFS callback */
                .transport.sin_address  = 0,  /* all local interfaces */
        };
        bind(client, &srx, sizeof(srx));

     This would specify the local UDP port to be used.  If not given, a random
     non-privileged port would be used.  A UDP port may be shared between
     several unrelated RxRPC sockets.  Security is handled per RxRPC virtual
     connection.

 (3) The security would be set:

        const char *key = "AFS:cambridge.redhat.com";
        setsockopt(client, SOL_RXRPC, RXRPC_SECURITY_KEY, key, strlen(key));

     This would issue a request_key() to get the security context.

 (4) The server would then be contacted:

        struct sockaddr_rxrpc srx = {
                .srx_family     = AF_RXRPC,
                .srx_service    = VL_SERVICE_ID,
                .transport_type = SOCK_DGRAM,   /* type of transport socket */
                .transport.sin_family   = AF_INET,
                .transport.sin_port     = htons(7005), /* AFS volume manager */
                .transport.sin_address  = ...,
        };
        connect(client, &srx, sizeof(srx));

 (5) The request would be sent:

        sendmsg(client, msg, 0);

 (6) And then the reply received:

        recvmsg(client, msg, 0);

     If the server returned an abort/error, this would be returned in the
     control data buffer.

 (7) Then the socket would be closed or used to make another call.
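
How an abort might travel in the control data can be illustrated with the
standard CMSG_*() macros from <sys/socket.h>.  The level and type values
(SKETCH_SOL_RXRPC, SKETCH_RXRPC_ABORT) and the 32-bit abort code payload are
assumptions made for this sketch; no AF_RXRPC socket is involved, the control
buffer is simply built and parsed in memory:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#define SKETCH_SOL_RXRPC	272	/* placeholder level, not a real ABI */
#define SKETCH_RXRPC_ABORT	1	/* placeholder control message type */

/* Attach an abort code as the only control message.  The caller supplies a
 * suitably aligned buffer of at least CMSG_SPACE(sizeof(uint32_t)) bytes. */
static int sketch_put_abort(struct msghdr *msg, void *ctrl, size_t ctrl_len,
			    uint32_t abort_code)
{
	struct cmsghdr *cmsg;

	assert(msg != NULL && ctrl != NULL);
	if (ctrl_len < CMSG_SPACE(sizeof(abort_code)))
		return -1;

	memset(ctrl, 0, ctrl_len);
	msg->msg_control = ctrl;
	msg->msg_controllen = CMSG_SPACE(sizeof(abort_code));

	cmsg = CMSG_FIRSTHDR(msg);
	cmsg->cmsg_level = SKETCH_SOL_RXRPC;
	cmsg->cmsg_type = SKETCH_RXRPC_ABORT;
	cmsg->cmsg_len = CMSG_LEN(sizeof(abort_code));
	memcpy(CMSG_DATA(cmsg), &abort_code, sizeof(abort_code));
	return 0;
}

/* Look for an abort in the control data, as would be done after recvmsg().
 * Returns 1 and fills in *abort_code if one is present, 0 otherwise. */
static int sketch_get_abort(struct msghdr *msg, uint32_t *abort_code)
{
	struct cmsghdr *cmsg;

	for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
		if (cmsg->cmsg_level == SKETCH_SOL_RXRPC &&
		    cmsg->cmsg_type == SKETCH_RXRPC_ABORT &&
		    cmsg->cmsg_len >= CMSG_LEN(sizeof(*abort_code))) {
			memcpy(abort_code, CMSG_DATA(cmsg),
			       sizeof(*abort_code));
			return 1;
		}
	}
	return 0;
}
```

A client would call something like sketch_get_abort() on the msghdr filled in
by recvmsg() before looking at the reply data.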


====================
EXAMPLE SERVER USAGE
====================

A server would accept operations by:

 (1) An RxRPC socket would be set up by:

        server = socket(AF_RXRPC, SOCK_RPC, PF_INET);

     Where the third parameter indicates the address type of the transport
     socket used - usually IPv4.

 (2) A local address would be bound:

        struct sockaddr_rxrpc srx = {
                .srx_family     = AF_RXRPC,
                .srx_service    = VL_SERVICE_ID, /* RxRPC service ID */
                .transport_type = SOCK_DGRAM,   /* type of transport socket */
                .transport.sin_family   = AF_INET,
                .transport.sin_port     = htons(7000), /* AFS callback */
                .transport.sin_address  = 0,  /* all local interfaces */
        };
        bind(server, &srx, sizeof(srx));

 (3) The server would then listen out for incoming calls:

        listen(server, 100);

 (4) It would accept calls that were made:

        struct sockaddr_rxrpc srx;
        socklen_t slen = sizeof(srx);
        call = accept(server, &srx, &slen);

 (5) The first data packet would then be received:

        recvmsg(call, msg, 0);

     A connection is discovered on the server by reception of the first data
     packet holding its connection ID.  Only then can security be set up.

 (6) The security context might need to be set up:

     (a) The security index can be examined:

        uint16_t sectype;
        socklen_t len = sizeof(sectype);
        getsockopt(call, SOL_RXRPC, RXRPC_GET_SECURITY_INDEX, &sectype, &len);

     (b) A security challenge can be made:

        sendmsg(call, msg, 0);

         The control message will contain the challenge; there would be no
         data.

     (c) And the security response received:

        recvmsg(call, msg, 0);

         The control message will contain the response; there would be no data.

     (d) The security context can then be set:

        setsockopt(call, SOL_RXRPC, RXRPC_SET_SECURITY, buffer, buflen);

     If the virtual RxRPC connection already has security set up, the
     getsockopt will indicate this, and steps (b) to (d) can be skipped.

     A security rejection would be achieved simply by closing the socket before
     step (d).

 (7) The data could then be received:

        recvmsg(call, msg, 0);

 (8) And then the reply transmitted:

        sendmsg(call, msg, 0);

     If an abort/error is to be served instead, that would be placed in the
     control data, and no data would be attached.

 (9) Then the socket would be closed.
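
The decision in step (6) about whether the challenge/response round is needed
could be sketched as below.  The enum names, the convention that security
index 0 means "no security", and the already_secured flag are assumptions for
illustration; how getsockopt() would report an existing security context is
not defined above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_step {
	SKETCH_SEND_CHALLENGE,	/* step (6b) */
	SKETCH_RECV_RESPONSE,	/* step (6c) */
	SKETCH_SET_SECURITY,	/* step (6d) */
	SKETCH_RECV_REQUEST,	/* step (7) */
	SKETCH_SEND_REPLY,	/* step (8) */
	SKETCH_CLOSE,		/* step (9) */
};

#define SKETCH_MAX_STEPS	6

/* Work out what remains to be done on a call once the first data packet has
 * been seen and the security index examined.  Returns the number of steps
 * written to steps[]. */
static size_t sketch_plan_call(uint16_t sectype, bool already_secured,
			       enum sketch_step steps[SKETCH_MAX_STEPS])
{
	size_t n = 0;

	assert(steps != NULL);
	if (sectype != 0 && !already_secured) {
		steps[n++] = SKETCH_SEND_CHALLENGE;
		steps[n++] = SKETCH_RECV_RESPONSE;
		steps[n++] = SKETCH_SET_SECURITY;
	}
	steps[n++] = SKETCH_RECV_REQUEST;
	steps[n++] = SKETCH_SEND_REPLY;
	steps[n++] = SKETCH_CLOSE;
	return n;
}
```

Only the first call on a secured connection would go through all six steps;
later calls on the same connection would start at the request.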