Herbert Xu wrote:

> 
> Yes, however I think the same argument could be applied to TOE.
> 
> With their RDMA NIC, we'll have TCP/SCTP connections that
> bypass netfilter, tc, IPsec, AF_PACKET/tcpdump and the rest
> of our stack while at the same time it is using the same IP
> address as us and deciding what packets we will or won't see.
> 

The whole point of the patches that opengrid has proposed is to
allow control of these issues to remain with the kernel. That is
where the ownership of the IP address logically resides, and system
administrators will expect to be able to use one set of tools to
control what is done with a given IP address.

The bypassing is already going on with iSCSI devices and with
InfiniBand devices that use IP addresses. An RDMA/IP device just
makes it harder to ignore this problem, but the problem was already
there. SDP over IB is presented to Linux users essentially as a
TOE service. Connections are made with IP and socket semantics,
and yet there is no co-ordination on routes/netfilter/etc.

I'll state right up front that I think stateful offload, when
co-ordinated with the OS, is better than stateless offload,
especially at 10G speeds.

But for plain TCP connections, stateless offloads are
available. As a product architect, I am already looking for
every way to support stateless offload efficiently, so that it
remains a viable option for Linux users at the highest rates
achievable. That is why we are very interested in exploring a
hardware-friendly definition of vj_netchannels.

But with RDMA things are different. There is no such thing as
stateless RDMA. It is not RDMA over TCP that requires stateful
offload, it is RDMA itself. RDMA over InfiniBand is just as
much of a stateful offload as RDMA over TCP.

It is possible to build RDMA over TCP as a service that merely
uses memory-mapping services in some opaque way but is not
integrated with the network stack at all. That is essentially
how RDMA over IB currently works.

But I believe that integrating control over the IP address,
and the associated netfilter/routing/ARP/PMTU issues, into
the kernel is the correct path. This logic should not be
duplicated, and its control must not be split.
