On Wed, Mar 16, 2016 at 6:19 AM, Gilberto Bertin
<gilberto.ber...@gmail.com> wrote:
> This is my second attempt to submit an RFC for this patch.
>
> Some arguments for and against it since the first submission:
> * SO_BINDTOSUBNET is an arbitrary option and can be seen as another use
>   case of the SO_REUSEPORT BPF patch
> * but at the same time using BPF requires more work/code on the server
>   and since the bind to subnet use case could potentially become a
>   common one maybe there is some value in having it as an option instead
>   of having to code (either manually or with clang) an eBPF program that
>   would do the same

Gilberto,

I'm not sure I understand this argument. Have you implemented the BPF
bind solution?

Thanks,
Tom

> * it is probably possible to achieve the same results using VRF. This
>   would require creating a VRF device, configuring the device routing
>   table and binding each process to a different VRF device (but
>   I'm not sure how this would work/interfere with an existing iptables
>   setup for example)
>
> -----------------------------------------------------------------------------
>
> This series introduces support for the SO_BINDTOSUBNET socket option, which
> allows a listener socket to bind to a subnet instead of * or a single address.
>
> Motivation:
> consider a set of servers, each one with thousands and thousands of IP
> addresses. Since assigning individual /32 or /128 IP addresses would be
> inefficient, one solution can be assigning subnets using local routes
> (with 'ip route add local').
>
> This allows a listener to listen and terminate connections going to any
> of the IP addresses of these subnets without explicitly configuring all
> the IP addresses of the subnet range.
> This is very efficient.
>
> Unfortunately there may be the need to use different subnets for
> different purposes.
> One can imagine port 80 being served by one HTTP server for some IP
> subnet, while another server is used for another subnet.
> Right now Linux does not allow this.
> It is either possible to bind to *, indicating ALL traffic going to
> given port, or to individual IP addresses.
> The first only allows accepting connections from all the subnets.
> The latter does not scale well with lots of IP addresses.
>
> Using bindtosubnet would solve this problem: just by adding a local
> route rule and setting the SO_BINDTOSUBNET option for a socket it would
> be possible to easily partition traffic by subnets.
>
> API:
> the subnet is specified (as argument of the setsockopt syscall) by the
> address of the network, and the prefix length of the netmask.
>
> IPv4:
>     struct ipv4_subnet {
>             __be32 net;
>             u_char plen;
>     };
>
> and IPv6:
>     struct ipv6_subnet {
>             struct in6_addr net;
>             u_char plen;
>     };
>
> Bind conflicts:
> two sockets with the bindtosubnet option enabled generate a bind
> conflict if their network addresses masked with the shortest of their
> prefixes are equal.
> The bindtosubnet option can be combined with soreuseport so that two
> listeners can bind on the same subnet.
>
> Any questions/feedback appreciated.
>
> Thanks,
> Gilberto
>
> Gilberto Bertin (4):
>   bindtosubnet: infrastructure
>   bindtosubnet: TCP/IPv4 implementation
>   bindtosubnet: TCP/IPv6 implementation
>   bindtosubnet: UDP implementation
>
>  include/net/sock.h                |  20 +++++++
>  include/uapi/asm-generic/socket.h |   1 +
>  net/core/sock.c                   | 111 ++++++++++++++++++++++++++++++++++++++
>  net/ipv4/inet_connection_sock.c   |  20 ++++++-
>  net/ipv4/inet_hashtables.c        |   9 ++++
>  net/ipv4/udp.c                    |  36 +++++++++++++
>  net/ipv6/inet6_connection_sock.c  |  17 +++++-
>  net/ipv6/inet6_hashtables.c       |   6 +++
>  8 files changed, 218 insertions(+), 2 deletions(-)
>
> --
> 2.7.2
>
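
For illustration, the bind-conflict rule quoted above ("network addresses
masked with the shortest of their prefixes are equal") could be sketched in
userspace C roughly as follows. This is only a sketch under stated
assumptions, not code from the patch series: the helper names (ipv4,
prefix_mask, subnets_conflict) are hypothetical, addresses are kept in host
byte order rather than the __be32 used in struct ipv4_subnet, and only IPv4
is covered.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: build a host-byte-order IPv4 address from octets. */
static uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
	       ((uint32_t)c << 8) | (uint32_t)d;
}

/* Hypothetical helper: netmask (host byte order) for a prefix length.
 * A 32-bit shift by 32 is undefined in C, so 0 and >= 32 are special-cased. */
static uint32_t prefix_mask(unsigned int plen)
{
	if (plen == 0)
		return 0;
	if (plen >= 32)
		return UINT32_MAX;
	return ~(UINT32_MAX >> plen);
}

/* Assumed reading of the rule in the cover letter: two subnets conflict
 * when their network addresses are equal after masking both with the
 * shorter of the two prefix lengths. */
static bool subnets_conflict(uint32_t net_a, unsigned int plen_a,
			     uint32_t net_b, unsigned int plen_b)
{
	unsigned int shortest = plen_a < plen_b ? plen_a : plen_b;
	uint32_t mask = prefix_mask(shortest);

	return (net_a & mask) == (net_b & mask);
}
```

Under this reading, 10.0.0.0/8 and 10.1.0.0/16 conflict, because both
addresses mask to 10.0.0.0 under the shorter /8 prefix, while 10.1.0.0/16 and
10.2.0.0/16 do not. A prefix length of 0 conflicts with every other subnet.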