Cc: Pavel

On Fri, Jun 08, 2018 at 03:07:30AM -0700, Maciej Żenczykowski wrote:
> I think we probably need to make sk->sk_reuse back into a boolean.
> (ie. eliminate SK_FORCE_REUSE)
>
> Then add a new tcp/udp sk->ignore_bind_conflicts boolean setting...
> (ie. not just for tcp, but sol_socket) [or perhaps SO_REPAIR,
> sk->repair or something]
>
> What I'm not certain of is exactly what sorts of conflicts it should ignore...
> all? probably not, still seems utterly wrong to allow creation of 2 connected
> tcp sockets with identical 5-tuples.

It is required when we are restoring i_b_c sockets on the server side.
In this case, they all have the same source address as a listening
socket. To restore these sockets, we need to be able to create a
listening socket and all i_b_c sockets and bind them all to the same
source address.

BTW: Here is an example of how tcp_repair works:
https://github.com/avagin/tcp-repair/blob/master/tcp-constructor.c

>
> Would it only ignore conflicts against other i_b_c sockets?
> ie. set it on all sockets as we're repairing, then clear it on them
> all once we're done?

TCP_REPAIR (which sets SK_FORCE_REUSE) is used to restore only i_b_c
sockets. SK_FORCE_REUSE is needed to ignore bind conflicts for repaired
sockets. It ignores conflicts against other i_b_c and listening sockets.

The current idea is that CRIU will restore listening sockets first, and
then it will restore i_b_c sockets.

Please take a look at the attached patch.

>
> and ignore all the fast caching when checking conflicts for an i_b_c socket?
>
> For CRIU is it safe to assume we're restoring an entire namespace into
> a new namespace?

No, it isn't. CRIU can restore processes in an existing network
namespace.

>
> Could we perhaps instead allow a new namespace to ignore bind conflicts until
> we flip it into enforcing mode?

No, we could not.
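A minimal userspace sketch of the restore step described above: the socket gets its original SO_REUSEADDR value first, is then switched into repair mode, and is finally bound to the address the listening socket already uses. The helper names (set_reuseaddr, get_reuseaddr, set_repair, restore_bound_socket) are hypothetical and exist only for illustration; this is not how CRIU is structured. TCP_REPAIR needs CAP_NET_ADMIN, so set_repair() is expected to return -EPERM for an unprivileged caller.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef TCP_REPAIR
#define TCP_REPAIR 19	/* value from the Linux UAPI headers */
#endif

/* Set SO_REUSEADDR; returns 0 or -errno. */
int set_reuseaddr(int fd, int on)
{
	if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0)
		return -errno;
	return 0;
}

/* Read SO_REUSEADDR back; returns the value (>= 0) or -errno. */
int get_reuseaddr(int fd)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &val, &len) < 0)
		return -errno;
	return val;
}

/* Switch repair mode on (1) or off (0); returns 0 or -errno. */
int set_repair(int fd, int on)
{
	if (setsockopt(fd, IPPROTO_TCP, TCP_REPAIR, &on, sizeof(on)) < 0)
		return -errno;
	return 0;
}

/*
 * Hypothetical helper: create one socket to be restored and bind it to
 * an address that a listening socket may already occupy.
 * Returns the new fd, or -errno on failure.
 */
int restore_bound_socket(const struct sockaddr_in *addr, int reuseaddr)
{
	int fd, err;

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -errno;

	/* Original SO_REUSEADDR value first, while the socket is unbound. */
	err = set_reuseaddr(fd, reuseaddr);
	if (!err)
		err = set_repair(fd, 1);
	if (!err && bind(fd, (const struct sockaddr *)addr, sizeof(*addr)) < 0)
		err = -errno;
	if (err) {
		close(fd);
		return err;
	}
	return fd;
}
```

The caller would still have to restore sequence numbers and queues and then leave repair mode with set_repair(fd, 0); tcp-constructor.c linked above shows a complete sequence.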
From 990baa56993827ae6f4441cf078eddf73389d6ee Mon Sep 17 00:00:00 2001
From: Andrei Vagin <ava...@openvz.org>
Date: Fri, 8 Jun 2018 23:27:46 -0700
Subject: [PATCH] net: split sk_reuse into sk_reuse and sk_force_reuse

Currently sk_reuse can have three values: SK_NO_REUSE, SK_CAN_REUSE,
SK_FORCE_REUSE. SK_CAN_REUSE is set by SO_REUSEADDR. SK_FORCE_REUSE is
used to ignore bind conflicts for sockets in the repair mode.

This patch makes sk->sk_reuse back into a boolean and adds
sk->sk_force_reuse to track SK_FORCE_REUSE separately.

Recently there were changes which prohibit changing
SO_REUSEADDR/SO_REUSEPORT on bound sockets, and now it is impossible to
set the original values of these parameters for restored (repaired)
sockets.

With the introduced changes, the tcp_repair mode doesn't affect
sk_reuse, so it is possible to set its value before switching a socket
into the repair mode.

Fixes: f396922d862a ("net: do not allow changing SO_REUSEADDR/SO_REUSEPORT on bound sockets")
Signed-off-by: Andrei Vagin <ava...@openvz.org>
---
 include/net/sock.h              | 13 ++++---------
 net/ipv4/inet_connection_sock.c |  2 +-
 net/ipv4/tcp.c                  |  4 ++--
 3 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index b3b75419eafe..8ad19286ab9e 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -130,6 +130,7 @@ typedef __u64 __bitwise __addrpair;
  *	@skc_family: network address family
  *	@skc_state: Connection state
  *	@skc_reuse: %SO_REUSEADDR setting
+ *	@skc_force_reuse: ignore bind conflicts
  *	@skc_reuseport: %SO_REUSEPORT setting
  *	@skc_bound_dev_if: bound device index if != 0
  *	@skc_bind_node: bind hash linkage for various protocol lookup tables
@@ -174,7 +175,8 @@ struct sock_common {
 
 	unsigned short		skc_family;
 	volatile unsigned char	skc_state;
-	unsigned char		skc_reuse:4;
+	unsigned char		skc_reuse:1;
+	unsigned char		skc_force_reuse:1;
 	unsigned char		skc_reuseport:1;
 	unsigned char		skc_ipv6only:1;
 	unsigned char		skc_net_refcnt:1;
@@ -339,6 +341,7 @@ struct sock {
 #define sk_family		__sk_common.skc_family
 #define sk_state		__sk_common.skc_state
 #define sk_reuse		__sk_common.skc_reuse
+#define sk_force_reuse		__sk_common.skc_force_reuse
 #define sk_reuseport		__sk_common.skc_reuseport
 #define sk_ipv6only		__sk_common.skc_ipv6only
 #define sk_net_refcnt		__sk_common.skc_net_refcnt
@@ -502,16 +505,8 @@ enum sk_pacing {
 #define rcu_dereference_sk_user_data(sk)	rcu_dereference(__sk_user_data((sk)))
 #define rcu_assign_sk_user_data(sk, ptr)	rcu_assign_pointer(__sk_user_data((sk)), ptr)
 
-/*
- * SK_CAN_REUSE and SK_NO_REUSE on a socket mean that the socket is OK
- * or not whether his port will be reused by someone else. SK_FORCE_REUSE
- * on a socket means that the socket will reuse everybody else's port
- * without looking at the other's sk_reuse value.
- */
-
 #define SK_NO_REUSE	0
 #define SK_CAN_REUSE	1
-#define SK_FORCE_REUSE	2
 
 int sk_set_peek_off(struct sock *sk, int val);
 
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 33a88e045efd..2ac1c591b60c 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -306,7 +306,7 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 		goto fail_unlock;
 tb_found:
 	if (!hlist_empty(&tb->owners)) {
-		if (sk->sk_reuse == SK_FORCE_REUSE)
+		if (sk->sk_force_reuse)
 			goto success;
 
 		if ((tb->fastreuse > 0 && reuse) ||
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 2741953adaba..70bfdd5a2fc4 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2810,11 +2810,11 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
 			err = -EPERM;
 		else if (val == 1) {
 			tp->repair = 1;
-			sk->sk_reuse = SK_FORCE_REUSE;
+			sk->sk_force_reuse = 1;
 			tp->repair_queue = TCP_NO_QUEUE;
 		} else if (val == 0) {
 			tp->repair = 0;
-			sk->sk_reuse = SK_NO_REUSE;
+			sk->sk_force_reuse = 0;
 			tcp_send_window_probe(sk);
 		} else
 			err = -EINVAL;
-- 
2.17.0
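
To illustrate what the split changes, here is a small userspace model of the two behaviours. It is only a sketch: struct model_sock and the old_*/new_* helpers are made up for this example and are not kernel code. They mirror the assignments in do_tcp_setsockopt() and the check at tb_found in inet_csk_get_port() before and after the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Values as defined in include/net/sock.h before the patch. */
enum { SK_NO_REUSE = 0, SK_CAN_REUSE = 1, SK_FORCE_REUSE = 2 };

/* Model of the relevant bits only, not the kernel's struct sock. */
struct model_sock {
	unsigned char reuse:4;		/* SO_REUSEADDR state */
	unsigned char force_reuse:1;	/* separate flag added by the patch */
	unsigned char repair:1;		/* TCP_REPAIR mode */
};

/* Old behaviour: toggling repair mode overwrites the SO_REUSEADDR state. */
void old_set_repair(struct model_sock *sk, bool on)
{
	sk->repair = on;
	sk->reuse = on ? SK_FORCE_REUSE : SK_NO_REUSE;
}

/* New behaviour: toggling repair mode touches only the separate flag. */
void new_set_repair(struct model_sock *sk, bool on)
{
	sk->repair = on;
	sk->force_reuse = on;
}

/* Old check: bind conflicts are skipped when reuse holds the third value. */
bool old_skips_conflict_check(const struct model_sock *sk)
{
	return sk->reuse == SK_FORCE_REUSE;
}

/* New check: bind conflicts are skipped when the separate flag is set. */
bool new_skips_conflict_check(const struct model_sock *sk)
{
	return sk->force_reuse;
}
```

In this model, a socket that starts with reuse == SK_CAN_REUSE ends up with SK_NO_REUSE after an old-style repair on/off cycle, while the new-style cycle leaves reuse untouched.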