On Fri, May 08, 2020 at 04:42:23PM -0700, Maciej Żenczykowski wrote: > From: Maciej Żenczykowski <m...@google.com> > > This makes 'ping' 'ping6' and icmp based traceroute no longer > require any suid or file capabilities. > > These sockets have baked long enough that the restriction > to make them unusable by default is no longer necessary. > > The concerns were around exploits. However there are now > major distros that default to enabling this. > > This is already the default on Fedora 31: > [root@f31vm ~]# cat /proc/sys/net/ipv4/ping_group_range > 0 2147483647 > [root@f31vm ~]# cat /usr/lib/sysctl.d/50-default.conf | egrep -B6 > ping_group_range > # ping(8) without CAP_NET_ADMIN and CAP_NET_RAW > # The upper limit is set to 2^31-1. Values greater than that get rejected by > # the kernel because of this definition in linux/include/net/ping.h: > # #define GID_T_MAX (((gid_t)~0U) >> 1) > # That's not so bad because values between 2^31 and 2^32-1 are reserved on > # systemd-based systems anyway: https://systemd.io/UIDS-GIDS.html#summary > -net.ipv4.ping_group_range = 0 2147483647 > > And in general is super useful for any network namespace container > based setup. See for example: > https://docs.docker.com/engine/security/rootless/ > > This is one less thing you need to configure when you creare a new network > namespace. > > Before: > vm:~# unshare -n > vm:~# cat /proc/sys/net/ipv4/ping_group_range > 1 0 > > After: > vm:~# unshare -n > vm:~# cat /proc/sys/net/ipv4/ping_group_range > 0 2147483647 > > Signed-off-by: Maciej Żenczykowski <m...@google.com>
This will unfortunately cause regressions with VRFs because they don't work correctly with ping sockets. Simple example: ip link add name vrf-red up type vrf table 10 ip link add name vrf-blue up type vrf table 20 ip rule add pref 32765 table local ip rule del pref 0 ip link add name veth-red type veth peer name veth-blue ip link set dev veth-red up master vrf-red ip link set dev veth-blue up master vrf-red ip address add 192.0.2.1/24 dev veth-red ip address add 192.0.2.2/24 dev veth-blue ip vrf exec vrf-red ping -I 192.0.2.1 192.0.2.2 -c 1 -w 1 Before the patch: PING 192.0.2.2 (192.0.2.2) from 192.0.2.1 : 56(84) bytes of data. 64 bytes from 192.0.2.2: icmp_seq=1 ttl=64 time=0.053 ms After the patch: bind: Cannot assign requested address This specific test case is fixed by following patch, but at least a similar change is needed for IPv6. Other changes might also be required. Sorry for not providing a complete solution, but I'm busy with other tasks at the moment. :/ diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c index 535427292194..8463b0e9e811 100644 --- a/net/ipv4/ping.c +++ b/net/ipv4/ping.c @@ -297,6 +297,7 @@ static int ping_check_bind_addr(struct sock *sk, struct inet_sock *isk, struct net *net = sock_net(sk); if (sk->sk_family == AF_INET) { struct sockaddr_in *addr = (struct sockaddr_in *) uaddr; + u32 tb_id = RT_TABLE_LOCAL; int chk_addr_ret; if (addr_len < sizeof(*addr)) @@ -310,7 +311,15 @@ static int ping_check_bind_addr(struct sock *sk, struct inet_sock *isk, pr_debug("ping_check_bind_addr(sk=%p,addr=%pI4,port=%d)\n", sk, &addr->sin_addr.s_addr, ntohs(addr->sin_port)); - chk_addr_ret = inet_addr_type(net, addr->sin_addr.s_addr); + if (sk->sk_bound_dev_if) { + tb_id = l3mdev_fib_table_by_index(net, + sk->sk_bound_dev_if); + if (!tb_id) + tb_id = RT_TABLE_LOCAL; + } + + chk_addr_ret = inet_addr_type_table(net, addr->sin_addr.s_addr, + tb_id); if (addr->sin_addr.s_addr == htonl(INADDR_ANY)) chk_addr_ret = RTN_LOCAL; > --- > net/ipv4/af_inet.c | 8 ++------ > 1 file changed, 2 insertions(+), 6 deletions(-) > > diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c > index cf58e29cf746..1a8cb6f3ee38 100644 > --- a/net/ipv4/af_inet.c > +++ b/net/ipv4/af_inet.c > @@ -1819,12 +1819,8 @@ static __net_init int inet_init_net(struct net *net) > net->ipv4.ip_local_ports.range[1] = 60999; > > seqlock_init(&net->ipv4.ping_group_range.lock); > - /* > - * Sane defaults - nobody may create ping sockets. > - * Boot scripts should set this to distro-specific group. > - */ > - net->ipv4.ping_group_range.range[0] = make_kgid(&init_user_ns, 1); > - net->ipv4.ping_group_range.range[1] = make_kgid(&init_user_ns, 0); > + net->ipv4.ping_group_range.range[0] = GLOBAL_ROOT_GID; > + net->ipv4.ping_group_range.range[1] = KGIDT_INIT(0x7FFFFFFF); > > /* Default values for sysctl-controlled parameters. > * We set them here, in case sysctl is not compiled. > -- > 2.26.2.645.ge9eca65c58-goog >