Tom, >> Hi Tom, >> >> >> Add the possibility to remove one or more consecutive TLVs without >> >> messing up the alignment of others. For now, only IOAM requires this >> >> behavior. >> >> >> > Hi Justin, >> > >> > Can you explain the motivation for this? Per RFC8200, extension >> > headers in flight are not to be added, removed, or modified outside of >> > the standard rules for processing modifiable HBH and DO TLVs., that >> > would include adding and removing TLVs in EH. One obvious problem this >> >> As you already know from our last meeting, IOAM may be configured on a node >> such >> that a specific IOAM namespace should be removed. Therefore, this patch >> provides support for the deletion of a TLV (or consecutive TLVs), without >> removing the entire EH (if it's empty, there will be padding). Note that >> there >> is a similar "problem" with the Incremental Trace where you'd need to expand >> the Hop-by-Hop (not included in this patchset). I agree that RFC 8200 is >> against modification of in-flight EHs, but there are several reasons that, I >> believe, mitigates this statement. >> >> Let's keep in mind that IOAM purpose is "private" (= IOAM domain), ie not >> widely >> deployed on the Internet. We can distinguish two big scenarios: (i) >> in-transit >> traffic where it is encapsulated (IPv6-in-IPv6) and (ii) traffic inside the >> domain, ie from an IOAM node inside the domain to another one (no need for >> encapsulation). In both cases, we kind of own the traffic: (i) encapsulation, >> so we modify "our" header and (ii) we already own the traffic. >> >> And if someone is still angry about this, well, the good news is that such >> modification can be avoided most of the time. Indeed, operators are advised >> to >> remove an IOAM namespace only on egress nodes. This way, the destination >> (either the tunnel destination or the real destination, depending on the >> scenario) will receive EHs and take care of them without the need to remove >> anything. But, again, operators can do what they want and I'd tend to adhere >> to >> David's philosophy [1] and give them the possibility to choose what to do. >> > > Justin, > > 6man WG has had a _long_ and sometimes bitter discussion around this > particularly with regards to insertion of SRH. The current consensus > of IETF is that it is a violation of RFC8200. We've heard all the > arguments that it's only for limited domains and narrow use cases, > nevertheless there are several problems that the header > insertion/deletion advocates never answered-- it breaks AH, it breaks > PMTU discovery, it breaks ICMP. There is also a risk that a > non-standard modification could cause a packet to be dropped > downstream from the node that modifies it. There is no attribution on > who created the problem, and hence this can lead to systematic > blackholes which are the most miserable sort of problem to debug.
Yes, I know the whole story and it's been stormy from what I understood. > Fundamentally, it is not robust per Postel's law (I actually wrote a > draft to try to make it robust in draft-herbert-6man-eh-attrib-00 if > you're interested). Interesting, I'll take a look. > IMO, we shouldn't be using Linux as a backdoor to implement protocol > that IETF is saying isn't robust. Can you point out in the IOAM drafts > where this requirement is specified, then I can take it up in IOAM WG > or 6man if needed... Well, I wouldn't say that IETF is considering IPv6-IOAM as not robust since [1] (IPv6 encapsulation for IOAM) and [2] (IOAM data fields) are about to be published. Justin [1] https://tools.ietf.org/html/draft-ietf-ippm-ioam-ipv6-options-01 [2] https://tools.ietf.org/html/draft-ietf-ippm-ioam-data-09 > Tom > >> > creates is that it breaks AH if the TLVs are removed in HBH before AH >> > is processed (AH is processed after HBH). >> >> Correct. But I don't think it should prevent us from having IOAM in the >> kernel. >> Again, operators could simply apply IOAM on a subset of the traffic that does >> not include AHs, for example. >> >> Justin >> >> [1] https://www.mail-archive.com/netdev@vger.kernel.org/msg136797.html >> >> > Tom >> >> By default, an 8-octet boundary is automatically assumed. This is the >> >> price to pay (at most a useless 4-octet padding) to make sure everything >> >> is still aligned after the removal. >> >> >> >> Proof: let's assume for instance the following alignments 2n, 4n and 8n >> >> respectively for options X, Y and Z, inside a Hop-by-Hop extension >> >> header. >> >> >> >> Example 1: >> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ >> >> | Next header | Hdr Ext Len | X | X | >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ >> >> | X | X | Padding | Padding | >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ >> >> | | >> >> ~ Option to be removed (8 octets) ~ >> >> | | >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ >> >> | Y | Y | Y | Y | >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ >> >> | Padding | Padding | Padding | Padding | >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ >> >> | Z | Z | Z | Z | >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ >> >> | Z | Z | Z | Z | >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ >> >> >> >> Result 1: assuming a 4-octet boundary would work, as well as an 8-octet >> >> boundary (same result in both cases). >> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ >> >> | Next header | Hdr Ext Len | X | X | >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ >> >> | X | X | Padding | Padding | >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ >> >> | Y | Y | Y | Y | >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ >> >> | Padding | Padding | Padding | Padding | >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ >> >> | Z | Z | Z | Z | >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ >> >> | Z | Z | Z | Z | >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ >> >> >> >> Example 2: >> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ >> >> | Next header | Hdr Ext Len | X | X | >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ >> >> | X | X | Padding | Padding | >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ >> >> | Option to be removed (4 octets) | >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ >> >> | Y | Y | Y | Y | >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ >> >> | Z | Z | Z | Z | >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ >> >> | Z | Z | Z | Z | >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ >> >> >> >> Result 2: assuming a 4-octet boundary WOULD NOT WORK. Indeed, option Z >> >> would not be 8n-aligned and the Hop-by-Hop size would not be a multiple >> >> of 8 anymore. >> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ >> >> | Next header | Hdr Ext Len | X | X | >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ >> >> | X | X | Padding | Padding | >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ >> >> | Y | Y | Y | Y | >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ >> >> | Z | Z | Z | Z | >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ >> >> | Z | Z | Z | Z | >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ >> >> >> >> Therefore, the largest (8-octet) boundary is assumed by default and for >> >> all, which means that blocks are only moved in multiples of 8. This >> >> assertion guarantees good alignment. >> >> >> >> Signed-off-by: Justin Iurman <justin.iur...@uliege.be> >> >> --- >> >> net/ipv6/exthdrs.c | 134 ++++++++++++++++++++++++++++++++++++--------- >> >> 1 file changed, 108 insertions(+), 26 deletions(-) >> >> >> >> diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c >> >> index e9b366994475..f27ab3bf2e0c 100644 >> >> --- a/net/ipv6/exthdrs.c >> >> +++ b/net/ipv6/exthdrs.c >> >> @@ -52,17 +52,27 @@ >> >> >> >> #include <linux/uaccess.h> >> >> >> >> -/* >> >> - * Parsing tlv encoded headers. >> >> +/* States for TLV parsing functions. */ >> >> + >> >> +enum { >> >> + TLV_ACCEPT, >> >> + TLV_REJECT, >> >> + TLV_REMOVE, >> >> + __TLV_MAX >> >> +}; >> >> + >> >> +/* Parsing TLV encoded headers. >> >> * >> >> - * Parsing function "func" returns true, if parsing succeed >> >> - * and false, if it failed. >> >> - * It MUST NOT touch skb->h. >> >> + * Parsing function "func" returns either: >> >> + * - TLV_ACCEPT if parsing succeeds >> >> + * - TLV_REJECT if parsing fails >> >> + * - TLV_REMOVE if TLV must be removed >> >> + * It MUST NOT touch skb->h. >> >> */ >> >> >> >> struct tlvtype_proc { >> >> int type; >> >> - bool (*func)(struct sk_buff *skb, int offset); >> >> + int (*func)(struct sk_buff *skb, int offset); >> >> }; >> >> >> >> /********************* >> >> @@ -109,19 +119,67 @@ static bool ip6_tlvopt_unknown(struct sk_buff *skb, >> >> int >> >> optoff, >> >> return false; >> >> } >> >> >> >> +/* Remove one or several consecutive TLVs and recompute offsets, lengths >> >> */ >> >> + >> >> +static int remove_tlv(int start, int end, struct sk_buff *skb) >> >> +{ >> >> + int len = end - start; >> >> + int padlen = len % 8; >> >> + unsigned char *h; >> >> + int rlen, off; >> >> + u16 pl_len; >> >> + >> >> + rlen = len - padlen; >> >> + if (rlen) { >> >> + skb_pull(skb, rlen); >> >> + memmove(skb_network_header(skb) + rlen, >> >> skb_network_header(skb), >> >> + start); >> >> + skb_postpull_rcsum(skb, skb_network_header(skb), rlen); >> >> + >> >> + skb_reset_network_header(skb); >> >> + skb_set_transport_header(skb, sizeof(struct ipv6hdr)); >> >> + >> >> + pl_len = be16_to_cpu(ipv6_hdr(skb)->payload_len) - rlen; >> >> + ipv6_hdr(skb)->payload_len = cpu_to_be16(pl_len); >> >> + >> >> + skb_transport_header(skb)[1] -= rlen >> 3; >> >> + end -= rlen; >> >> + } >> >> + >> >> + if (padlen) { >> >> + off = end - padlen; >> >> + h = skb_network_header(skb); >> >> + >> >> + if (padlen == 1) { >> >> + h[off] = IPV6_TLV_PAD1; >> >> + } else { >> >> + padlen -= 2; >> >> + >> >> + h[off] = IPV6_TLV_PADN; >> >> + h[off + 1] = padlen; >> >> + memset(&h[off + 2], 0, padlen); >> >> + } >> >> + } >> >> + >> >> + return end; >> >> +} >> >> + >> >> /* Parse tlv encoded option header (hop-by-hop or destination) */ >> >> >> >> static bool ip6_parse_tlv(const struct tlvtype_proc *procs, >> >> struct sk_buff *skb, >> >> - int max_count) >> >> + int max_count, >> >> + bool removable) >> >> { >> >> int len = (skb_transport_header(skb)[1] + 1) << 3; >> >> - const unsigned char *nh = skb_network_header(skb); >> >> + unsigned char *nh = skb_network_header(skb); >> >> int off = skb_network_header_len(skb); >> >> const struct tlvtype_proc *curr; >> >> bool disallow_unknowns = false; >> >> + int off_remove = 0; >> >> int tlv_count = 0; >> >> int padlen = 0; >> >> + int ret; >> >> >> >> if (unlikely(max_count < 0)) { >> >> disallow_unknowns = true; >> >> @@ -173,12 +231,14 @@ static bool ip6_parse_tlv(const struct tlvtype_proc >> >> *procs, >> >> if (tlv_count > max_count) >> >> goto bad; >> >> >> >> + ret = -1; >> >> for (curr = procs; curr->type >= 0; curr++) { >> >> if (curr->type == nh[off]) { >> >> /* type specific length/alignment >> >> checks will be performed in the >> >> func(). */ >> >> - if (curr->func(skb, off) == false) >> >> + ret = curr->func(skb, off); >> >> + if (ret == TLV_REJECT) >> >> return false; >> >> break; >> >> } >> >> @@ -187,6 +247,17 @@ static bool ip6_parse_tlv(const struct tlvtype_proc >> >> *procs, >> >> !ip6_tlvopt_unknown(skb, off, >> >> disallow_unknowns)) >> >> return false; >> >> >> >> + if (removable) { >> >> + if (ret == TLV_REMOVE) { >> >> + if (!off_remove) >> >> + off_remove = off - padlen; >> >> + } else if (off_remove) { >> >> + off = remove_tlv(off_remove, off, >> >> skb); >> >> + nh = skb_network_header(skb); >> >> + off_remove = 0; >> >> + } >> >> + } >> >> + >> >> padlen = 0; >> >> break; >> >> } >> >> @@ -194,8 +265,13 @@ static bool ip6_parse_tlv(const struct tlvtype_proc >> >> *procs, >> >> len -= optlen; >> >> } >> >> >> >> - if (len == 0) >> >> + if (len == 0) { >> >> + /* Don't forget last TLV if it must be removed */ >> >> + if (off_remove) >> >> + remove_tlv(off_remove, off, skb); >> >> + >> >> return true; >> >> + } >> >> bad: >> >> kfree_skb(skb); >> >> return false; >> >> @@ -206,7 +282,7 @@ static bool ip6_parse_tlv(const struct tlvtype_proc >> >> *procs, >> >> *****************************/ >> >> >> >> #if IS_ENABLED(CONFIG_IPV6_MIP6) >> >> -static bool ipv6_dest_hao(struct sk_buff *skb, int optoff) >> >> +static int ipv6_dest_hao(struct sk_buff *skb, int optoff) >> >> { >> >> struct ipv6_destopt_hao *hao; >> >> struct inet6_skb_parm *opt = IP6CB(skb); >> >> @@ -257,11 +333,11 @@ static bool ipv6_dest_hao(struct sk_buff *skb, int >> >> optoff) >> >> if (skb->tstamp == 0) >> >> __net_timestamp(skb); >> >> >> >> - return true; >> >> + return TLV_ACCEPT; >> >> >> >> discard: >> >> kfree_skb(skb); >> >> - return false; >> >> + return TLV_REJECT; >> >> } >> >> #endif >> >> >> >> @@ -306,7 +382,8 @@ static int ipv6_destopt_rcv(struct sk_buff *skb) >> >> #endif >> >> >> >> if (ip6_parse_tlv(tlvprocdestopt_lst, skb, >> >> - init_net.ipv6.sysctl.max_dst_opts_cnt)) { >> >> + init_net.ipv6.sysctl.max_dst_opts_cnt, >> >> + false)) { >> >> skb->transport_header += extlen; >> >> opt = IP6CB(skb); >> >> #if IS_ENABLED(CONFIG_IPV6_MIP6) >> >> @@ -918,24 +995,24 @@ static inline struct net *ipv6_skb_net(struct >> >> sk_buff >> >> *skb) >> >> >> >> /* Router Alert as of RFC 2711 */ >> >> >> >> -static bool ipv6_hop_ra(struct sk_buff *skb, int optoff) >> >> +static int ipv6_hop_ra(struct sk_buff *skb, int optoff) >> >> { >> >> const unsigned char *nh = skb_network_header(skb); >> >> >> >> if (nh[optoff + 1] == 2) { >> >> IP6CB(skb)->flags |= IP6SKB_ROUTERALERT; >> >> memcpy(&IP6CB(skb)->ra, nh + optoff + 2, >> >> sizeof(IP6CB(skb)->ra)); >> >> - return true; >> >> + return TLV_ACCEPT; >> >> } >> >> net_dbg_ratelimited("ipv6_hop_ra: wrong RA length %d\n", >> >> nh[optoff + 1]); >> >> kfree_skb(skb); >> >> - return false; >> >> + return TLV_REJECT; >> >> } >> >> >> >> /* Jumbo payload */ >> >> >> >> -static bool ipv6_hop_jumbo(struct sk_buff *skb, int optoff) >> >> +static int ipv6_hop_jumbo(struct sk_buff *skb, int optoff) >> >> { >> >> const unsigned char *nh = skb_network_header(skb); >> >> struct inet6_dev *idev = __in6_dev_get_safely(skb->dev); >> >> @@ -953,12 +1030,12 @@ static bool ipv6_hop_jumbo(struct sk_buff *skb, int >> >> optoff) >> >> if (pkt_len <= IPV6_MAXPLEN) { >> >> __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS); >> >> icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, optoff+2); >> >> - return false; >> >> + return TLV_REJECT; >> >> } >> >> if (ipv6_hdr(skb)->payload_len) { >> >> __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS); >> >> icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, optoff); >> >> - return false; >> >> + return TLV_REJECT; >> >> } >> >> >> >> if (pkt_len > skb->len - sizeof(struct ipv6hdr)) { >> >> @@ -970,16 +1047,16 @@ static bool ipv6_hop_jumbo(struct sk_buff *skb, int >> >> optoff) >> >> goto drop; >> >> >> >> IP6CB(skb)->flags |= IP6SKB_JUMBOGRAM; >> >> - return true; >> >> + return TLV_ACCEPT; >> >> >> >> drop: >> >> kfree_skb(skb); >> >> - return false; >> >> + return TLV_REJECT; >> >> } >> >> >> >> /* CALIPSO RFC 5570 */ >> >> >> >> -static bool ipv6_hop_calipso(struct sk_buff *skb, int optoff) >> >> +static int ipv6_hop_calipso(struct sk_buff *skb, int optoff) >> >> { >> >> const unsigned char *nh = skb_network_header(skb); >> >> >> >> @@ -992,11 +1069,11 @@ static bool ipv6_hop_calipso(struct sk_buff *skb, >> >> int >> >> optoff) >> >> if (!calipso_validate(skb, nh + optoff)) >> >> goto drop; >> >> >> >> - return true; >> >> + return TLV_ACCEPT; >> >> >> >> drop: >> >> kfree_skb(skb); >> >> - return false; >> >> + return TLV_REJECT; >> >> } >> >> >> >> static const struct tlvtype_proc tlvprochopopt_lst[] = { >> >> @@ -1041,7 +1118,12 @@ int ipv6_parse_hopopts(struct sk_buff *skb) >> >> >> >> opt->flags |= IP6SKB_HOPBYHOP; >> >> if (ip6_parse_tlv(tlvprochopopt_lst, skb, >> >> - init_net.ipv6.sysctl.max_hbh_opts_cnt)) { >> >> + init_net.ipv6.sysctl.max_hbh_opts_cnt, >> >> + true)) { >> >> + /* we need to refresh the length in case >> >> + * at least one TLV was removed >> >> + */ >> >> + extlen = (skb_transport_header(skb)[1] + 1) << 3; >> >> skb->transport_header += extlen; >> >> opt = IP6CB(skb); >> >> opt->nhoff = sizeof(struct ipv6hdr); >> >> -- > > >> 2.17.1