On Tue, Oct 6, 2020 at 5:49 PM Numan Siddique <nusid...@redhat.com> wrote: > > On Tue, Oct 6, 2020 at 4:46 PM Florian Westphal <f...@strlen.de> wrote: > > > > nusid...@redhat.com <nusid...@redhat.com> wrote: > > > From: Numan Siddique <nusid...@redhat.com> > > > > > > For a tcp packet which is part of an existing committed connection, > > > nf_conntrack_in() will return err and set skb->_nfct to NULL if it is > > > out of tcp window. ct action for this packet will set the ct_state > > > to +inv which is as expected. > > > > This is because from conntrack p.o.v., such TCP packet is NOT part of > > the existing connection. > > > > For example, because it is considered part of a previous incarnation > > of the same connection. > > > > > But a controller cannot add an OVS flow as > > > > > > table=21,priority=100,ct_state=+inv, actions=drop > > > > > > to drop such packets. That is because when ct action is executed on other > > > packets which are not part of existing committed connections, ct_state > > > can be set to invalid. Few such cases are: > > > - ICMP reply packets. > > > > Can you elaborate? Echo reply should not be invalid. Conntrack should > > mark it as established (unless such echo reply came out of the blue). > > Hi Florian, > > Thanks for providing the comments. > > Sorry for not being very clear. > > Let me brief about the present problem we see in OVN (which is a > controller using ovs) > > When a VM/container sends a packet (in the ingress direction), we don't send > all > the packets to conntrack. If a packet is destined to an OVN load > balancer virtual ip, > only then we send the packet to conntrack in the ingress direction and > then we do dnat > to the backend. > > Eg. in the ingress direction > > table=1, match = (ip && ip4.dst == VIP) action = ct(table=2) > tablle=2, ct_state=+new+trk && ip4.dst == VIP, action = ct(commit, > nat=BACKEND_IP) > ... > .. > > However for the egress direction (when the packet is to be delivered > to the VM/container), > we send all the packets to conntrack and if the ct.est is set, we do > undnat before delivering > the packet to the VM/container. > ... > table=40, match = ip, action = ct(table=41) > table=41, match = ct_state=+est+trk, action = ct(nat) > ... > > What I mean here is that, since we send all the packets in the egress > pipeline to conntrack, > we can't add a flow like - match = ct_state=+inv, action = drop. > > i.e When a VM/container sends an ICMP request packet, it will not be > sent to conntrack, but > the reply ICMP will be sent to conntrack and it will be marked as invalid. > > So is the case with TCP, the TCP SYN from the VM is not sent to > conntrack, but the SYN/ACK > from the server would be sent to conntrack and it will be marked as invalid. > > > > > > - TCP SYN/ACK packets during connection establishment. > > > > SYN/ACK should also be established state. > > INVALID should only be matched for packets that were never seen > > by conntrack, or that are deemed out of date / corrupted. > > > > > To distinguish between an invalid packet part of committed connection > > > and others, this patch introduces as a new ct attribute > > > OVS_CT_ATTR_LOOKUP_INV. If this is set in the ct action (without commit), > > > it tries to find the ct entry and if present, sets the ct_state to > > > +inv,+trk and also sets the mark and labels associated with the > > > connection. > > > > > > With this, a controller can add flows like > > > > > > .... > > > .... > > > table=20,ip, action=ct(table=21, lookup_invalid) > > > table=21,priority=100,ct_state=+inv+trk,ct_label=0x2/0x2 actions=drop > > > table=21,ip, actions=resubmit(,22) > > > .... > > > .... > > > > What exactly is the feature/problem that needs to be solved? > > I suspect this would help me to provide better feedback than the > > semi-random comments below .... :-) > > > > My only problem with how conntrack does things ATM is that the ruleset > > cannot distinguish: > > > > 1. packet was not even seen by conntrack > > 2. packet matches existing connection, but is "bad", for example: > > - contradicting tcp flags > > - out of window > > - invalid checksum > > We want the below to be solved (using OVS flows) : > - If the packet is marked as invalid due to (2) which you mentioned above, > we would like to read the ct_mark and ct_label fields as the packet is > part of existing connection, so that we can add an OVS flow like > > ct_state=+inv+trk,ct_label=0x2 actions=drop > > Right now it is not possible.
I forgot to mention the side effect of it. Since the tcp out of window packet is set as +inv, this packet is delivered to the VM/container without undnat and because of this VM/container resets the connection. Thanks Numan > > This patch does another lookup if skb->_nfct is NULL after > nf_conntrack_in() to check > if (2) is the case. If the lookup is successful, it updates the ct flow > key with the ct_mark and ct_label. This is made optional using a > netlink attribute. > > I'm not sure if it's possible for nf_conntrack_in() to provide this > information for > its callers so that the caller can come to know that the state is > invalid because of (2). > > I tested by setting 'be_liberal' sysctl flag and since skb->_nfct was > set for (2), OVS > datapath module set the ct_state to +est. > > Thanks > Numan > > > > > > There are a few sysctls to modify default behaviour, e.g. relax window > > checks, or ignore/skip checksum validation. > > > > The other problem i see (solveable for sure by yet-another-sysctl but i > > see that as last-resort) is usual compatibility problem: > > > > ct state invalid drop > > ct mark gt 0 accept > > > > If standard netfilter conntrack were to set skb->_nfct e.g. even if > > state is invalid, we could still make the above work via some internal > > flag. > > > > But if you reverse it, you get different behaviour: > > > > ct mark gt 0 accept > > ct state invalid drop > > > > First rule might now accept out-of-window packet even when "be_liberal" > > sysctl is off. > >