On 30/03/16(Wed) 18:04, Florian Riehm wrote: > On 03/01/16 23:03, Martin Pieuchot wrote: > > On 18/02/16(Thu) 16:46, Florian Riehm wrote: > >> On 02/16/16 11:23, Martin Pieuchot wrote: > >>> On 12/02/16(Fri) 16:33, Florian Riehm wrote: > >>>> Hi Tech, > >>>> > >>>> I have noticed that CARP IP-Balancing is broken, so I am testing and > >>>> fixing it. > >>>> > >>>> The first problem came in with this commit: > >>>> http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/net/if_ethersubr.c.diff?r1=1.176&r2=1.177 > >>>> It enforces that outgoing packets use the src mac of the carp interface > >>>> instead > >>>> the mac of its carpdev. That's a good idea for failover carp but it > >>>> breaks > >>>> balancing ip(-stealth), because the same source MAC is seen by multiple > >>>> switchports now. That leads to continues MAC flapping at switches. > >>>> My attached patch fixes the problem. Thanks Martin's refactoring, we can > >>>> enforce the right MAC on a central place in carp_start() now. > >>> > >>> Can't you completely get rid of the else chunk then? It looks like a > >>> bad refactoring. > >> > >> I was also thinking about deleting the else chunk. At the end I decided to > >> keep > >> it, because I wasn't sure about its purpose. > >> My intention was: > >> Fix the more or less obviously broken balancing case and don't change the > >> behavior of other CARP modes. > >> I will remove the else chunk and do some more testing to determine if it's > >> really unnecessary. > >> > >>> I'm also wondering can't we use "sc_realmac" here? > >> > >> I have not know "sc_realmac" until now. As far as I understand the > >> intention of > >> the sc_realmac flag is, to indicate that the user has overwritten the mac > >> address of the carp interface with the mac of its parent. I have never had > >> a > >> use case to do this. Without overwriting the mac by hand sc_realmac is 0 > >> for > >> carp with balancing ip/ip-stealth AND carp without balancing. > >> > >> I don't see how sc_realmac could help us by this problem?! > >> Can you give me a hint please? > > > > It was an open question, if that does not make any sense, then don't do > > it ;) > > > >>>> A second problem was introduced two weeks ago. Since we always check the > >>>> destination MAC address of received unicast packets, not only when in > >>>> promiscuous mode, carp balancing ip is always broken, not only when in > >>>> promiscuous mode. The new check collides with "Clear mcast if received > >>>> on a > >>>> carp IP balanced address." in carp_input(). > >>> > >>> Could you describe the problem? Which Destination MAC the packet > >>> contains and why it doesn't match the interface? I still don't grok why > >>> a carp(4) interface in balancing mode cannot have the right MAC > >>> configured... Do you know? But if what you want is the parent's MAC, > >>> then I'd rather go with the symmetric of your other change: modify > >>> carp_input() to overwrite ``ether_dhost''. > >> > >> If using carp balancing, incoming packets contain the destination mac > >> address > >> of the carp interface, let's say: 01:00:5e:00:01:01. > >> The MCAST-Bit is set here! > >> > >> carp_input() removes the MCAST-Bit: > >> /* > >> * Clear mcast if received on a carp IP balanced address. > >> */ > >> if (sc->sc_balancing == CARP_BAL_IP && > >> ETHER_IS_MULTICAST(eh->ether_dhost)) > >> *(eh->ether_dhost) &= ~0x01; > >> > >> Then we run into ether_input() and > >> memcmp(ac->ac_enaddr, eh->ether_dhost, ETHER_ADDR_LEN) > >> evaluates to false because > >> ac->ac_enaddr (01:00:5e:00:01:01) != eh->ether_dhost > >> (00:00:5e:00:01:01) > > > > Taking a step back why do you want to clear the MULTICAST bit? > > > > If it is just to differentiate between advertisement and normal traffic > > couldn't we use a mbuf_tag(9) for CARP advertisement instead? The tag > > could be set in carp_input() and checked in carp_lsdrop() in the input > > path. It could also be used in the output path to get rid of the > > horrible ``cur_vhe'' hack. > > > Thanks for your comments. I took a look to the big picture: > The idea behind carp balancing is that clients send all their packets to > layer 2 > multicast addresses. Switches flood those packets to all ports and every > CARP-System receives them. Only one system processes the packet, the other > systems drop it. The processing system handles the packet like a normal > unicast > packet. The packet distribution described above is the only reason why the > multicast bit is required. We don't want further multicast handling. > > What happens if we don't remove the MCAST bit inside carp_input()? > In ether_input() packets with MCAST bit get the mbuf flag M_MCAST. From now > the > kernel assumes to deal with real multicast packets. Upper layers will do many > things we don't want, e.g. tcp_input() and icmp_input_if() would drop the > packet. > > What can we do? > 1.) Introduce a new mbuf flag to mark the special carp MCAST packets and > handle > them properly. > => Bad idea, because we have no unused bits inside m_flags. It would also > spread > the CARP hacks to more places - that's not what we want.
Well if you remove the flag inside carp_lsdrop() that could work. My initial idea was to mark advertisements with a mbuf_tag(9) and keep the M_MCAST bit on the packets until carp_lsdrop() is called. There if the packet has a mbuf_tag() pass it, otherwise if we have a match remove the M_MCAST flag. > 2.) Remove the multicast bit of the carp interface's mac address to make the > check "memcmp(ac->ac_enaddr, eh->ether_dhost, ETHER_ADDR_LEN)" in > ether_input() passing. This means: > - Carp-Interfaces would have lladdr like 00:00:5e:00:01:01 instead of > 01:00:5e:00:01:01 (Same as in balancing-stealth-Mode) > - We have to add the MCAST-Bit to outgoing arp-replies to make sure that > clients > sending their packets to layer-2 multicast addresses. > - Keep removing the MCAST-Bit inside carp_input() > => Could work, but it looks like 'move the hack to an other place'. Actually > it's not that bad because we already have carp logic inside in_arpinput(). > > What do you think about solution 2? I would submit a diff if you like the > idea. It might be even better. You could simply set the bit in carp_start(). Now from a performance point of view if we could prevent rewriting headers in carp_start() that would be a win. My concern is to keep the hack in the carp driver. I want carp(4) to be a pseudo-interface like any others as much as possible. Having fewer #ifdef would also reduce the chance of future breakages. Martin