On 30/03/16(Wed) 18:04, Florian Riehm wrote:
> On 03/01/16 23:03, Martin Pieuchot wrote:
> > On 18/02/16(Thu) 16:46, Florian Riehm wrote:
> >> On 02/16/16 11:23, Martin Pieuchot wrote:
> >>> On 12/02/16(Fri) 16:33, Florian Riehm wrote:
> >>>> Hi Tech,
> >>>>
> >>>> I have noticed that CARP IP-Balancing is broken, so I am testing and
> >>>> fixing it.
> >>>>
> >>>> The first problem came in with this commit:
> >>>> http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/net/if_ethersubr.c.diff?r1=1.176&r2=1.177
> >>>> It enforces that outgoing packets use the src mac of the carp interface 
> >>>> instead
> >>>> the mac of its carpdev. That's a good idea for failover carp but it 
> >>>> breaks
> >>>> balancing ip(-stealth), because the same source MAC is seen by multiple
> >>>> switchports now. That leads to continues MAC flapping at switches.
> >>>> My attached patch fixes the problem. Thanks Martin's refactoring, we can
> >>>> enforce the right MAC on a central place in carp_start() now.
> >>>
> >>> Can't you completely get rid of the else chunk then?  It looks like a
> >>> bad refactoring.
> >>
> >> I was also thinking about deleting the else chunk. At the end I decided to 
> >> keep
> >> it, because I wasn't sure about its purpose.
> >> My intention was:
> >> Fix the more or less obviously broken balancing case and don't change the
> >> behavior of other CARP modes.
> >> I will remove the else chunk and do some more testing to determine if it's
> >> really unnecessary.
> >>
> >>> I'm also wondering can't we use "sc_realmac" here?
> >>
> >> I have not know "sc_realmac" until now. As far as I understand the 
> >> intention of
> >> the sc_realmac flag is, to indicate that the user has overwritten the mac
> >> address of the carp interface with the mac of its parent. I have never had 
> >> a
> >> use case to do this. Without overwriting the mac by hand sc_realmac is 0 
> >> for
> >> carp with balancing ip/ip-stealth AND carp without balancing.
> >>
> >> I don't see how sc_realmac could help us by this problem?!
> >> Can you give me a hint please?
> > 
> > It was an open question, if that does not make any sense, then don't do
> > it ;)
> > 
> >>>> A second problem was introduced two weeks ago. Since we always check the
> >>>> destination MAC address of received unicast packets, not only when in
> >>>> promiscuous mode, carp balancing ip is always broken, not only when in
> >>>> promiscuous mode. The new check collides with "Clear mcast if received 
> >>>> on a
> >>>> carp IP balanced address." in carp_input().
> >>>
> >>> Could you describe the problem?  Which Destination MAC the packet
> >>> contains and why it doesn't match the interface?  I still don't grok why
> >>> a carp(4) interface in balancing mode cannot have the right MAC
> >>> configured...  Do you know?  But if what you want is the parent's MAC,
> >>> then I'd rather go with the symmetric of your other change:  modify
> >>> carp_input() to overwrite ``ether_dhost''.
> >>
> >> If using carp balancing, incoming packets contain the destination mac 
> >> address
> >> of the carp interface, let's say: 01:00:5e:00:01:01.
> >> The MCAST-Bit is set here!
> >>
> >> carp_input() removes the MCAST-Bit:
> >>    /*
> >>     * Clear mcast if received on a carp IP balanced address.
> >>     */
> >>    if (sc->sc_balancing == CARP_BAL_IP &&
> >>        ETHER_IS_MULTICAST(eh->ether_dhost))
> >>            *(eh->ether_dhost) &= ~0x01;
> >>
> >> Then we run into ether_input() and 
> >>     memcmp(ac->ac_enaddr, eh->ether_dhost, ETHER_ADDR_LEN)
> >> evaluates to false because 
> >>     ac->ac_enaddr (01:00:5e:00:01:01) != eh->ether_dhost 
> >> (00:00:5e:00:01:01)
> > 
> > Taking a step back why do you want to clear the MULTICAST bit?
> > 
> > If it is just to differentiate between advertisement and normal traffic
> > couldn't we use a mbuf_tag(9) for CARP advertisement instead?  The tag
> > could be set in carp_input() and checked in carp_lsdrop() in the input
> > path.  It could also be used in the output path to get rid of the
> > horrible ``cur_vhe'' hack.
> > 
> Thanks for your comments. I took a look to the big picture:
> The idea behind carp balancing is that clients send all their packets to 
> layer 2
> multicast addresses. Switches flood those packets to all ports and every
> CARP-System receives them. Only one system processes the packet, the other
> systems drop it. The processing system handles the packet like a normal 
> unicast
> packet. The packet distribution described above is the only reason why the
> multicast bit is required. We don't want further multicast handling.
> 
> What happens if we don't remove the MCAST bit inside carp_input()?
> In ether_input() packets with MCAST bit get the mbuf flag M_MCAST. From now 
> the
> kernel assumes to deal with real multicast packets. Upper layers will do many
> things we don't want, e.g. tcp_input() and icmp_input_if() would drop the
> packet.
>  
> What can we do?
> 1.) Introduce a new mbuf flag to mark the special carp MCAST packets and 
> handle
> them properly.
> => Bad idea, because we have no unused bits inside m_flags. It would also 
> spread
> the CARP hacks to more places - that's not what we want.

Well if you remove the flag inside carp_lsdrop() that could work.

My initial idea was to mark advertisements with a mbuf_tag(9) and keep
the M_MCAST bit on the packets until carp_lsdrop() is called.  There if
the packet has a mbuf_tag() pass it, otherwise if we have a match remove
the M_MCAST flag.

> 2.) Remove the multicast bit of the carp interface's mac address to make the
> check "memcmp(ac->ac_enaddr, eh->ether_dhost, ETHER_ADDR_LEN)" in
> ether_input() passing. This means:
> - Carp-Interfaces would have lladdr like 00:00:5e:00:01:01 instead of
>   01:00:5e:00:01:01 (Same as in balancing-stealth-Mode)
> - We have to add the MCAST-Bit to outgoing arp-replies to make sure that 
> clients
>   sending their packets to layer-2 multicast addresses.
> - Keep removing the MCAST-Bit inside carp_input()
> => Could work, but it looks like 'move the hack to an other place'. Actually
> it's not that bad because we already have carp logic inside in_arpinput().
> 
> What do you think about solution 2? I would submit a diff if you like the 
> idea.

It might be even better.  You could simply set the bit in carp_start().
Now from a performance point of view if we could prevent rewriting
headers in carp_start() that would be a win.

My concern is to keep the hack in the carp driver.  I want carp(4) to
be a pseudo-interface like any others as much as possible.  Having fewer
#ifdef would also reduce the chance of future breakages.

Martin

Reply via email to