On 03/31/16 10:14, Martin Pieuchot wrote:
> On 30/03/16(Wed) 18:04, Florian Riehm wrote:
>> On 03/01/16 23:03, Martin Pieuchot wrote:
>>> On 18/02/16(Thu) 16:46, Florian Riehm wrote:
>>>> On 02/16/16 11:23, Martin Pieuchot wrote:
>>>>> On 12/02/16(Fri) 16:33, Florian Riehm wrote:
>>>>>> Hi Tech,
>>>>>>
>>>>>> I have noticed that CARP IP-Balancing is broken, so I am testing and
>>>>>> fixing it.
>>>>>>
>>>>>> The first problem came in with this commit:
>>>>>> http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/net/if_ethersubr.c.diff?r1=1.176&r2=1.177
>>>>>> It enforces that outgoing packets use the src mac of the carp interface 
>>>>>> instead
>>>>>> the mac of its carpdev. That's a good idea for failover carp but it 
>>>>>> breaks
>>>>>> balancing ip(-stealth), because the same source MAC is seen by multiple
>>>>>> switchports now. That leads to continues MAC flapping at switches.
>>>>>> My attached patch fixes the problem. Thanks Martin's refactoring, we can
>>>>>> enforce the right MAC on a central place in carp_start() now.
>>>>>
>>>>> Can't you completely get rid of the else chunk then?  It looks like a
>>>>> bad refactoring.
>>>>
>>>> I was also thinking about deleting the else chunk. At the end I decided to 
>>>> keep
>>>> it, because I wasn't sure about its purpose.
>>>> My intention was:
>>>> Fix the more or less obviously broken balancing case and don't change the
>>>> behavior of other CARP modes.
>>>> I will remove the else chunk and do some more testing to determine if it's
>>>> really unnecessary.
>>>>
>>>>> I'm also wondering can't we use "sc_realmac" here?
>>>>
>>>> I have not know "sc_realmac" until now. As far as I understand the 
>>>> intention of
>>>> the sc_realmac flag is, to indicate that the user has overwritten the mac
>>>> address of the carp interface with the mac of its parent. I have never had 
>>>> a
>>>> use case to do this. Without overwriting the mac by hand sc_realmac is 0 
>>>> for
>>>> carp with balancing ip/ip-stealth AND carp without balancing.
>>>>
>>>> I don't see how sc_realmac could help us by this problem?!
>>>> Can you give me a hint please?
>>>
>>> It was an open question, if that does not make any sense, then don't do
>>> it ;)
>>>
>>>>>> A second problem was introduced two weeks ago. Since we always check the
>>>>>> destination MAC address of received unicast packets, not only when in
>>>>>> promiscuous mode, carp balancing ip is always broken, not only when in
>>>>>> promiscuous mode. The new check collides with "Clear mcast if received 
>>>>>> on a
>>>>>> carp IP balanced address." in carp_input().
>>>>>
>>>>> Could you describe the problem?  Which Destination MAC the packet
>>>>> contains and why it doesn't match the interface?  I still don't grok why
>>>>> a carp(4) interface in balancing mode cannot have the right MAC
>>>>> configured...  Do you know?  But if what you want is the parent's MAC,
>>>>> then I'd rather go with the symmetric of your other change:  modify
>>>>> carp_input() to overwrite ``ether_dhost''.
>>>>
>>>> If using carp balancing, incoming packets contain the destination mac 
>>>> address
>>>> of the carp interface, let's say: 01:00:5e:00:01:01.
>>>> The MCAST-Bit is set here!
>>>>
>>>> carp_input() removes the MCAST-Bit:
>>>>    /*
>>>>     * Clear mcast if received on a carp IP balanced address.
>>>>     */
>>>>    if (sc->sc_balancing == CARP_BAL_IP &&
>>>>        ETHER_IS_MULTICAST(eh->ether_dhost))
>>>>            *(eh->ether_dhost) &= ~0x01;
>>>>
>>>> Then we run into ether_input() and 
>>>>     memcmp(ac->ac_enaddr, eh->ether_dhost, ETHER_ADDR_LEN)
>>>> evaluates to false because 
>>>>     ac->ac_enaddr (01:00:5e:00:01:01) != eh->ether_dhost 
>>>> (00:00:5e:00:01:01)
>>>
>>> Taking a step back why do you want to clear the MULTICAST bit?
>>>
>>> If it is just to differentiate between advertisement and normal traffic
>>> couldn't we use a mbuf_tag(9) for CARP advertisement instead?  The tag
>>> could be set in carp_input() and checked in carp_lsdrop() in the input
>>> path.  It could also be used in the output path to get rid of the
>>> horrible ``cur_vhe'' hack.
>>>
>> Thanks for your comments. I took a look to the big picture:
>> The idea behind carp balancing is that clients send all their packets to 
>> layer 2
>> multicast addresses. Switches flood those packets to all ports and every
>> CARP-System receives them. Only one system processes the packet, the other
>> systems drop it. The processing system handles the packet like a normal 
>> unicast
>> packet. The packet distribution described above is the only reason why the
>> multicast bit is required. We don't want further multicast handling.
>>
>> What happens if we don't remove the MCAST bit inside carp_input()?
>> In ether_input() packets with MCAST bit get the mbuf flag M_MCAST. From now 
>> the
>> kernel assumes to deal with real multicast packets. Upper layers will do many
>> things we don't want, e.g. tcp_input() and icmp_input_if() would drop the
>> packet.
>>  
>> What can we do?
>> 1.) Introduce a new mbuf flag to mark the special carp MCAST packets and 
>> handle
>> them properly.
>> => Bad idea, because we have no unused bits inside m_flags. It would also 
>> spread
>> the CARP hacks to more places - that's not what we want.
> 
> Well if you remove the flag inside carp_lsdrop() that could work.
> 
> My initial idea was to mark advertisements with a mbuf_tag(9) and keep
> the M_MCAST bit on the packets until carp_lsdrop() is called.  There if
> the packet has a mbuf_tag() pass it, otherwise if we have a match remove
> the M_MCAST flag.
> 
>> 2.) Remove the multicast bit of the carp interface's mac address to make the
>> check "memcmp(ac->ac_enaddr, eh->ether_dhost, ETHER_ADDR_LEN)" in
>> ether_input() passing. This means:
>> - Carp-Interfaces would have lladdr like 00:00:5e:00:01:01 instead of
>>   01:00:5e:00:01:01 (Same as in balancing-stealth-Mode)
>> - We have to add the MCAST-Bit to outgoing arp-replies to make sure that 
>> clients
>>   sending their packets to layer-2 multicast addresses.
>> - Keep removing the MCAST-Bit inside carp_input()
>> => Could work, but it looks like 'move the hack to an other place'. Actually
>> it's not that bad because we already have carp logic inside in_arpinput().
>>
>> What do you think about solution 2? I would submit a diff if you like the 
>> idea.
> 
> It might be even better.  You could simply set the bit in carp_start().
> Now from a performance point of view if we could prevent rewriting
> headers in carp_start() that would be a win.

If using carp balancing we have to use the physical source mac of the carp
parent interface to avoid duplicate mac addresses.
If using carp failover we have to use the virtual mac address of the carp
interface to share the virtual mac address.
That means the source mac is depending on the carp mode.
In my opinion carp_start() is a good place for the distinction of cases, even if
it's not optimal from the performance point of view. It puts the carp specific
logic to a central place and makes clear why this is necessary. 
In the past we had this logic less obviously implemented in ether_output() and
it was broken certain times. Fixing one problem led to other problems...
So I would prefer a clear, robust solution over a minimal performance gain.

> 
> My concern is to keep the hack in the carp driver.  I want carp(4) to
> be a pseudo-interface like any others as much as possible.  Having fewer
> #ifdef would also reduce the chance of future breakages.

We have two types of multicast packets if using carp balancing:
1.) Carp Advertisements
They are regular multicast frames. That means it's layer 2 and layer 3 
multicast.
The kernel should be able to handle them like normal multicast.
2.) Incoming balanced traffic
This is layer 2 multicast with layer 3 unicast inside. This is special and
requires some attention.
First let's think about the question "Why do we need the multicast bit for
incoming traffic"?
We don't need this Flag! Switches need the flag to distribute the traffic to all
ports. From this point of view it's maybe not that bad to remove the bit in
incoming packets as early as possible, like it's done in carp_input() already.
For the outgoing path the multicast bit is only relevant if we send arp replies.
They have to look like this:
08:00:27:df:cb:d2 08:00:27:65:ad:48 0806 42: arp reply 10.188.1.17 is-at 
01:00:5e:00:01:01
Note that the multicast address is used only inside the arp packet,
NOT inside the ethernet header. This means it can not be added in carp_start()
and we will definitely need CARP specific part while creating arp replies.
I'm working on a proof-of-concept to see if my idea works.

Florian

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to