Hi All,

I have tried searching the mailing lists but did not seem to find the
answer to the issue that I am seeing. I apologize for the long email,
but more info is better than less :)

For testing I have two OpenBSD 4.7-stable firewalls, and each has a
PCIe quad port Intel PRO/1000 QP (82571EB). Also, test host 0 has two
bge* interfaces (Broadcom BCM5722) and test host 1 has two re*
interfaces (Realtek 8169).

For the purpose of this test I took two fresh OpenBSD 4.7 installs and
configured one interface on each machine (test00:bge0 and test01:re0)
for the ssh connections. All non-essential services were turned off,
as shown in the ps output below:
------------------------------------------
# ps aux
USER       PID %CPU %MEM   VSZ   RSS TT  STAT  STARTED       TIME COMMAND
root         1  0.0  0.0   416   356 ??  Ss    12:12PM    0:00.00 /sbin/init
root     18759  0.0  0.0   524   748 ??  Ss    12:12PM    0:00.01 syslogd: [pri
_syslogd 15917  0.0  0.0   544   768 ??  S     12:12PM    0:00.01 syslogd -a /v
root     16578  0.0  0.1   592  1364 ??  Ss    12:12PM    0:00.00 /usr/sbin/ssh
root      9357  0.0  0.0   528   836 ??  Ss    12:12PM    0:00.01 inetd
root     22666  0.0  0.0   688   864 ??  Ss    12:12PM    0:00.00 cron
root      8106  0.0  0.1  1116  1640 ??  Ss    12:12PM    0:00.00 sendmail: acc
root     20452  0.0  0.1  3356  3072 ??  Ss    12:12PM    0:00.70 sshd: r...@tt
root      7302  0.0  0.0   552   528 p0  Ss    12:12PM    0:00.01 -ksh (ksh)
root     14924  0.0  0.0   372   240 p0  R+/1  12:12PM    0:00.00 ps -aux
root     23870  0.0  0.0   264   848 C0  Ss+   12:12PM    0:00.00 /usr/libexec/
root     14819  0.0  0.0   332   848 C1  Ss+   12:12PM    0:00.00 /usr/libexec/
root     15286  0.0  0.0   264   852 C2  Ss+   12:12PM    0:00.00 /usr/libexec/
root     29182  0.0  0.0   308   860 C3  Ss+   12:12PM    0:00.00 /usr/libexec/
root      2829  0.0  0.0   416   848 C5  Ss+   12:12PM    0:00.00 /usr/libexec/
------------------------------------------

For the tests performed all switches were unmanaged Netgear JFS516 and
JGS516. The hosts were wired as follows:
------------------------------------------
test00:em2  ----- 1:sw0a:2 ----- 1:sw0c
test01:em2  ----- 1:sw0b:2 ----- 2:sw0c

test00:em3  ----- 1:sw1a:2 ----- 1:sw1c
test01:em3  ----- 1:sw1b:2 ----- 2:sw1c

test00:bge1 ----- 1:sw2a:2 ----- 1:sw2c
test01:re1  ----- 1:sw2b:2 ----- 2:sw2c
------------------------------------------
For example, the last line shows that re1 on test01 was connected to
port 1 on switch sw2b and port 2 from sw2b was connected to port 2 on
switch sw2c. (I would have loved to draw an ASCII diagram but it got
too complex.) The reason for so many switches is to approximate
different failure scenarios.

On host test00 the interfaces were configured as follows:
------------------------------------------
# ifconfig em2 up -inet6
# ifconfig em3 up -inet6
# ifconfig bge1 up -inet6
# ifconfig carp0 -inet6 vhid 1 carpdev em2 192.168.1.1 netmask 255.255.255.0
# ifconfig carp1 -inet6 vhid 2 carpdev em3 192.168.2.1 netmask 255.255.255.0
# ifconfig carp2 -inet6 vhid 3 carpdev bge1 192.168.3.1 netmask 255.255.255.0
# ifconfig carp
carp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:00:5e:00:01:01
        priority: 0
        carp: MASTER carpdev em2 vhid 1 advbase 1 advskew 0
        groups: carp
        inet 192.168.1.1 netmask 0xffffff00 broadcast 192.168.1.255
carp1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:00:5e:00:01:02
        priority: 0
        carp: MASTER carpdev em3 vhid 2 advbase 1 advskew 0
        groups: carp
        inet 192.168.2.1 netmask 0xffffff00 broadcast 192.168.2.255
carp2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:00:5e:00:01:03
        priority: 0
        carp: MASTER carpdev bge1 vhid 3 advbase 1 advskew 0
        groups: carp
        inet 192.168.3.1 netmask 0xffffff00 broadcast 192.168.3.255
------------------------------------------

On host test01 the interfaces were configured as follows:
------------------------------------------
# ifconfig em2 up -inet6
# ifconfig em3 up -inet6
# ifconfig re1 up -inet6
# ifconfig carp0 -inet6 vhid 1 advskew 100 carpdev em2 192.168.1.1 netmask 255.255.255.0
# ifconfig carp1 -inet6 vhid 2 advskew 100 carpdev em3 192.168.2.1 netmask 255.255.255.0
# ifconfig carp2 -inet6 vhid 3 advskew 100 carpdev re1 192.168.3.1 netmask 255.255.255.0
# ifconfig carp
carp0: flags=28843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,NOINET6> mtu 1500
        lladdr 00:00:5e:00:01:01
        priority: 0
        carp: BACKUP carpdev em2 vhid 1 advbase 1 advskew 100
        groups: carp
        status: backup
        inet 192.168.1.1 netmask 0xffffff00 broadcast 192.168.1.255
carp1: flags=28843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,NOINET6> mtu 1500
        lladdr 00:00:5e:00:01:02
        priority: 0
        carp: BACKUP carpdev em3 vhid 2 advbase 1 advskew 100
        groups: carp
        status: backup
        inet 192.168.2.1 netmask 0xffffff00 broadcast 192.168.2.255
carp2: flags=28843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,NOINET6> mtu 1500
        lladdr 00:00:5e:00:01:03
        priority: 0
        carp: BACKUP carpdev re1 vhid 3 advbase 1 advskew 100
        groups: carp
        status: backup
        inet 192.168.3.1 netmask 0xffffff00 broadcast 192.168.3.255
------------------------------------------

On both machines the following sysctl changes were made:
------------------------------------------
net.inet.carp.preempt=1
net.inet.carp.log=7
------------------------------------------
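
In case it helps anyone reproducing this: the two settings above can be
applied at runtime with sysctl(8) and kept across reboots in
/etc/sysctl.conf. Below is a minimal sketch of how that could be
scripted. The persist_sysctl helper and its arguments are my own
hypothetical names, not part of OpenBSD, and the default file location
is an assumption.

```shell
#!/bin/sh
# Hypothetical helper (not an OpenBSD tool): record one "key=value"
# sysctl setting in a sysctl.conf-style file, replacing any earlier
# line for the same key so the file never holds duplicates.
persist_sysctl() {
    setting=$1                      # e.g. net.inet.carp.preempt=1
    conf=${2:-/etc/sysctl.conf}     # assumed default location
    key=${setting%%=*}              # text before the first '='
    touch "$conf"
    # Keep every line whose key differs, then append the new setting.
    awk -F= -v k="$key" '$1 != k' "$conf" > "${conf}.tmp"
    mv "${conf}.tmp" "$conf"
    echo "$setting" >> "$conf"
}

# Runtime equivalents, to be run as root on each firewall:
#   sysctl net.inet.carp.preempt=1
#   sysctl net.inet.carp.log=7
```

Calling persist_sysctl once per setting on each host should leave both
machines with identical CARP sysctl values after a reboot.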


For the first test I unplugged the cable between sw0a:2 and sw0c:1.
As expected, the log on test01 shows:
------------------------------------------
test01 /bsd: carp0: state transition: BACKUP -> MASTER

# ifconfig carp
carp0: flags=28843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,NOINET6> mtu 1500
        lladdr 00:00:5e:00:01:01
        priority: 0
        carp: MASTER carpdev em2 vhid 1 advbase 1 advskew 100
        groups: carp
        status: master
        inet 192.168.1.1 netmask 0xffffff00 broadcast 192.168.1.255
carp1: flags=28843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,NOINET6> mtu 1500
        lladdr 00:00:5e:00:01:02
        priority: 0
        carp: BACKUP carpdev em3 vhid 2 advbase 1 advskew 100
        groups: carp
        status: backup
        inet 192.168.2.1 netmask 0xffffff00 broadcast 192.168.2.255
carp2: flags=28843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,NOINET6> mtu 1500
        lladdr 00:00:5e:00:01:03
        priority: 0
        carp: BACKUP carpdev re1 vhid 3 advbase 1 advskew 100
        groups: carp
        status: backup
        inet 192.168.3.1 netmask 0xffffff00 broadcast 192.168.3.255
------------------------------------------

carp0 on em2 became the master on test01, while nothing changed on
test00. Plugging the cable back in yields the expected result, and the
log on test01 shows:
------------------------------------------
test01 /bsd: carp0: state transition: MASTER -> BACKUP

# ifconfig carp
carp0: flags=28843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,NOINET6> mtu 1500
        lladdr 00:00:5e:00:01:01
        priority: 0
        carp: BACKUP carpdev em2 vhid 1 advbase 1 advskew 100
        groups: carp
        status: backup
        inet 192.168.1.1 netmask 0xffffff00 broadcast 192.168.1.255
carp1: flags=28843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,NOINET6> mtu 1500
        lladdr 00:00:5e:00:01:02
        priority: 0
        carp: BACKUP carpdev em3 vhid 2 advbase 1 advskew 100
        groups: carp
        status: backup
        inet 192.168.2.1 netmask 0xffffff00 broadcast 192.168.2.255
carp2: flags=28843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,NOINET6> mtu 1500
        lladdr 00:00:5e:00:01:03
        priority: 0
        carp: BACKUP carpdev re1 vhid 3 advbase 1 advskew 100
        groups: carp
        status: backup
        inet 192.168.3.1 netmask 0xffffff00 broadcast 192.168.3.255
------------------------------------------

For the second test I unplugged the cable between test00:em2 and
sw0a:1. This time the results are not what I expected. The log on
test00 shows the following:
------------------------------------------
test00 /bsd: carp0: state transition: MASTER -> INIT
test00 /bsd: carp: carp0 demoted group carp to 1
test00 /bsd: carp1: state transition: MASTER -> BACKUP
test00 /bsd: carp2: state transition: MASTER -> BACKUP

# ifconfig carp
carp0: flags=8803<UP,BROADCAST,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:00:5e:00:01:01
        priority: 0
        carp: INIT carpdev em2 vhid 1 advbase 1 advskew 0
        groups: carp
        inet 192.168.1.1 netmask 0xffffff00 broadcast 192.168.1.255
carp1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:00:5e:00:01:02
        priority: 0
        carp: BACKUP carpdev em3 vhid 2 advbase 1 advskew 0
        groups: carp
        inet 192.168.2.1 netmask 0xffffff00 broadcast 192.168.2.255
carp2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:00:5e:00:01:03
        priority: 0
        carp: BACKUP carpdev bge1 vhid 3 advbase 1 advskew 0
        groups: carp
        inet 192.168.3.1 netmask 0xffffff00 broadcast 192.168.3.255
# ifconfig em2
em2: flags=8b43<UP,BROADCAST,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:15:17:b8:db:3e
        priority: 0
        media: Ethernet autoselect (none)
        status: no carrier
------------------------------------------

And the log on test01 shows the following:
------------------------------------------
test01 /bsd: carp1: state transition: BACKUP -> MASTER
test01 /bsd: carp2: state transition: BACKUP -> MASTER
test01 /bsd: carp0: state transition: BACKUP -> MASTER
------------------------------------------

Plugging the cable back in brings the system back to the original
state as shown in the logs below:
------------------------------------------
test00 /bsd: carp0: state transition: INIT -> BACKUP
test00 /bsd: carp: carp0 demoted group carp to 0
test00 /bsd: carp0: state transition: BACKUP -> MASTER
test00 /bsd: carp1: state transition: BACKUP -> MASTER
test00 /bsd: carp2: state transition: BACKUP -> MASTER


test01 /bsd: carp0: state transition: MASTER -> BACKUP
test01 /bsd: carp1: state transition: MASTER -> BACKUP
test01 /bsd: carp2: state transition: MASTER -> BACKUP
------------------------------------------
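
To compare the two hosts at a glance, the transitions can be reduced to
the last known state per interface. This is only a sketch: it assumes
log lines in the trimmed form shown above (hostname, "/bsd:", interface,
"state transition: OLD -> NEW"), and last_carp_state is a name I made up
for illustration.

```shell
#!/bin/sh
# Print the most recent CARP state per host and interface.
# Reads kernel log lines on stdin, for example:
#   test00 /bsd: carp0: state transition: MASTER -> INIT
# Lines without "state transition:" (such as the demotion messages)
# are ignored.
last_carp_state() {
    awk '
        /state transition:/ {
            # Locate "/bsd:" so a leading syslog timestamp does not matter.
            for (i = 2; i < NF; i++) {
                if ($i == "/bsd:") {
                    ifname = $(i + 1)
                    sub(/:$/, "", ifname)           # "carp0:" -> "carp0"
                    state[$(i - 1) " " ifname] = $NF # last field is new state
                    break
                }
            }
        }
        END {
            for (k in state) print k, state[k]
        }
    ' | sort
}
```

Fed the test00 lines from the second test, this reports carp0 as INIT
and carp1 and carp2 as BACKUP, which is the group-wide failover I am
asking about.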

The same behavior can be reproduced with any of the other interfaces.
Swapping the roles of both machines yields the same result. Repeating
the test on the 4.8-current branch yields the same result as well.

Based on the above examples, why does carpdev behave differently in
the two tests? Physically, the only difference seen by host test00 in
the second test is that em2, the underlying interface of carp0,
changes status from 'active' to 'no carrier'. Is this behavior
expected? Or should the second test behave like the first one? What is
the reason for carpdev to demote the entire carp group on test00?

If this seems like a bug I would be more than happy to assist with testing.

Any comments are greatly appreciated.

Thanks!

--peter
