I have two firewalls set up running OpenBSD 5.1 amd64.  Each box has 4
NICs, paired off into failover trunks (trunk0 and trunk1).  I then have
4 VLANs configured on each box: 3 go over trunk0 and one goes over
trunk1.  CARP is set up on each box as well, with one carp interface per
VLAN.  On FW2, I have an advskew of 128 configured so that this box acts
as the backup for all carp interfaces.  I also have pfsync and sasyncd
running.
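
As a rough sketch of that layout for a single VLAN (these are not the
actual files; the NIC names, VLAN ID, vhid, password, and addresses are
all placeholders), the hostname.if files would look something like this:

```
# /etc/hostname.em0 and /etc/hostname.em1 (placeholder NIC names)
up

# /etc/hostname.trunk0
trunkproto failover trunkport em0 trunkport em1 up

# /etc/hostname.vlan10 (VLAN ID 10 is a placeholder)
vlan 10 vlandev trunk0 up

# /etc/hostname.carp10 on FW2 (vhid, pass, and address are placeholders)
inet 192.0.2.1 255.255.255.0 192.0.2.255 vhid 10 carpdev vlan10 pass examplepass advskew 128
```

On FW1 the carp line would be the same minus the advskew, so it keeps
the default of 0.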

When I first set this up, I saw some odd behavior when I booted both
machines.  Sometimes FW1 would come up as master for everything, sometimes
both FW1 and FW2 would come up as master for everything, and sometimes it
would be a mix.  I noticed that the carp demote counter on each box would
be a different number each time I rebooted, anywhere from 0 to 126.  I
looked at the /etc/rc script to see where the demote counter is bumped up
to 128 while the various network interfaces are being started.  I put a
'sleep 20' right after '. /etc/netstart' in that file, and that seemed to
let the demote counter consistently come down to 0 by the time the machine
finished booting.
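
A fixed sleep is a guess at how long things take to settle, so an
alternative would be to poll the counter.  The following is only a sketch:
the function names are made up, and it assumes 'ifconfig -g carp' prints a
line of the form "carp: carp demote count N".

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, and the output format of
# `ifconfig -g carp` ("carp: carp demote count N") is an assumption.

# Print the demote counter parsed from `ifconfig -g carp` output on stdin.
demote_count() {
	sed -n 's/.*carp demote count \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Poll once per second until the counter reads 0.
# $1 is the timeout in seconds (default 60).
# Returns 0 if the counter reached 0, 1 if the timeout expired first.
wait_for_demote_zero() {
	timeout=${1:-60}
	elapsed=0
	while [ "$elapsed" -lt "$timeout" ]; do
		count=$(ifconfig -g carp | demote_count)
		[ "$count" = "0" ] && return 0
		sleep 1
		elapsed=$((elapsed + 1))
	done
	return 1
}
```

Calling 'wait_for_demote_zero 60 || echo "carp demote counter still set"'
in place of the sleep would at least show when the counter never settles.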

This seemed to fix my problems, or so I thought.  Today I noticed that
FW1 had crashed for some reason (still investigating).  FW2 took over as
master for all carp interfaces, as it should.  I rebooted FW1 and it came
up.  I checked the ifconfig status of the carp interfaces on FW1: one was
backup, while the other 3 were master.  On FW2, all 4 carp interfaces were
still master.  This doesn't seem correct, as there are now two machines
advertising as master for 3 of the 4 carp interfaces.  I also thought I
had it set up so that one box would be carp master for all or none of the
carp groups, not a mix.  I have net.inet.carp.preempt=1 set in sysctl.
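
To make the mixed state easier to spot on each box, the check I did by
eye could be scripted.  This is a sketch with made-up function names; it
assumes each carp interface in the ifconfig output has a header line
starting with its name and a line of the form "carp: MASTER carpdev ...".

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, and the parsing assumes
# ifconfig output with "carpN: flags=..." header lines followed by a
# "carp: STATE carpdev ..." line for each interface.

# Read ifconfig output on stdin; print "<interface> <state>" per carp if.
carp_states() {
	awk '
		/^carp[0-9]+:/ { sub(/:.*/, "", $1); ifname = $1 }
		$1 == "carp:" && ifname != "" { print ifname, $2 }
	'
}

# Read ifconfig output on stdin; exit 0 if more than one distinct state
# is present (for example some MASTER and some BACKUP), 1 otherwise.
carp_is_mixed() {
	carp_states | awk '
		{ seen[$2] = 1 }
		END { n = 0; for (s in seen) n++; exit (n > 1 ? 0 : 1) }
	'
}
```

Something like 'ifconfig carp | carp_is_mixed && echo "mixed carp roles"'
on each firewall would flag the situation described above.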

Basically, I need to figure out why carp is not behaving correctly, or at
least not the way I understand correct to be.  I'm happy to post any
configs required; however, I am not currently at a machine that can access
the systems in question, which is why they aren't included in this email.
