On Thu, Nov 26, 2009 at 03:56:37PM +0100, Henning Brauer wrote:
> * Derek Buttineau <[email protected]> [2009-11-26 15:07]:
> > On 2009-11-25, at 6:23 PM, Henning Brauer wrote:
> >
> > > check ifconfig -g carp on both
> >
> >
> > Right now both are at:
> >
> > carp: carp demote count 0
> >
> > However, I did check that before I rebooted the backup unit and the
> > master was set to
> >
> > carp: carp demote count 1
> >
> > At first I thought that maybe pfsync was keeping the master from reverting
> > while it synced state, but even after 24 hours the master hadn't taken back
> > over from the slave.
>
> the one with the higher demote count always loses, regardless of
> advskew. now finding out which subsystem set the demote count might be
> nontrivial. pfsync is in the game, so is rc, and, depending on
> configuration, various daemons like bgpd and ospfd.
What I have observed on a 4.6 firewall pair:
The demote count stays at 1 for a while because the first bulk state
update request times out. Only the subsequent one succeeds. The timeout
is 20s by default, but grows if you have a larger max state number.
The analysis is that the pfsync code triggers a bulk request on
the SIOCSETPFSYNC ioctl, but at that moment the interface is not yet
up; the SIOCSIFFLAGS that brings it up is only done after that.
This happens if you have a line in hostname.pfsync0 like:

	up syncif itf0

This gets rewritten by /etc/netstart, moving the "up" to the end.
A workaround (until dlg@ or somebody else finds a real fix) is to have
a newline after "up", so that two ifconfig commands are issued by
netstart, one to bring the interface up and the next to set the syncif:

	up
	syncif itf0
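
To see whether the workaround makes a difference, something like the
following could be used to time how long the demote counter stays
nonzero after boot. This is only a rough sketch: the function names are
made up for illustration, and it assumes the "carp: carp demote count N"
output format of ifconfig -g carp quoted earlier in this thread.

```shell
#!/bin/sh
# Print the demote counter found in `ifconfig -g carp`-style output
# read from stdin. Assumes a line like "carp: carp demote count N".
demote_count() {
	awk '/carp demote count/ { print $NF }'
}

# Hypothetical helper: poll once per second until the counter is 0,
# then report how long that took. Loops forever if it never gets there.
wait_for_demote_zero() {
	start=$(date +%s)
	while [ "$(ifconfig -g carp | demote_count)" != "0" ]; do
		sleep 1
	done
	echo "demote count reached 0 after $(( $(date +%s) - start ))s"
}
```

Source the file and run wait_for_demote_zero right after boot, once with
the single-line hostname.pfsync0 and once with the two-line version, and
compare the reported times.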
-Otto