Hi all, This patch appears to be a very elegant solution to a thorny subtle problem: what to do when a peer is not accepting new routing information from you?
I've seen in the wild that some crashed BGP implementations continue to be able to generate KEEPALIVE messages, and are able to TCP ACK keepalives you are sending, but won't actually accept anything you send to such a problematic peer. A red flag is when the peer keeps telling you not to send it anything by signalling a TCP Receive Window of 0. The consequences are quite dire: in situations like these you know for a fact that the peer is operating based on stale routing information (you have proof they are not accepting your KEEPALIVEs and more importantly UPDATEs, as those are all stuck in your OutQ). If a peer is not progressing new routing information coming from you, how can we trust it was processing any updates coming from other neighbors? Are the routes the peer told us in the past (what we have in Adj-RIB-In) still valid? Seems unlikely to me... it seems safer to destroy the BGP-4 session, log an error, and generate WITHDRAW messages for all routes pointing towards the broken peer so the network can converge to healthier paths. IDR discussions here https://mailarchive.ietf.org/arch/msg/idr/L9nWFBpW0Tci0c9DGfMoqC1j_sA/ OK job@ Kind regards, Job On Mon, Dec 14, 2020 at 06:45:47PM +0100, Claudio Jeker wrote: > The BGP protocol has a keepalive packet which resets the hold timer when a > packet is received. The problem is this covers only one side of the > transmission. It seems that some BGP implementations fail to process > messages in some cases but still send out KEEPALIVE packets. So bgpd > thinks everything is fine even though no updates where processed by the > other side (including our KEEPALIVE packets). The session is stuck in > limbo and with it some prefixes and routes. > > Because of this I think it makes sense to add a send hold timer that is > reset whenever a write call to the socket is made. If a socket does not > become writable for holdtime seconds (90s by default) then the session is > reset similar to the hold timer expiring because no data was received. > > This send holdtimer is not part of the BGP spec right now but looking at > discussions on the IDR mailing list I assume something like this may be > added at one point. > > I would like to know what other people think and would especially like to > know if this diff causes session resets that should not happen. > > Cheers > -- > :wq Claudio > > > Index: bgpd.h > =================================================================== > RCS file: /cvs/src/usr.sbin/bgpd/bgpd.h,v > retrieving revision 1.405 > diff -u -p -r1.405 bgpd.h > --- bgpd.h 5 Nov 2020 11:52:59 -0000 1.405 > +++ bgpd.h 13 Dec 2020 14:56:47 -0000 > @@ -1467,6 +1467,7 @@ static const char * const timernames[] = > "ConnectRetryTimer", > "KeepaliveTimer", > "HoldTimer", > + "SendHoldTimer", > "IdleHoldTimer", > "IdleHoldResetTimer", > "CarpUndemoteTimer", > Index: session.c > =================================================================== > RCS file: /cvs/src/usr.sbin/bgpd/session.c,v > retrieving revision 1.406 > diff -u -p -r1.406 session.c > --- session.c 11 Dec 2020 12:00:01 -0000 1.406 > +++ session.c 13 Dec 2020 14:52:42 -0000 > @@ -373,6 +373,7 @@ session_main(int debug, int verbose) > if ((pt = timer_nextisdue(&p->timers, now)) != NULL) { > switch (pt->type) { > case Timer_Hold: > + case Timer_HoldSend: > bgp_fsm(p, EVNT_TIMER_HOLDTIME); > break; > case Timer_ConnectRetry: > @@ -597,6 +598,7 @@ bgp_fsm(struct peer *peer, enum session_ > switch (event) { > case EVNT_START: > timer_stop(&peer->timers, Timer_Hold); > + timer_stop(&peer->timers, Timer_HoldSend); > timer_stop(&peer->timers, Timer_Keepalive); > timer_stop(&peer->timers, Timer_IdleHold); > > @@ -875,6 +877,7 @@ change_state(struct peer *peer, enum ses > timer_stop(&peer->timers, Timer_ConnectRetry); > timer_stop(&peer->timers, Timer_Keepalive); > timer_stop(&peer->timers, Timer_Hold); > + timer_stop(&peer->timers, Timer_HoldSend); > timer_stop(&peer->timers, Timer_IdleHold); > timer_stop(&peer->timers, Timer_IdleHoldReset); > session_close_connection(peer); > @@ -923,6 +926,7 @@ change_state(struct peer *peer, enum ses > timer_stop(&peer->timers, Timer_ConnectRetry); > timer_stop(&peer->timers, Timer_Keepalive); > timer_stop(&peer->timers, Timer_Hold); > + timer_stop(&peer->timers, Timer_HoldSend); > timer_stop(&peer->timers, Timer_IdleHold); > timer_stop(&peer->timers, Timer_IdleHoldReset); > session_close_connection(peer); > @@ -1780,6 +1784,8 @@ session_dispatch_msg(struct pollfd *pfd, > return (1); > } > p->stats.last_write = getmonotime(); > + if (p->holdtime > 0) > + timer_set(&p->timers, Timer_HoldSend, p->holdtime); > if (p->throttled && p->wbuf.queued < SESS_MSG_LOW_MARK) { > if (imsg_rde(IMSG_XON, p->conf.id, NULL, 0) == -1) > log_peer_warn(&p->conf, "imsg_compose XON"); > Index: session.h > =================================================================== > RCS file: /cvs/src/usr.sbin/bgpd/session.h,v > retrieving revision 1.148 > diff -u -p -r1.148 session.h > --- session.h 11 Dec 2020 12:00:01 -0000 1.148 > +++ session.h 13 Dec 2020 14:52:17 -0000 > @@ -180,6 +180,7 @@ enum Timer { > Timer_ConnectRetry, > Timer_Keepalive, > Timer_Hold, > + Timer_HoldSend, > Timer_IdleHold, > Timer_IdleHoldReset, > Timer_CarpUndemote, >