On Wed, May 15, 2024 at 09:09:47PM +0200, Erin Shepherd wrote:
> It seems absent from the BSDs, but on Linux you can pass the MSG_MORE
> flag to send() to override TCP_NODELAY for a specific write
Am I understanding correctly that this is a variant on TCP_NOPUSH/TCP_CORK?
"More data is coming, don't push the send button yet!"

In OpenBGPD, TCP_NODELAY is set on the socket (a socket option available
on all platforms, I think?), and then all data is coalesced into
sendmsg() calls, so there is no need for 'corking'. From my limited
testing it seems a full routing table should fit in ~41,000 TCP packets.

BIRD has a code path sk_sendmsg()->sendmsg() called from
sk_maybe_write(), but based on my limited testing I'm not sure this path
is followed in all cases, because I see way more than 41K packets for a
full table feed (with TCP_NODELAY enabled).

Perhaps there are two separate questions here:

- are BGP messages (slightly) delayed because of TCP_NODELAY not being
  set? (I think yes)
- are BGP messages as efficiently coalesced into as few TCP packets as
  possible? (with TCP_NODELAY set, I am not sure)

Kind regards,

Job

ps. To clarify why I started this thread: last week I fell into the TCP
subsystem rabbit hole: why are things the way they are? I started
auditing various programs related to my $dayjob and thought it would be
good to open a conversation with the BIRD developer community. My goal
is not necessarily to get this patch merged 'as-is', but to learn from
and with friendly and respected BGP developers.
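
For illustration, here is a minimal sketch in C of the "set TCP_NODELAY,
then coalesce into sendmsg()" approach described above. It is not taken
from OpenBGPD or BIRD; the helper names set_nodelay() and
send_coalesced() are made up for this example, and a real daemon would
also have to handle short writes and EAGAIN on a non-blocking socket.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: disable Nagle's algorithm on a TCP socket.
 * Returns 0 on success, -1 on error (errno is set by setsockopt()).
 */
static int set_nodelay(int fd)
{
    int on = 1;

    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}

/*
 * Hypothetical helper: hand several already-encoded messages to the
 * kernel in a single sendmsg() call, so that the TCP stack sees one
 * contiguous chunk of data instead of many small writes.
 * Returns the number of bytes the kernel accepted (which may be less
 * than the total on a non-blocking socket), or -1 on error.
 */
static ssize_t send_coalesced(int fd, struct iovec *iov, int iovcnt)
{
    struct msghdr msg;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = iov;
    msg.msg_iovlen = iovcnt;

    return sendmsg(fd, &msg, 0);
}
```

The idea is to call set_nodelay() once after the session's socket is
created, and then to queue up pending BGP messages as iovec entries and
flush them with one send_coalesced() call per write event, rather than
issuing one write per message.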
