Hi Theo,
On Sun, Oct 29, 2017 at 11:45:54AM -0600, Theo de Raadt wrote:
>
> Yes, on the route socket. It is unreasonable for the kernel to
> maintain an infinite number of route change messages, so about 9 years
> ago we developed this scheme of marking the situation for userland to
> handle. Such a mechanism didn't exist before, because no one had run
> into the concern before -- people weren't turning *BSD systems into
> full-table/high-churn routing systems before our daemons came along.
Thanks for explaining.
>
> > We have changed default sysctl settings for:
> > kern.maxclusters=24576
> > net.inet.ip.ifq.maxlen=4096
> > net.inet6.ip6.ifq.maxlen=1024
> >
> > as from netstat -m we ran out of 2048 mbufs at defaults.
>
> Come on, think for a second. See "ip" and "ip6"? That doesn't grow
> the queue on the routing socket. If anything it probably makes
> your situation worse.
The ip and ip6 queue lengths were the first things I changed, to stop
drops on the interfaces. That worked: we now have no dropped traffic.
And yes, I know it does not help with the OSPF issue.
>
> As for growing the size of the route socket buffer -- it is unclear
> whether that won't make the situation worse. When a desync is
> detected in userland, you will already have read and consumed the full
> queue -- which now has a gap in it, and requires a fresh restart. So
> you are promising to do MORE wasteful work before recovering.
>
> Anyways, there are two circumstances where it happens: route buffer limits,
> or temporary mbuf shortage. I think you've hit the latter.
How can I fix this temporary mbuf shortage? I have been searching for a
way to detect it. From the netstat -m output:
$ netstat -m
956 mbufs in use:
	933 mbufs allocated to data
	14 mbufs allocated to packet headers
	9 mbufs allocated to socket names and addresses
930/13264/24576 mbuf 2048 byte clusters in use (current/peak/max)
0/8/24576 mbuf 4096 byte clusters in use (current/peak/max)
0/8/24576 mbuf 8192 byte clusters in use (current/peak/max)
0/14/24584 mbuf 9216 byte clusters in use (current/peak/max)
0/10/24580 mbuf 12288 byte clusters in use (current/peak/max)
0/8/24576 mbuf 16384 byte clusters in use (current/peak/max)
0/8/24576 mbuf 65536 byte clusters in use (current/peak/max)
3768 Kbytes allocated to network (55% in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines
We hit the maximum for 2048 byte mbuf clusters, so I bumped
kern.maxclusters.
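
To check those cluster lines mechanically rather than by eye, something
like the following rough sketch could be used. It assumes the netstat -m
output has the shape shown above; the function names and the 90%
threshold are made up for illustration and are not part of any OpenBSD
tool.

```python
import re

# Matches e.g. "930/13264/24576 mbuf 2048 byte clusters in use (current/peak/max)"
CLUSTER_RE = re.compile(r"^\s*(\d+)/(\d+)/(\d+) mbuf (\d+) byte clusters in use")
# Matches the three trailing counters, e.g. "0 requests for memory denied"
COUNTER_RE = re.compile(
    r"^\s*(\d+) (requests for memory denied|requests for memory delayed"
    r"|calls to protocol drain routines)"
)


def parse_netstat_m(text):
    """Return (pools, counters) parsed from `netstat -m` style text."""
    pools = {}
    counters = {}
    for line in text.splitlines():
        m = CLUSTER_RE.match(line)
        if m:
            current, peak, maximum, size = map(int, m.groups())
            pools[size] = {"current": current, "peak": peak, "max": maximum}
            continue
        m = COUNTER_RE.match(line)
        if m:
            counters[m.group(2)] = int(m.group(1))
    return pools, counters


def pools_near_limit(pools, threshold=0.9):
    """Cluster sizes whose peak usage reached `threshold` of the maximum.

    The 0.9 default is an arbitrary illustration, not a recommended value.
    """
    return [
        size
        for size, p in sorted(pools.items())
        if p["max"] and p["peak"] / p["max"] >= threshold
    ]


# Sample lines copied from the output above.
SAMPLE = """\
930/13264/24576 mbuf 2048 byte clusters in use (current/peak/max)
0/8/24576 mbuf 4096 byte clusters in use (current/peak/max)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines
"""

if __name__ == "__main__":
    pools, counters = parse_netstat_m(SAMPLE)
    print("pools near limit:", pools_near_limit(pools))
    print("counters:", counters)
```

Against the numbers above this reports no pool near its limit now that
the maximum is raised, and all three counters at zero.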
Does anybody know how to attack this issue? I have been searching for
the correct way to debug this potential mbuf shortage, but apparently I
went the wrong way about fixing it.
Regards
Robert