Re: OpenSSH: cause of random kex_exchange_identification errors?

Vincent Lefevre Tue, 14 Jun 2022 03:50:45 -0700

On 2022-06-07 17:19:12 +0100, Tim Woodall wrote:
> On Tue, 7 Jun 2022, Vincent Lefevre wrote:
> > I eventually did a packet capture on the client side as I was able to
> > reproduce the problem. When it occurs, I get the following sequence:
> > 
> > Client → Server: [SYN] Seq=0
> > Server → Client: [SYN, ACK] Seq=0
> > Client → Server: [ACK] Seq=1
> > Server → Client: [FIN, ACK] Seq=1
> > Client → Server: Client: Protocol (SSH-2.0-OpenSSH_9.0p1 Debian-1)
> > Server → Client: [RST] Seq=2
> > Client → Server: [FIN, ACK] Seq=33
> > Server → Client: [RST] Seq=2
> > 
> > So the issue comes from the server, which sends [FIN, ACK] to terminate
> > the connection. In OpenSSH's sshd.c, this could be due to
> > 
> >                        if (unset_nonblock(*newsock) == -1 ||
> >                            drop_connection(*newsock, startups) ||
> >                            pipe(startup_p) == -1) {
> >                                close(*newsock);
> >                                continue;
> >                        }
> > 
> > At least 2 kinds of errors are not logged:
> > 
> > * In unset_nonblock(), a "fcntl(fd, F_SETFL, val) == -1" condition.
> > 
> > * the "pipe(startup_p) == -1" condition.
> > 
> > I'm not sure about drop_connection(), which is related to MaxStartups.
> > 
> 
> I've not seen the start of this thread but is this occasional or always?


Occasional. Someone else at my lab could reproduce the issue.
But the admins can't.

> If occasional, how many concurrent connections do you have starting all
> at once.

I'm not sure what you mean by "concurrent connections". The server
is an SSH gateway, so many users connect to it. But for the
client host above (my personal machine at my lab), this was the
only connection from this machine; note that I made this connection
only for testing, as there is no need to connect to this SSH gateway
from the lab.

> The default ssh config has a super-annoying default that
> randomly kills sessions if too many are handshaking at once.
> 
> It's the MaxStartups setting you allude to. I've been bitten by this
> where cron jobs all start at the same time and ssh to the same host.

MaxStartups was increased in February, after I initially reported
the problem.
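
For reference, the way MaxStartups decides whether to refuse a
connection can be sketched as follows. This is a minimal standalone
sketch based on the start:rate:full description in sshd_config(5),
not code copied from sshd; the function name drop_percent and its
parameters are hypothetical.

```c
/*
 * Hypothetical helper, not sshd code: percentage chance that a new
 * connection is dropped, given `startups` unauthenticated connections
 * and a MaxStartups setting of begin:rate:full.
 */
int
drop_percent(int startups, int begin, int rate, int full)
{
	int p;

	if (startups < begin)
		return 0;	/* below the threshold: never dropped */
	if (startups >= full || rate == 100)
		return 100;	/* at the hard limit: always dropped */

	/* Linear ramp from `rate` at `begin` up to 100 at `full`. */
	p = 100 - rate;
	p *= startups - begin;
	p /= full - begin;
	return p + rate;
}
```

With the default of 10:30:100 documented in sshd_config(5),
drop_percent(9, 10, 30, 100) is 0 and drop_percent(10, 10, 30, 100)
is 30, so a random refusal is only possible once at least 10
connections are handshaking at the same time.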

Since this is a Debian 10 machine with OpenSSH_7.9p1 Debian-10+deb10u2,
I should have quoted the code from this sshd.c version. Thus the
connection close should occur in one of the following checks:

        if (unset_nonblock(*newsock) == -1) {
                close(*newsock);
                continue;
        }
        if (drop_connection(startups) == 1) {
                char *laddr = get_local_ipaddr(*newsock);
                char *raddr = get_peer_ipaddr(*newsock);

                verbose("drop connection #%d from [%s]:%d "
                    "on [%s]:%d past MaxStartups", startups,
                    raddr, get_peer_port(*newsock),
                    laddr, get_local_port(*newsock));
                free(laddr);
                free(raddr);
                close(*newsock);
                continue;
        }
        if (pipe(startup_p) == -1) {
                close(*newsock);
                continue;
        }
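
In the excerpt above, the first and third checks close the socket
without any message of their own. A minimal standalone sketch of how
such failure paths could report errno is given below. It is not sshd
code: the names clear_nonblock_logged and pipe_logged are
hypothetical, and fprintf(stderr, ...) merely stands in for sshd's
logging functions.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: clear O_NONBLOCK on fd and say why it failed. */
int
clear_nonblock_logged(int fd)
{
	int val = fcntl(fd, F_GETFL);

	if (val == -1) {
		fprintf(stderr, "fcntl(%d, F_GETFL): %s\n",
		    fd, strerror(errno));
		return -1;
	}
	if (!(val & O_NONBLOCK))
		return 0;	/* already blocking, nothing to do */
	if (fcntl(fd, F_SETFL, val & ~O_NONBLOCK) == -1) {
		fprintf(stderr, "fcntl(%d, F_SETFL): %s\n",
		    fd, strerror(errno));
		return -1;
	}
	return 0;
}

/* Hypothetical helper: create a pipe and say why it failed. */
int
pipe_logged(int fds[2])
{
	if (pipe(fds) == -1) {
		fprintf(stderr, "pipe: %s\n", strerror(errno));
		return -1;
	}
	return 0;
}
```

If pipe() fails because of descriptor exhaustion, errno would
typically be EMFILE or ENFILE, which would point at a resource limit
on the server rather than at MaxStartups.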

Now, it appears that verbose() logs at SYSLOG_LEVEL_VERBOSE, which is
one step more verbose than the default SYSLOG_LEVEL_INFO, so nothing
would be logged by default concerning MaxStartups, if I understand
correctly.
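
The filtering described here can be sketched as follows. This is a
minimal illustration, not sshd code: the enum is a simplified copy of
the level ordering as I read it in OpenSSH's log.h, and would_log is
a hypothetical name.

```c
/* Simplified level ordering, least to most verbose (assumption:
 * this mirrors the order of OpenSSH's LogLevel enum in log.h). */
typedef enum {
	SYSLOG_LEVEL_QUIET,
	SYSLOG_LEVEL_FATAL,
	SYSLOG_LEVEL_ERROR,
	SYSLOG_LEVEL_INFO,
	SYSLOG_LEVEL_VERBOSE,
	SYSLOG_LEVEL_DEBUG1,
	SYSLOG_LEVEL_DEBUG2,
	SYSLOG_LEVEL_DEBUG3
} LogLevel;

/* Hypothetical helper: a message is emitted only if its level is
 * not more verbose than the configured LogLevel. */
int
would_log(LogLevel msg_level, LogLevel configured)
{
	return msg_level <= configured;
}
```

Under these assumptions, would_log(SYSLOG_LEVEL_VERBOSE,
SYSLOG_LEVEL_INFO) is 0, so the "past MaxStartups" message is
suppressed with the default LogLevel, whereas it would be emitted
with LogLevel VERBOSE or any DEBUG level.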

But the admins changed the log level to one of the debug levels a few
days ago, and debug messages do indeed appear, but nothing concerning
my case (I had sent the exact times of the failures to the admins).

BTW, the issue also occurs at night, when there should be very few
connections in the handshaking state.

-- 
Vincent Lefèvre <vinc...@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
