[Bug 483928] Re: ssh-keyscan(1) exits prematurely on some non-fatal errors

Bug Watch Updater Fri, 25 Nov 2011 00:11:20 -0800

Launchpad has imported 39 comments from the remote bug at
https://bugzilla.mindrot.org/show_bug.cgi?id=1213.

If you reply to an imported comment from within Launchpad, your comment
will be sent to the remote bug automatically. Read more about
Launchpad's inter-bugtracker facilities at
https://help.launchpad.net/InterBugTracking.

------------------------------------------------------------------------
On 2006-07-26T04:17:58+00:00 Tryponraj wrote:

Hello All,

I'm using OpenSSH 4.3p2 and trying to scan a list of 40 machines in my
network with the ssh-keyscan utility. I used the following command:

ssh-keyscan -t rsa -f hosts.txt

The man page says that this utility displays the host keys irrespective
of whether ssh or the host is up or down, and that works well. But if the
scan stops at, say, the 30th host due to some protocol problem, the utility
exits and doesn't display the host keys for the remaining machines. I think
this is expected behaviour, but it would be better to ignore that host and
continue to the end, or at least to document this specifically in the man
page.

I dug into this problem further; my findings are below.

ssh-keyscan ignores hosts that are not up or are not running sshd
when used with the -f <file> option. But when it encounters any error while
retrieving the host key from a machine which is up and has sshd running, it
simply exits. This may be due to the transport layer implementation in
packet.c, where packet_read_poll_seqnr() ends up exiting.

My guess is that because packet.c is used by all OpenSSH utilities,
including ssh-keyscan, we can't make ssh-keyscan continue with the
remaining hosts specified in -f <file> in case of an error. But I also vote
for at least documenting this.

Detailed debug traces are given below:
--------------------------------------
# ssh-keyscan -vvv -t rsa host.server.com
debug2: fd 3 setting O_NONBLOCK
debug1: no match: mpSSH_0.1.0
# host.server.com SSH-2.0-mpSSH_0.1.0
debug1: Enabling compatibility mode for protocol 2.0
debug3: RNG is ready, skipping seeding
debug1: SSH2_MSG_KEXINIT sent
Received disconnect from 16.245.97.226: 11:  SSH Disabled

# ssh -vvv host.server.com
OpenSSH_4.3p2-hpn, OpenSSL 0.9.7i 14 Oct 2005
HP-UX Secure Shell-A.04.30.005, HP-UX Secure Shell version
debug1: Reading configuration data /opt/ssh/etc/ssh_config
debug3: RNG is ready, skipping seeding
debug2: ssh_connect: needpriv 0
debug1: Connecting to host.server.com [16.245.97.226] port 22.
debug1: Connection established.
debug1: permanently_set_uid: 0/3
debug1: identity file /.ssh/identity type 0
debug3: Not a RSA1 key file /.ssh/id_rsa.
debug2: key_type_from_name: unknown key type '-----BEGIN'
debug3: key_read: missing keytype
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug2: key_type_from_name: unknown key type '-----END'
debug3: key_read: missing keytype
debug1: identity file /.ssh/id_rsa type 1
debug3: Not a RSA1 key file /.ssh/id_dsa.
debug2: key_type_from_name: unknown key type '-----BEGIN'
debug3: key_read: missing keytype
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug2: key_type_from_name: unknown key type '-----END'
debug3: key_read: missing keytype
debug1: identity file /.ssh/id_dsa type 2
debug1: Remote protocol version 2.0, remote software version mpSSH_0.1.0
debug1: no match: mpSSH_0.1.0
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_4.3p2-hpn
debug2: fd 4 setting O_NONBLOCK
debug3: RNG is ready, skipping seeding
debug1: SSH2_MSG_KEXINIT sent
Received disconnect from 16.245.97.226: 11:  SSH Disabled
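
One way to sidestep the early exit described above, without touching
ssh-keyscan itself, is to run it once per host so that a failure on one
host cannot stop the others. The following is a minimal Python sketch, not
a fix for the bug: the function name `scan_hosts` and the injectable `run`
parameter are illustrative assumptions, and it trades the tool's parallel
scanning for isolation.

```python
import subprocess


def scan_hosts(hosts, key_type="rsa", run=subprocess.run):
    """Run ssh-keyscan once per host; return (key_lines, failed_hosts)."""
    keys, failed = [], []
    for host in hosts:
        try:
            proc = run(
                ["ssh-keyscan", "-t", key_type, host],
                capture_output=True, text=True, timeout=30,
            )
        except (OSError, subprocess.TimeoutExpired):
            # Binary missing or the single-host scan hung: note it and move on.
            failed.append(host)
            continue
        # Keep only key lines from stdout; skip blanks and '#' comment lines.
        lines = [
            line for line in proc.stdout.splitlines()
            if line and not line.startswith("#")
        ]
        if proc.returncode == 0 and lines:
            keys.extend(lines)
        else:
            failed.append(host)
    return keys, failed
```

A call such as `scan_hosts(open("hosts.txt").read().split())` would then
yield the collected key lines plus the list of hosts that need a second
look, assuming one host name per line in the file.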

Reply at: https://bugs.launchpad.net/openssh/+bug/483928/comments/0

------------------------------------------------------------------------
On 2006-10-01T20:39:30+00:00 Paul Wouters wrote:

I was going to open a new bug report, but I think I am reporting the
same bug as this one.

ssh-keyscan aborts when it encounters glue without the proper
authoritative data. eg:

hostname.domain.com IN NS hostname.domain.com
hostname.domain.com IN A 1.2.3.4

Where hostname.domain.com is itself not running a nameserver.
It is correct in not processing this entry, as the glue is non-authoritative
data, and cannot be confirmed by the nameserver of the child zone.
However, ssh-keyscan should just skip this entry, not abort.

I noticed this when writing ftp://ftp.xelerance.com/sshfp/ which is a
python script that can use ssh-keyscan (or known_hosts files) to
generate SSHFP records.

Reply at: https://bugs.launchpad.net/openssh/+bug/483928/comments/1

------------------------------------------------------------------------
On 2007-03-13T05:00:18+00:00 Senthilkumar-sen wrote:

Is there any chance that this bug will get fixed for the next release?

Reply at: https://bugs.launchpad.net/openssh/+bug/483928/comments/2

------------------------------------------------------------------------
On 2010-11-23T01:00:50+00:00 Aab wrote:

Created attachment 1961
One attempt at getting the rsa key from a remote server that was having a 
number of problems.

Reply at: https://bugs.launchpad.net/openssh/+bug/483928/comments/4

------------------------------------------------------------------------
On 2010-11-23T01:04:06+00:00 Aab wrote:

I believe I've encountered the same or similar ssh-keyscan problem.
local ssh  - OpenSSH_5.1p1 Debian-5, OpenSSL 0.9.8g 19 Oct 2007
remote ssh - OpenSSH_4.3p2, OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008
The remote server was having "problems": 1) no connection; 2) connection and 
key returned; or 3) connection but hanging until remote time out and
disconnect.  With the latter, ssh-keyscan aborted immediately with 
exit-code=255 (see attachment).

I disagree with the original poster in that I think that ssh-keyscan
should continue in all cases except for an internal error.  In our case,
ssh-keyscan is buried several layers deep in wrapper scripts where it is
being fed (today) 3690+ host names.  Per the man pages, I was expecting
it to continue regardless of what the remote servers did or didn't do.

Reply at: https://bugs.launchpad.net/openssh/+bug/483928/comments/5

------------------------------------------------------------------------
On 2010-12-03T00:19:46+00:00 Aab wrote:

Created attachment 1969
Fix(?) for premature ssh-keyscan abort.

This adds a local/static `cleanup_exit()' function to ssh-keyscan so
that aborts in non-ssh-keyscan code can be converted to "continue"s
while the `dispatch_run()' function is being executed.  It mimics the
already extant local/static `fatal()' function in using `exit()' instead
of the `_exit()' used in the default cleanup.c.

Two observations:
1) I also incremented the `howmany()' argument #1 count by 1.  This is probably 
unnecessary but I note that all other occasions where `howmany()' is used do 
this (and I'm chicken ...).
2) The current local/static `fatal()' function could possibly be removed and 
the default one, defined in fatal.c, be used.
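
The idea in this patch, turning an abort raised deep inside shared code
into a "continue" for the current host, can be illustrated outside C. The
following is only a Python analogy under stated assumptions: `read_packet`
is a made-up stand-in for the shared packet code, and `sys.exit()` plays
the role of `cleanup_exit()`. It does not reproduce the attached patch.

```python
import sys


def read_packet(host):
    """Hypothetical stand-in for shared code that exits on a protocol error."""
    if host.startswith("bad"):
        sys.exit(255)  # analogous to cleanup_exit(255) in the shared code
    return f"{host} ssh-rsa AAAA..."


def scan_all(hosts):
    """Scan every host, converting an exit for one host into a skip."""
    keys, skipped = [], []
    for host in hosts:
        try:
            keys.append(read_packet(host))
        except SystemExit as exc:
            # The exit request is intercepted here instead of ending the run.
            skipped.append((host, exc.code))
    return keys, skipped
```

The design point is the same as in the patch description: the shared code
keeps its "give up" call, and only the scanning loop decides that giving up
on one host should not end the whole run.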

Reply at: https://bugs.launchpad.net/openssh/+bug/483928/comments/6

------------------------------------------------------------------------
On 2011-02-17T15:36:54+00:00 Count-mindrot wrote:

I'm running into the same problem on recent versions.

Reply at: https://bugs.launchpad.net/openssh/+bug/483928/comments/7

------------------------------------------------------------------------
On 2011-02-17T18:31:44+00:00 Count-mindrot wrote:

btw: I've elevated this to 'major', as it completely breaks the
usefulness for ssh-keyscan in large networks, as the error condition
(len == 0 in packet_read_seqnr() in packet.c; resulting in
logit("Connection closed ... etc") and cleanup_exit(255);) is much
easier to hit. On 10 runs of ssh-keyscan over ~3800 IPs I couldn't get a
single complete run without hitting this. Please fix.

Reply at: https://bugs.launchpad.net/openssh/+bug/483928/comments/8

------------------------------------------------------------------------
On 2011-02-17T23:58:23+00:00 Aab wrote:

Mr. Kotes, I have a patch against openssh-5.[678]p1 for our problem that
could be called a workaround or a fix depending on your way of looking
at it.  The probable reason that `packet_read_seqnr()' gets the len==0
is that one of the IPs from which you're attempting to get a key has a bad
`sshd' server that times out because of the "LoginGraceTime".  This, in
turn, causes almost all of the other servers that have open sockets at
that time to "LoginGraceTime" out as well.  To back up a bit,
`packet_read_seqnr()' calls the vanilla `cleanup_exit()' that in the
current ssh-keyscan aborts immediately rather than continuing like ssh-
keyscan's `fatal()' call does.  This is part 1 of the fix.  The second
part is to teach ssh-keyscan how to deal with the problem when a bad
server times out.  My patch does both although the code seems a bit
kludgy to me.

Unfortunately, we haven't had a bad server recently so I can't
completely test the patch (I'm using it in test mode now) and, until
then, I don't want to send it to the OpenSSH folks.  FWIW - our host
farm is 3500+ with an additional 1200+ to be online soon and probably
more in the late summer.

In my opinion, this should be marked as a bug against the current
openssh variant.  How do I go about doing that?

If you'd like to have a copy of the current patch so you can test it,
please tell me where to send it.

Reply at: https://bugs.launchpad.net/openssh/+bug/483928/comments/9

------------------------------------------------------------------------
On 2011-02-18T00:04:12+00:00 Aab wrote:

I've noted that this is a ssh-keyscan bug and I've attached it to the
openssh-5.8p1 release.

Reply at: https://bugs.launchpad.net/openssh/+bug/483928/comments/10

------------------------------------------------------------------------
On 2011-02-18T00:07:01+00:00 Aab wrote:

Oops, can't read.  ssh-keygen ain't ssh-keyscan.  Changed the component
back to Miscellaneous.  Hey, isn't ssh-keyscan a component also?

Reply at: https://bugs.launchpad.net/openssh/+bug/483928/comments/11

------------------------------------------------------------------------
On 2011-02-23T15:55:54+00:00 Daniel Richard G. wrote:

I reported this a while ago on the Ubuntu Launchpad bug tracker:

    https://bugs.launchpad.net/openssh/+bug/483928

I've also confirmed that the bug persists in OpenSSH 5.8p1, and I gave
your patch a try to scan a corporate network of 6000+ hosts.

Most of the hosts don't appear to be running SSH, but I can't be sure if
that's really the case, or if ssh-keyscan(1) is bugging out on many of
the connections. It does run through to the end of the list, but with
some anomalies, like "Connection closed by A.B.C.D" or "Received
disconnect from A.B.C.D: 2: Client Disconnect" messages that crop up
multiple times for the same IP address.

Is it possible that one bad connection can still take down active good
connections, even with this patch?

Reply at: https://bugs.launchpad.net/openssh/+bug/483928/comments/13

------------------------------------------------------------------------
On 2011-02-23T17:32:02+00:00 Aab wrote:

Ummm.  If you're referring to the "original" patch that I submitted,
it's out-of-date.  It was written before I had a complete(?) handle on
what was going wrong.  Included with this comment is an attachment with
the newer patch against the openssh-5.8p1 source.

A bit of explanation.  Some of the mods are for clarity.  When you're
working, as we are, with a large number of hosts, "socket" doesn't tell
you very much as to where the problem is occurring.  Same with "Bad
hostkey alg".

In the patch, I've attempted to allow `ssh-keyscan' to continue if the
encountered problem is external in origin.  Some of the items that you
noticed are (I think) addressed by this patch.

NOTE - NOTE - NOTE - this patch has NOT been completely verified.  The
"closed by remote because of LoginGraceTime" outs need a bad remote
server so that that can be done.  Unfortunately, all of our servers are
playing nice-nice at present.  I did have an earlier buggy variant of
the patch that "tried" to execute the patch code but I screwed up and
generated an infinite loop instead.  The basic code is running as the
`ssh-keyscan' of choice in our setup.

Reply at: https://bugs.launchpad.net/openssh/+bug/483928/comments/14

------------------------------------------------------------------------
On 2011-02-23T17:40:19+00:00 Aab wrote:

Created attachment 2000
openssh-5.8p1 - patch for ssh-keyscan

Is this comment different from the other one????

Later (better?) patch to fix `ssh-keyscan's premature aborting observed
in large network scans.  Hopefully, there are sufficient comments in the
code to describe the fix.  Please ask if you find something annoying.  I
also have patches for 5.6p1 and 5.7p1.

Reply at: https://bugs.launchpad.net/openssh/+bug/483928/comments/15

------------------------------------------------------------------------
On 2011-02-24T15:50:54+00:00 Daniel Richard G. wrote:

With this updated patch, I'm seeing at least twice as many host keys
returned than before (up to ~2400, from ~1000), and the "multiple errors
from the same IP" oddness is gone now.

The more-specific error messages are very helpful. I do notice that
hosts which are firewalled or otherwise fail to yield a server banner
are not cited with an error message to stderr. I think this would be
useful if it can be done, that every host listed in the input is spoken
for one way or the other in the output, because that way you can be sure
that no host is being silently dropped by the program.
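
The check suggested here, that every input host is spoken for somewhere in
the output, can also be done after the fact on saved output. A small Python
sketch, assuming the key lines start with the host (or a comma-separated
host list) and that error messages mention the address as a separate word,
as in the messages quoted in this thread; `unaccounted_hosts` is a
hypothetical helper name.

```python
def unaccounted_hosts(hosts, stdout_lines, stderr_lines):
    """Return input hosts that appear in neither the key output nor stderr."""
    seen = set()
    for line in stdout_lines:
        fields = line.split()
        if fields and not line.startswith("#"):
            # The first field of a key line may be a comma-separated host list.
            seen.update(fields[0].split(","))
    for line in stderr_lines:
        # Strip trailing punctuation such as the ':' in "from 10.0.0.5: 2: ...".
        seen.update(tok.strip(":,") for tok in line.split())
    return [h for h in hosts if h not in seen]
```

Any host returned by this function produced neither a key nor a message,
which is exactly the silent-drop case described above.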

Reply at: https://bugs.launchpad.net/openssh/+bug/483928/comments/16

------------------------------------------------------------------------
On 2011-02-27T02:24:50+00:00 Aab wrote:

Created attachment 2005
Upgraded(?) patch to include extra ssh-keyscan logging.

Try this to log all attempt failures.  I put it under control of a
command line option, '-L'.  One failure noted by ssh-keyscan is the
ECONNREFUSED that I think should have caused a standard error message to
be elided.  Except for the ECONNREFUSED, all of the new messages are
written by the `logit()' function.  FWIW - this patch may or may not
obsolete the patch supplied with attachment 2000 so I didn't check the
obsolete:2000 box.  I didn't test this patch out very thoroughly but
what testing I did showed what I wanted.

Reply at: https://bugs.launchpad.net/openssh/+bug/483928/comments/17

------------------------------------------------------------------------
On 2011-02-27T09:09:55+00:00 Daniel Richard G. wrote:

aab, thanks for putting together this updated patch. I gave it a try,
and whether due to the patch or another issue that I hadn't encountered
before, it bombed out with this error:

[...]
# A.B.C.D SSH-2.0-dropbear_0.50
# W.X.Y.Z SSH-1.99-OpenSSH_3.9p1
# A.B.C.E SSH-2.0-dropbear_0.50
Connection closed by A.B.C.E
conalloc: attempt to reuse fdno 47
make: *** [ssh_known_hosts.unx.new] Error 255

A couple of ancillary notes on the patch:

1. The old and new filenames both have the .orig extension! I had to
edit one of each pair so that the patch could apply.

2. IMO, there isn't a need to add a new -L option... are "Connection
closed" and e.g. "no 'blah' hostkey alg(s)" really categorically
distinct to the end user?

Reply at: https://bugs.launchpad.net/openssh/+bug/483928/comments/18

------------------------------------------------------------------------
On 2011-02-27T18:16:05+00:00 Aab wrote:

># A.B.C.D SSH-2.0-dropbear_0.50
># W.X.Y.Z SSH-1.99-OpenSSH_3.9p1
># A.B.C.E SSH-2.0-dropbear_0.50
>Connection closed by A.B.C.E
>conalloc: attempt to reuse fdno 47
>make: *** [ssh_known_hosts.unx.new] Error 255

Oh boy, I missed something.  Is this repeatable?  I think I saw this
myself somewhere along the line but I thought I had fixed the problem.
Since my time is pretty much taken up for the next week or so, I don't
know when I'll be able to check.

>1. The old and new filenames both have the .orig extension! I had to
>edit one of each pair so that the patch could apply.

I just looked at the attachment.  There are two ".orig"s per file.  One
is on the `diff' statement and is ignored (I hope) by `patch'.  The
second is one line down on the "old" file identifier (---) and `patch'
does use that.  Which one was your `patch' making complaints about?

>2. IMO, there isn't a need to add a new -L option... are "Connection
>closed" and e.g. "no 'blah' hostkey alg(s)" really categorically
>distinct to the end user?

STDERR is extremely noisy as it is.  In my case, at this time, I think
I'd add on the order of 7000+ extra lines when I use '-L' that I'd need
to winnow to find any important data.  Besides, you can't forget that
god called "upward compatibility" you know (;-}).

And yes, if you meant "Connection timed out", I think that they are
distinct at least from a Systems Administrator (me) point of view.

Reply at: https://bugs.launchpad.net/openssh/+bug/483928/comments/19

------------------------------------------------------------------------
On 2011-03-01T03:42:38+00:00 Daniel Richard G. wrote:

(In reply to comment #17)
> 
> Oh boy, I missed something.  Is this repeatable?  I think I saw this
> myself somewhere along the line but I thought I had fixed the problem. 
> Since my time is pretty much taken up for the next week or so, I don't
> know when I'll be able to check.

Well, I tried it again, and it ran to completion. Must be a rare failure
mode.

> I just looked at the attachment.  There are two ".orig"s per file.  One
> is on the `diff' statement and is ignored (I hope) by `patch'.  The
> second is one line down on the "old" file identifier (---) and `patch'
> does use that.  Which one was your `patch' making complaints about?

Presumably the second one. It was looking for e.g. kex.c.orig rather
than kex.c.

> STDERR is extremely noisy as it is.  In my case, at this time, I think
> I'd add on the order of 7000+ extra lines when I use '-L' that I'd need
> to winnow to find any important data.  Besides, you can't forget that
> god called "upward compatibility" you know (;-}).
> 
> And yes, if you meant "Connection timed out", I think that they are
> distinct at least from a Systems Administrator (me) point of view.

*shrugs* I'd pretty much expect a flood of information anyway. Given a
large network, you have to use grep(1) or the like to make any sense of
it.

Reply at: https://bugs.launchpad.net/openssh/+bug/483928/comments/20

------------------------------------------------------------------------
On 2011-03-02T02:29:26+00:00 Aab wrote:

Created attachment 2008
patch - fixes bug in previous patch

>> Oh boy, I missed something.  Is this repeatable?  I think I saw this
>> myself somewhere along the line but I thought I had fixed the problem. 
>> Since my time is pretty much taken up for the next week or so, I don't
>> know when I'll be able to check.
>
>Well, I tried it again, and it ran to completion. Must be a rare
>failure mode.

Yep, I missed something.  The sockets associated with ALL connections
processed by the `keygrab_ssh2()' function are closed twice.  I missed
the close in the `packet.c:packet_close()' function that's called at the
bottom of the `keygrab_ssh2()' function.  I had assumed (bad bad word)
that the only close was in the `confree()' function.  Work/not work is
up to the gods and the relative connection timings I think.
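
The failure mode described here (a descriptor closed twice, with the number
possibly handed out again in between) can be shown in a few lines. This is
a Python sketch, not the ssh-keyscan code; the `Conn` class and its `closed`
flag are hypothetical and only show one way to make a second close
harmless.

```python
import os


class Conn:
    """Owns one file descriptor and closes it at most once."""

    def __init__(self, fd):
        self.fd = fd
        self.closed = False

    def close(self):
        if self.closed:
            return False  # second call is a no-op
        os.close(self.fd)
        self.closed = True
        return True


def demo():
    r1, w1 = os.pipe()
    conn = Conn(r1)
    conn.close()        # first close, e.g. by the packet layer
    r2, w2 = os.pipe()  # the kernel may hand the same number out again
    reused = r2 == r1
    conn.close()        # guarded: does not touch the new descriptor
    os.fstat(r2)        # raises OSError if r2 had been closed by mistake
    for fd in (w1, r2, w2):
        os.close(fd)
    return reused
```

Without the guard, the second close could land on a descriptor that now
belongs to a different connection, which would explain why the symptom
depends on connection timing.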

>> I just looked at the attachment.  There are two ".orig"s per file.  One
>> is on the `diff' statement and is ignored (I hope) by `patch'.  The
>> second is one line down on the "old" file identifier (---) and `patch'
>> does use that.  Which one was your `patch' making complaints about?
>
>Presumably the second one. It was looking for e.g. kex.c.orig rather
>than kex.c.

The format of this patch is the same as before.  If you are using the
current GNU `patch', you should be able to `patch [-p0] < patch' in the
"openssh-5.8p1" parent directory.  If you're in the "openssh-5.8p1"
directory itself, you should be able to `patch -p1 <patch'.

>> STDERR is extremely noisy as it is.  In my case, at this time, I think
>> I'd add on the order of 7000+ extra lines when I use '-L' that I'd need
>> to winnow to find any important data.  Besides, you can't forget that
>> god called "upward compatibility" you know (;-}).
>> 
>> And yes, if you meant "Connection timed out", I think that they are
>> distinct at least from a Systems Administrator (me) point of view.
>
>*shrugs* I'd pretty much expect a flood of information anyway. Given a
>large network, you have to use grep(1) or the like to make any sense of
>it.

I think that, if/when this patch is actually submitted to the OpenSSH
folks, I'll let the mavins there decide whether or not to have a '-L'
option.

To satisfy my curiosity, did you observe any missing hosts when you use
the '-L' option (and it actually completes)?

Reply at: https://bugs.launchpad.net/openssh/+bug/483928/comments/21

------------------------------------------------------------------------
On 2011-03-02T08:23:42+00:00 Daniel Richard G. wrote:

(In reply to comment #19)
> 
> Yep, I missed something.  The sockets associated with ALL connections
> processed by the `keygrab_ssh2()' function are closed twice.  I missed
> the close in the `packet.c:packet_close()' function that's called at
> the bottom of the `keygrab_ssh2()' function.  I had assumed (bad bad
> word) that the only close was in the `confree()' function.  Work/not
> work is up to the gods and the relative connection timings I think.

I tried the new patch, and no errors. I'll give it a few more runs to
see if anything breaks again.

> The format of this patch is the same as before.  If you are using the
> current GNU `patch', you should be able to `patch [-p0] < patch' in the
> "openssh-5.8p1" parent directory.  If you're in the "openssh-5.8p1"
> directory itself, you should be able to `patch -p1 <patch'.

Oh, I know about -p0 vs. -p1 and such. The problem is that the patch, as
up currently, looks for foo.c.orig instead of foo.c. In other words,

    --- dir/foo.c.orig
    +++ dir/foo.c.orig  (WRONG)

    --- dir/foo.c.orig
    +++ dir/foo.c       (CORRECT)

> I think that, if/when this patch is actually submitted to the OpenSSH
> folks, I'll let the mavins there decide whether or not to have a '-L'
> option.

Fair enough, though I think there might be more value in just
(unconditionally) printing a tally at the end of how many valid hosts
were found, how many had no host algs, etc. (a bit like what "md5sum -c"
does when it encounters errors).

> To satisfy my curiosity, did you observe any missing hosts when you use
> the '-L' option (and it actually completes)?

Ah, I forgot to report on this; my bad!

I do see a few hosts in the input list that are not mentioned anywhere
in the stderr output. These appear to be strictly "alias" IP addresses,
e.g. for an input line of

    10.0.0.1,10.0.0.2,10.0.0.3 host.example.com,10.0.0.1,10.0.0.2,...
             ^^^^^^^^ ^^^^^^^^
                   these

This is the correct behavior, I take it?

Reply at: https://bugs.launchpad.net/openssh/+bug/483928/comments/22

------------------------------------------------------------------------
On 2011-03-02T18:34:29+00:00 Aab wrote:

(In reply to comment #20)
> (In reply to comment #19)
> 
>> The format of this patch is the same as before.  If you are using the
>> current GNU `patch', you should be able to `patch [-p0] < patch' in the
>> "openssh-5.8p1" parent directory.  If you're in the "openssh-5.8p1"
>> directory itself, you should be able to `patch -p1 <patch'.
> 
>Oh, I know about -p0 vs. -p1 and such. The problem is that the patch,
>as up currently, looks for foo.c.orig instead of foo.c. In other words,
> 
>     --- dir/foo.c.orig
>     +++ dir/foo.c.orig  (WRONG)
> 
>     --- dir/foo.c.orig
>     +++ dir/foo.c       (CORRECT)

Hmmm, but the patch doesn't have two consecutive lines with ".orig" as
you describe above.  From observation, the first three lines for each
modified file are similar to

diff -u openssh-5.8p1/kex.c.orig openssh-5.8p1/kex.c
--- openssh-5.8p1/kex.c.orig    2010-09-24 08:11:14.000000000 -0400
+++ openssh-5.8p1/kex.c 2011-02-11 18:14:03.396688000 -0500

Are you using the GNU patch?  The attached patch text works for me with
no changes whatsoever.  Or to ask it somewhat differently, does your
`patch' process WRONG even though the text is actually CORRECT?  Is it
possible that your `patch' is not ignoring the "diff" line?

>> I think that, if/when this patch is actually submitted to the OpenSSH
>> folks, I'll let the mavins there decide whether or not to have a '-L'
>> option.
> 
> Fair enough, though I think there might be more value in just
> (unconditionally) printing a tally at the end of how many valid hosts
> were found, how many had no host algs, etc. (a bit like what "md5sum
> -c" does when it encounters errors).

Actually, after I had sent the previous, I thought I should have added
that the described approach is a cop out on my part (;-}).

>> To satisfy my curiosity, did you observe any missing hosts when you use
>> the '-L' option (and it actually completes)?
> 
> Ah, I forgot to report on this; my bad!
> 
> I do see a few hosts in the input list that are not mentioned anywhere
> in the stderr output. These appear to be strictly "alias" IP addresses,
> e.g. for an input line of
> 
>     10.0.0.1,10.0.0.2,10.0.0.3 host.example.com,10.0.0.1,10.0.0.2,...
>              ^^^^^^^^ ^^^^^^^^
>                    these
> 
> This is the correct behavior, I take it?

I submit hosts, one per line, as the data to ssh-keyscan and am not
familiar with the "alias" format.  In fact, your comments clarified it
somewhat for me.  If you meant that "10.0.0.1" was seen in stderr and
the others weren't, I believe that this is the "correct" behavior if
ssh-keyscan had success with "10.0.0.1".  I think the code tells me that
it stops looking after the first IP/host with which it has success.

Reply at: https://bugs.launchpad.net/openssh/+bug/483928/comments/23

------------------------------------------------------------------------
On 2011-03-03T04:02:12+00:00 Daniel Richard G. wrote:

(In reply to comment #21)
> 
> Hmmm, but the patch doesn't have two consecutive lines with ".orig" as
> you describe above.  From observation, the first three lines for each
> modified file are similar to
> 
> diff -u openssh-5.8p1/kex.c.orig openssh-5.8p1/kex.c
> --- openssh-5.8p1/kex.c.orig    2010-09-24 08:11:14.000000000 -0400
> +++ openssh-5.8p1/kex.c    2011-02-11 18:14:03.396688000 -0500

Um. Are we looking at the same file? Here are the first three lines of
your most recent patch (attachment 2008, in comment #19):

--- openssh-5.8p1/kex.c.orig    2010-09-24 08:11:14.000000000 -0400
+++ openssh-5.8p1/kex.c.orig    2011-02-11 18:14:03.396688000 -0500
@@ -49,6 +49,7 @@ 

> Are you using the GNU patch?  The attached patch text works for me with
> no changes whatsoever.  Or to ask it somewhat differently, does your
> `patch' process WRONG even though the text is actually CORRECT?  Is it
> possible that your `patch' is not ignoring the "diff" line?

This is on an Ubuntu Linux system:

host:/tmp/openssh-5.8p1$ patch -p1 --dry-run <aab-2008.patch 
patching file kex.c.orig
Hunk #1 FAILED at 49.
Hunk #2 FAILED at 367.
2 out of 2 hunks FAILED -- saving rejects to file kex.c.orig.rej
patching file packet.c.orig
Hunk #1 FAILED at 1025.
Hunk #2 FAILED at 1035.
Hunk #3 FAILED at 1100.
3 out of 3 hunks FAILED -- saving rejects to file packet.c.orig.rej
[...]

If I edit each "+++" line in the patch, it applies cleanly.
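
The manual edit mentioned above (dropping the `.orig` suffix from each
"+++" line) is easy to script. A minimal Python sketch, assuming a unified
diff whose header lines look like the ones quoted in this comment;
`fix_patch_targets` is a made-up helper name. It would also rewrite a hunk
line that happens to start with "+++ " and name a `.orig` file, so the
result is worth checking with `patch --dry-run`.

```python
import re


def fix_patch_targets(patch_text):
    """Strip a trailing '.orig' from the file name on each '+++ ' header line."""
    fixed = []
    for line in patch_text.splitlines(keepends=True):
        if line.startswith("+++ "):
            # The file name is the first whitespace-delimited token after '+++ '.
            line = re.sub(r"^(\+\+\+ \S+?)\.orig(?=\s|$)", r"\1", line)
        fixed.append(line)
    return "".join(fixed)
```

The "---" lines are left untouched, since those are expected to name the
`.orig` file.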

> I submit hosts, one per line, as the data to ssh-keyscan and am not
> familiar with the "alias" format.  In fact, your comments clarified it
> somewhat for me.  If you meant that "10.0.0.1" was seen in stderr and
> the others weren't, I believe that this is the "correct" behavior if
> ssh-keyscan had success with "10.0.0.1".  I think the code tells me
> that it stops looking after the first IP/host with which it has
> success.

Okay, that seems reasonable. (Yes, I only saw 10.0.0.1 and not the other
two.)

The sample "Input format" line in the ssh-keyscan man page has two IP
addresses in the first column, though the semantics of this are left
unexplained. My assumption is that it's meant for hosts with round-
robined DNS names, where the SSH server at each address uses the same
host keys. (Which would be consistent with what you describe.)

Reply at: https://bugs.launchpad.net/openssh/+bug/483928/comments/24

------------------------------------------------------------------------
On 2011-03-03T05:03:01+00:00 Aab wrote:

(In reply to comment #22)
> (In reply to comment #21)
>> 
>> Hmmm, but the patch doesn't have two consecutive lines with ".orig" as
>> you describe above.  From observation, the first three lines for each
>> modified file are similar to
>> 
>> diff -u openssh-5.8p1/kex.c.orig openssh-5.8p1/kex.c
>> --- openssh-5.8p1/kex.c.orig    2010-09-24 08:11:14.000000000 -0400
>> +++ openssh-5.8p1/kex.c    2011-02-11 18:14:03.396688000 -0500
> 
> Um. Are we looking at the same file? Here are the first three lines of
> your most recent patch (attachment 2008 [details], in comment #19):
> 
> --- openssh-5.8p1/kex.c.orig    2010-09-24 08:11:14.000000000 -0400
> +++ openssh-5.8p1/kex.c.orig    2011-02-11 18:14:03.396688000 -0500
> @@ -49,6 +49,7 @@ 

Boy, I'm not sure that we are looking at the same file.  I just did a

  wget -Ojunk https://bugzilla.mindrot.org/attachment.cgi?id=2008

and got my version.  When I click on the attachment line near the top of
the bug #1213 comments (this page - "patch - fixes bug ..."), I get my
version.  Clicking on the "details" button that you specified above, I
get my version.

Have we encountered a bug in yet another utility?  Browser problem?

I should have thanked you earlier for "testing" the patch so I'll do so
now - THANKS.

Reply at: https://bugs.launchpad.net/openssh/+bug/483928/comments/25

------------------------------------------------------------------------
On 2011-03-03T06:51:48+00:00 Daniel Richard G. wrote:

Okay, I think I see what's going on here.

When you click on the "attachment 2008" link, you're taken to a fancy
side-by-side rendition of the diff. At the top, there are a series of
links:

    View | Details | Raw Unified | Return to bug 1213 | Differences ...

I was clicking on "Raw Unified," and got the broken patch. "View" goes
to the URL you gave (which yields the correct patch). Confusing, isn't
it?

Anyway, I'm happy to test your patches, because that means I can get the
company-wide ssh_known_hosts file I've been needing so much :-)

Reply at: https://bugs.launchpad.net/openssh/+bug/483928/comments/26

------------------------------------------------------------------------
On 2011-03-04T01:07:11+00:00 Aab wrote:

(In reply to comment #24)
> Okay, I think I see what's going on here.
> 
> When you click on the "attachment 2008 [details]" link, you're taken to
> a fancy side-by-side rendition of the diff. At the top, there are a
> series of links:
> 
>     View | Details | Raw Unified | Return to bug 1213 | Differences ...
> 
> I was clicking on "Raw Unified," and got the broken patch. "View" goes
> to the URL you gave (which yields the correct patch). Confusing, isn't
> it?

Yes, it is indeed confusing.  I've never used the exact path you used to
get to the patch so I missed seeing the "bad" representation of it.

One of the things that I've observed in generating the "ssh_known_hosts"
file is that it can end up having a quite variable keyset as it depends
on ALL of the hosts ALWAYS being up (don't we wish).  It's probably
overkill but we generate the "hosts" file once an hour via a set of
wrapper scripts.  Included within the scripts is a database that
contains the current keys for all hosts that are currently supposed to
be active (previously acquired via these same scripts).  This allows us
two capabilities: 1) if there is no key returned for some host, the
database can supply the last one and 2) it allows us to see if there
have been any changes in the keys that might signify a security break.
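
A minimal sketch of those two capabilities, assuming the "database" is nothing more than a mapping from (host, key type) to the key blob. The function name `reconcile` and the data layout are hypothetical and are not taken from the actual wrapper scripts.

```python
def reconcile(stored, scanned):
    """Compare a fresh scan against previously stored keys.

    Both arguments map (host, key_type) -> base64 key blob.
    Returns (current, reused, changed):
      current - keys to publish, falling back to stored keys for silent hosts
      reused  - entries that had to come from the stored copy
      changed - entries whose scanned key differs from the stored one
    """
    current = dict(stored)
    current.update(scanned)  # fresh keys win over stored ones
    reused = sorted(k for k in stored if k not in scanned)
    changed = sorted(k for k in scanned
                     if k in stored and stored[k] != scanned[k])
    return current, reused, changed


# Hypothetical data: "beta" did not answer, and "alpha" returned a new key.
stored = {("alpha", "ssh-rsa"): "AAAA1", ("beta", "ssh-rsa"): "AAAA2"}
scanned = {("alpha", "ssh-rsa"): "AAAA9"}
current, reused, changed = reconcile(stored, scanned)
print("reused from database:", reused)
print("changed since last scan:", changed)
```

Anything that lands in `changed` would be the list to review by hand before the new file is published.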

A second part is a condensation of the keys via globbing. This assumes
that a number of the hosts have the same key.  The cluster nodes on our
private networks are basically all cloned so we do get considerable
condensation.  Right now, for 4700+ hosts, the "hosts" file has 334
entries.
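
The condensation idea can be sketched roughly as follows. This version only groups hosts that share a key into one comma-separated host field, which the known_hosts format allows; it does not try to derive wildcard patterns the way the real script apparently does. The function name `condense` and the tuple layout are assumptions made for illustration.

```python
from collections import defaultdict


def condense(entries):
    """Group hosts sharing one key into a single known_hosts-style line.

    entries: iterable of (host, key_type, key_blob) tuples.
    """
    groups = defaultdict(list)
    for host, key_type, blob in entries:
        groups[(key_type, blob)].append(host)
    return [f"{','.join(sorted(hosts))} {key_type} {blob}"
            for (key_type, blob), hosts in sorted(groups.items())]


# Hypothetical cloned cluster nodes sharing one key, plus a head node.
entries = [
    ("node2", "ssh-rsa", "AAAAX"),
    ("node1", "ssh-rsa", "AAAAX"),
    ("head", "ssh-rsa", "AAAAY"),
]
for line in condense(entries):
    print(line)
```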

The core script is a highly modified variant of the GNU licensed script,
"make_ssh_known_hosts.pl", that was in "ssh-1.0.0" (circa 1998).  Note
that's "ssh" not "openssh".  My original came from "ssh-1.2.26".  For
some reason, it disappeared when the OpenSSH folks took over.   For
Linux boxes, it's still dependent on my bind 9 hack of `nslookup' as I
haven't had time to modify it to use the current GNU `host'.

Would you be interested in anything like this?

Reply at: https://bugs.launchpad.net/openssh/+bug/483928/comments/27

------------------------------------------------------------------------
On 2011-03-05T08:08:28+00:00 Daniel Richard G. wrote:

(In reply to comment #25)
> 
> Yes, it is indeed confusing.  I've never used the exact path you used
> to get to the patch so I missed seeing the "bad" representation of it.

Lord knows what the point of that link even is... I clicked on it only
because "Raw" suggested that it would yield the "real" text/plain diff
instead of a fancy HTML rendition.

> Would you be interested in anything like this?

I appreciate the offer, but a database would be overkill for my use
case. I'm not in my company's IT department, and metamorphosing host
keys on those 6000+ hosts is waaaay out of my purview. (I can't get too
worked up over the security implications, either, since much worse than
that is officially tolerated.)

If anything, the most I would do is put together a Perl script to merge
an old and new known_hosts file, such that new entries override old
ones, and old ones that don't have a newer replacement are kept.
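
A rough sketch of that merge, written in Python rather than Perl purely for illustration. It assumes plain protocol 2 lines of the form "hosts keytype key" and ignores hashed host names and marker prefixes; `merge_known_hosts` is a hypothetical name.

```python
def merge_known_hosts(old_lines, new_lines):
    """Merge two known_hosts files given as lists of lines.

    Entries are keyed on (host field, key type); a new entry replaces the
    old one with the same key, and old entries without a replacement stay.
    """
    merged = {}
    for line in list(old_lines) + list(new_lines):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment lines
        fields = line.split()
        if len(fields) < 3:
            continue  # not a "hosts keytype key" line
        merged[(fields[0], fields[1])] = line  # later (newer) lines win
    return [merged[k] for k in sorted(merged)]


old = ["a ssh-rsa K1", "b ssh-rsa K2"]
new = ["# a SSH-2.0-OpenSSH_4.3", "a ssh-rsa K3"]
print("\n".join(merge_known_hosts(old, new)))
```

Here host "b" was not reachable in the new scan, so its old entry is carried over unchanged.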

Reply at: https://bugs.launchpad.net/openssh/+bug/483928/comments/28

------------------------------------------------------------------------
On 2011-03-05T19:04:20+00:00 Paul Wouters wrote:

(In reply to comment #26)
> (In reply to comment #25)

> If anything, the most I would do is put together a Perl script to merge
> an old and new known_hosts file, such that new entries override old
> ones, and old ones that don't have a newer replacement are kept.

You really want to look at SSHFP DNS records protected by DNSSEC, and
setting "VerifyHostKeyDNS ask" in your /etc/ssh/ssh_config.

You can use the "sshfp" tool for that, which is exactly why I was
interested in this bug. sshfp can AXFR a zone, and use ssh-keyscan to
connect to all A records in the zone and print the SSHFP record to add
in your zones.
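
For readers unfamiliar with the record format, here is a minimal sketch of turning one line of ssh-keyscan output into an SSHFP record. It is not how the "sshfp" tool itself is implemented. It assumes the host name is fully qualified, handles only RSA and DSA keys, and uses the SHA-1 fingerprint type; `sshfp_record` is a hypothetical helper.

```python
import base64
import hashlib

# Algorithm numbers from the SSHFP registry: 1 = RSA, 2 = DSA.
SSHFP_ALGORITHMS = {"ssh-rsa": 1, "ssh-dss": 2}


def sshfp_record(line):
    """Turn one 'host keytype base64key' line into an SSHFP record.

    The fingerprint is the SHA-1 digest (fingerprint type 1) of the
    decoded public key blob.
    """
    host, key_type, blob = line.split()[:3]
    digest = hashlib.sha1(base64.b64decode(blob)).hexdigest()
    return f"{host}. IN SSHFP {SSHFP_ALGORITHMS[key_type]} 1 {digest}"


# The key blob below is a placeholder, not a real public key.
print(sshfp_record("host.example.com ssh-rsa dGVzdA=="))
```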

Reply at: https://bugs.launchpad.net/openssh/+bug/483928/comments/29

------------------------------------------------------------------------
On 2011-03-05T19:13:53+00:00 Daniel Richard G. wrote:

(In reply to comment #27)
> 
> You really want to look at SSHFP DNS records protected by DNSSEC, and
> setting VerifyHostKeyDNS ask in your /etc/ssh/ssh_config

I would, if I were in my company's IT department :-)

(All I'm doing is generating an ssh_known_hosts file that is accessible
to a handful of clients via a local fileserver. The network
infrastructure beyond that is completely out of my hands.)

> you can use the "sshfp" tool for that, which is exactly why I was
> interested in this bug. sshfp can AXFR a zone, and use ssh-keyscan to
> connect to all A records in the zone and print the SSHFP record to add
> in your zones.

Hmm, that could be useful. While I couldn't do much with the SSHFP
records, the AXFR->keyscan functionality would be useful. (Right now,
I'm doing the AXFR via host(1), and using a Perl script to reformat that
into a hosts list for ssh-keyscan(1).)
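
That reformatting step might look something like the sketch below, shown in Python instead of Perl. It assumes the zone listing consists of lines of the form "NAME has address ADDR", which is what host(1) typically prints for A records; other line types are skipped. `hosts_from_axfr` is a hypothetical name.

```python
def hosts_from_axfr(listing):
    """Extract unique host names from 'NAME has address ADDR' lines."""
    names = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[1:3] == ["has", "address"]:
            name = fields[0].rstrip(".")
            if name not in names:  # keep first occurrence, preserve order
                names.append(name)
    return names


# Hypothetical zone listing; addresses are from the documentation range.
listing = """\
example.com name server ns1.example.com.
www.example.com has address 192.0.2.10
www.example.com has address 192.0.2.11
mail.example.com has address 192.0.2.20
"""
print("\n".join(hosts_from_axfr(listing)))
```

The resulting list, one name per line, is the kind of file that can be passed to ssh-keyscan with -f.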

Reply at: https://bugs.launchpad.net/openssh/+bug/483928/comments/30

------------------------------------------------------------------------
On 2011-03-08T07:03:53+00:00 Aab wrote:

Comment on attachment 1961
One attempt at getting the rsa key from a remote server that was having a 
number of problems.

This has been resolved with attachment 2008.

Reply at: https://bugs.launchpad.net/openssh/+bug/483928/comments/31

------------------------------------------------------------------------
On 2011-03-15T01:14:37+00:00 Aab wrote:

One of our `sshd' servers finally gave me sufficient problems to test
the last of the patched code and, as far as I can tell, it worked.

Is there anybody out there that has any issues with the current patch?
If not, I wonder if I can catch the attention of any of the OpenSSH
folks.  I note that this problem has yet to be assigned to anyone.

Or is there another route that I should take for attention?

Reply at: https://bugs.launchpad.net/openssh/+bug/483928/comments/32

------------------------------------------------------------------------
On 2011-03-18T06:36:09+00:00 Aab wrote:

Created attachment 2016
Remove a bit of confusion from previous patch.

I guess I'm the one that has an issue with the previous patch.  The
hostkey alg error message always references the "other end" of the
socket.  On the server the message reads as if the client was the one
that didn't have the necessary hostkey algorithms.  The updated patch
has modified verbiage for the server that attempts to distinguish the
two cases.

I have a general issue with this anyhow.  Wouldn't it be possible to
check the server algorithms BEFORE asking the server to return a key
that it doesn't have?  If I read the code correctly, the
debug2:kex_parse_init messages indicate that the code extracts the list
of algorithms that the server supports from the SSH2_MSG_KEXINIT
response.  Isn't that before the request?  Right now both the server and
the client issue the same abort message and that seems a waste of time
(and log file space (;-})).
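
The kind of up-front check being suggested can be sketched as below. This is only an illustration of the idea, not OpenSSH's negotiation code: it assumes the server's host key algorithms arrive as a comma-separated name list, and `choose_hostkey_alg` is a hypothetical helper.

```python
def choose_hostkey_alg(client_prefs, server_name_list):
    """Return the first client-preferred algorithm the server also lists.

    Returns None when there is no overlap, so the caller can skip the
    host instead of requesting a key the server does not have.
    """
    offered = set(server_name_list.split(","))
    for alg in client_prefs:
        if alg in offered:
            return alg
    return None


print(choose_hostkey_alg(["ssh-rsa"], "ssh-rsa,ssh-dss"))
print(choose_hostkey_alg(["ssh-rsa"], "ssh-dss"))
```

A None result would be the point at which a scanner could log one short message and move on to the next host.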

Reply at: https://bugs.launchpad.net/openssh/+bug/483928/comments/33

------------------------------------------------------------------------
On 2011-03-19T05:38:46+00:00 Aab wrote:

Created attachment 2018
Add 'L' option to usage message

Another small issue.  I forgot to add the new '-L' option to the usage
message.  Also modified some of the comments for clarity.

Reply at: https://bugs.launchpad.net/openssh/+bug/483928/comments/34

------------------------------------------------------------------------
On 2011-03-26T00:38:20+00:00 Aab wrote:

Created attachment 2021
Withdraw patch attachment #2018.

This missive just obsoletes (withdraws) the current variant of the patch.
We just had a bad network glitch here and, because of it, ssh-keyscan
called the `select()' function in the `packet_read_seqnr()' function
with a NULL timeout value.  Since the glitch meant the read was never
going to receive any data, it occasionally did one of those hang
forever thingys.  The patch still works if your network doesn't glitch
like ours did albeit very crudely.
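
The difference a timeout makes can be shown with a small stand-alone sketch. It is written in Python, not taken from the patch: Python's select.select() with a timeout of None blocks indefinitely, much like a NULL timeout in C, while a numeric timeout returns control to the caller. `wait_readable` is a hypothetical helper.

```python
import select
import socket


def wait_readable(sock, timeout):
    """Return True if sock becomes readable within timeout seconds.

    With timeout=None the call blocks until data arrives, which is
    forever if the peer has gone silent.
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    return bool(readable)


a, b = socket.socketpair()
print(wait_readable(a, 0.2))  # nothing has been sent yet
b.send(b"x")
print(wait_readable(a, 0.2))  # one byte is now waiting
a.close()
b.close()
```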

It turns out that the original coders of ssh-keyscan missed(?) a call to
the `packet_set_timeout()' function which in turn caused the above
referenced NULL.  I'm in the process of rewriting the patch to include a
"set" call.

FWIW - bugzilla won't let me submit this without a non-null file.  The
new attachment is a NL.

Reply at: https://bugs.launchpad.net/openssh/+bug/483928/comments/35

------------------------------------------------------------------------
On 2011-06-14T04:43:42+00:00 Aab wrote:

Created attachment 2057
Fix for previous patch variant.

For all those waiting breathlessly (ha) for a correction to the
ssh-keyscan patch I submitted earlier, here it is.  I apologize for not
getting it here sooner.

This variant adds a call to the `packet_set_timeout()' function using
the time value set or defaulted to on the command line by the '-T'
option.  The man page actually implies that this is the case but the
code to implement it was never included.  Part of the new code is a trap
for the timeout condition and a resetting of the remaining active
sockets' timeout values to compensate for the time used waiting for the
slow/braindead server that caused the timeout.
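
One way to read the compensation step is sketched below. This is an interpretation of the description above, not the patch's actual code: each connection is assumed to carry an absolute deadline, and after a stall the surviving connections get the lost time added back. The name `compensate` and the dict layout are hypothetical.

```python
def compensate(deadlines, stalled, elapsed):
    """Push back every other connection's deadline by the time lost.

    deadlines: dict of connection id -> absolute deadline in seconds
    stalled:   id of the connection that timed out (it is dropped)
    elapsed:   seconds spent blocked on the stalled connection
    """
    return {conn: deadline + elapsed
            for conn, deadline in deadlines.items()
            if conn != stalled}


# Hypothetical state: connection "b" stalled for 5 seconds.
print(compensate({"a": 10.0, "b": 12.0, "c": 15.0}, "b", 5.0))
```

Without such an adjustment, hosts that were merely waiting their turn would appear to have timed out as well.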

Reply at: https://bugs.launchpad.net/openssh/+bug/483928/comments/36

------------------------------------------------------------------------
On 2011-06-14T04:52:46+00:00 Aab wrote:

Forgot to change the release to 5.8p2.

Reply at: https://bugs.launchpad.net/openssh/+bug/483928/comments/37

------------------------------------------------------------------------
On 2011-06-22T01:35:02+00:00 Aab wrote:

Change component from "miscellaneous" to the new "ssh-keyscan".

Reply at: https://bugs.launchpad.net/openssh/+bug/483928/comments/38

------------------------------------------------------------------------
On 2011-11-25T01:40:45+00:00 Daniel Richard G. wrote:

Yet another failure mode...

[...]
# XXX.YYY.ZZ.8 SSH-2.0-Sun_SSH_1.1.3
# XXX.YYY.ZZ.9 SSH-2.0-OpenSSH_3.8.1p1
# XXX.YYY.ZZ.14 SSH-2.0-OpenSSH_4.3
# 10.10.1.35 SSH-2.0-RomSShell_4.62
Received disconnect from 10.10.1.35: 2: Protocol Timeout
make: *** [ssh_known_hosts.unx.new] Error 255

This is with 5.8p1 still. aab@, I'll have to give your latest patch a
try.

Reply at: https://bugs.launchpad.net/openssh/+bug/483928/comments/39

------------------------------------------------------------------------
On 2011-11-25T07:19:27+00:00 Aab wrote:

I haven't seen this one before.  The text you included indicates that
ssh-keyscan was processing a Protocol 2 key and it should be using the
modified code to do it.  Is there any way that you could send me a
traceback when the failure occurs?

FWIW - I think the " 2: Protocol Timeout" part of the message comes from
the remote "SSH-2.0-RomSShell_4.62" server because I couldn't find that
text in the OpenSSH source.  What is "RomSShell"?

Reply at: https://bugs.launchpad.net/openssh/+bug/483928/comments/40

** Changed in: openssh
       Status: Unknown => Confirmed

** Changed in: openssh
   Importance: Unknown => High
