I am getting some really hard to understand failures.
I have three managed machines, running Ubuntu 14.04.03 --- which normally
has Linux kernel 3.13. I run Ansible 1.9.4 on my Mac (OS 10.10.5) to
manage them. The managed machines are VMs in a cloud service, on a
datacenter subnet, and I use an OpenVPN tunnel between them and my Mac. I
have a playbook, and it runs. All is well. BTW: I have no ansible.cfg
anywhere (I set the SL_INVENTORY envar to point at the appropriate dynamic
inventory program), and none of my SSH config files say anything about
TCPKeepAlive or ServerAlive<whatever>.
Then on the managed machines I run (as root) `apt-get install
linux-generic-lts-wily`, which upgrades the kernel to 4.2. After that my
Ansible is flaky. When I run my playbook, usually one or two of the three
managed machines fails the gather-facts; the remainder fail in the first
few tasks. Which fails where is variable, so I think this is due to some
intermittent problem.
Thinking it might be a connectivity flakiness, I created `ansible.cfg` that
specifies a timeout of 100, and set `TCPKeepAlive no` and
`ServerAliveCountMax 100` in my SSH client config (leaving
ServerAliveInterval unspecified everywhere, which I think disables that
function). These did not solve the problem. The playbook runs to full
failure in under 30 seconds.
The failure message shows a truncated reply from the remote machine and
then some SSH debugging that ends with a complaint about a broken pipe.
Here is an example from the gather-facts step (I passed `-vvvv` to
`ansible-playbook`):
GATHERING FACTS
***************************************************************
<10.140.155.166> ESTABLISH CONNECTION FOR USER: mspreitzer
<10.140.155.162> ESTABLISH CONNECTION FOR USER: mspreitzer
<10.140.155.166> REMOTE_MODULE setup
<10.140.155.161> ESTABLISH CONNECTION FOR USER: mspreitzer
<10.140.155.162> REMOTE_MODULE setup
<10.140.155.161> REMOTE_MODULE setup
<10.140.155.166> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o
ControlPersist=60s -o
ControlPath="/Users/mspreitz/.ansible/cp/ansible-ssh-%h-%p-%r" -o
KbdInteractiveAuthentication=no -o
PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey
-o PasswordAuthentication=no -o User=mspreitzer -o ConnectTimeout=100
10.140.155.166 /bin/sh -c 'mkdir -p
$HOME/.ansible/tmp/ansible-tmp-1454784058.99-138766034918573 && chmod a+rx
$HOME/.ansible/tmp/ansible-tmp-1454784058.99-138766034918573 && echo
$HOME/.ansible/tmp/ansible-tmp-1454784058.99-138766034918573'
<10.140.155.162> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o
ControlPersist=60s -o
ControlPath="/Users/mspreitz/.ansible/cp/ansible-ssh-%h-%p-%r" -o
KbdInteractiveAuthentication=no -o
PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey
-o PasswordAuthentication=no -o User=mspreitzer -o ConnectTimeout=100
10.140.155.162 /bin/sh -c 'mkdir -p
$HOME/.ansible/tmp/ansible-tmp-1454784058.99-66302778668274 && chmod a+rx
$HOME/.ansible/tmp/ansible-tmp-1454784058.99-66302778668274 && echo
$HOME/.ansible/tmp/ansible-tmp-1454784058.99-66302778668274'
<10.140.155.161> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o
ControlPersist=60s -o
ControlPath="/Users/mspreitz/.ansible/cp/ansible-ssh-%h-%p-%r" -o
KbdInteractiveAuthentication=no -o
PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey
-o PasswordAuthentication=no -o User=mspreitzer -o ConnectTimeout=100
10.140.155.161 /bin/sh -c 'mkdir -p
$HOME/.ansible/tmp/ansible-tmp-1454784058.99-221770072514077 && chmod a+rx
$HOME/.ansible/tmp/ansible-tmp-1454784058.99-221770072514077 && echo
$HOME/.ansible/tmp/ansible-tmp-1454784058.99-221770072514077'
<10.140.155.166> PUT
/var/folders/cd/d050bcgj7xd6lkbfhnb9bns80000gp/T/tmpFwQwuI TO setup
<10.140.155.161> PUT
/var/folders/cd/d050bcgj7xd6lkbfhnb9bns80000gp/T/tmpcY2it_ TO setup
<10.140.155.162> PUT
/var/folders/cd/d050bcgj7xd6lkbfhnb9bns80000gp/T/tmpuXklyy TO setup
<10.140.155.166> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o
ControlPersist=60s -o
ControlPath="/Users/mspreitz/.ansible/cp/ansible-ssh-%h-%p-%r" -o
KbdInteractiveAuthentication=no -o
PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey
-o PasswordAuthentication=no -o User=mspreitzer -o ConnectTimeout=100
10.140.155.166 /bin/sh -c 'sudo -k && sudo -H -S -p "[sudo via ansible,
key=spxlnvdjayufeyagwywqhtdtptisrnej] password: " -u root /bin/sh -c
'"'"'echo BECOME-SUCCESS-spxlnvdjayufeyagwywqhtdtptisrnej; LANG=en_US.UTF-8
LC_CTYPE=en_US.UTF-8 /usr/bin/python setup'"'"''
<10.140.155.161> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o
ControlPersist=60s -o
ControlPath="/Users/mspreitz/.ansible/cp/ansible-ssh-%h-%p-%r" -o
KbdInteractiveAuthentication=no -o
PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey
-o PasswordAuthentication=no -o User=mspreitzer -o ConnectTimeout=100
10.140.155.161 /bin/sh -c 'sudo -k && sudo -H -S -p "[sudo via ansible,
key=dprrxybimhltrqziynaikzahzsgdjmlc] password: " -u root /bin/sh -c
'"'"'echo BECOME-SUCCESS-dprrxybimhltrqziynaikzahzsgdjmlc; LANG=en_US.UTF-8
LC_CTYPE=en_US.UTF-8 /usr/bin/python setup'"'"''
<10.140.155.162> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o
ControlPersist=60s -o
ControlPath="/Users/mspreitz/.ansible/cp/ansible-ssh-%h-%p-%r" -o
KbdInteractiveAuthentication=no -o
PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey
-o PasswordAuthentication=no -o User=mspreitzer -o ConnectTimeout=100
10.140.155.162 /bin/sh -c 'sudo -k && sudo -H -S -p "[sudo via ansible,
key=bnojgeocfbeoxbgakzuluppvtxaviuyr] password: " -u root /bin/sh -c
'"'"'echo BECOME-SUCCESS-bnojgeocfbeoxbgakzuluppvtxaviuyr; LANG=en_US.UTF-8
LC_CTYPE=en_US.UTF-8 /usr/bin/python setup'"'"''
failed: [10.140.155.166] => {"failed": true, "parsed": false}
BECOME-SUCCESS-spxlnvdjayufeyagwywqhtdtptisrnej
{"verbose_override": true, "changed": false, "ansible_facts": {*<lots of
gather-facts output here, but it is not complete/>*OpenSSH_6.2p2, OSSLShim
0.9.8r 8 Dec 2011
debug1: Reading configuration data /Users/mspreitz/.ssh/config
debug1: /Users/mspreitz/.ssh/config line 8: Applying options for 10.*
debug1: Reading configuration data /etc/ssh_config
debug1: /etc/ssh_config line 20: Applying options for *
debug1: /etc/ssh_config line 102: Applying options for *
debug1: auto-mux: Trying existing master
debug2: fd 3 setting O_NONBLOCK
debug2: mux_client_hello_exchange: master version 4
debug3: mux_client_forwards: request forwardings: 0 local, 0 remote
debug3: mux_client_request_session: entering
debug3: mux_client_request_alive: entering
debug3: mux_client_request_alive: done pid = 9654
debug3: mux_client_request_session: session request sent
debug1: mux_client_request_session: master session id: 2
debug3: mux_client_read_packet: read header failed: Broken pipe
debug2: Received exit status from master 0
Shared connection to 10.140.155.166 closed.
>From the same run, here is a failure at a task for the one machine that
succeeded to gather-facts in this case:
TASK: [bootstrap | add any dependent repository keys from url]
****************
<10.140.155.162> ESTABLISH CONNECTION FOR USER: mspreitzer
<10.140.155.162> REMOTE_MODULE apt_key
url=http://10.140.132.215:8082/Release.pgp
<10.140.155.162> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o
ControlPersist=60s -o
ControlPath="/Users/mspreitz/.ansible/cp/ansible-ssh-%h-%p-%r" -o
KbdInteractiveAuthentication=no -o
PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey
-o PasswordAuthentication=no -o User=mspreitzer -o ConnectTimeout=100
10.140.155.162 /bin/sh -c 'mkdir -p
$HOME/.ansible/tmp/ansible-tmp-1454784077.96-58394575390812 && chmod a+rx
$HOME/.ansible/tmp/ansible-tmp-1454784077.96-58394575390812 && echo
$HOME/.ansible/tmp/ansible-tmp-1454784077.96-58394575390812'
<10.140.155.162> PUT
/var/folders/cd/d050bcgj7xd6lkbfhnb9bns80000gp/T/tmpxHKer5 TO apt_key
<10.140.155.162> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o
ControlPersist=60s -o
ControlPath="/Users/mspreitz/.ansible/cp/ansible-ssh-%h-%p-%r" -o
KbdInteractiveAuthentication=no -o
PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey
-o PasswordAuthentication=no -o User=mspreitzer -o ConnectTimeout=100
10.140.155.162 /bin/sh -c 'sudo -k && sudo -H -S -p "[sudo via ansible,
key=jsmmpfgpouhdlwmcvgmwvbhidafftccn] password: " -u root /bin/sh -c
'"'"'echo BECOME-SUCCESS-jsmmpfgpouhdlwmcvgmwvbhidafftccn; LANG=en_US.UTF-8
LC_CTYPE=en_US.UTF-8 /usr/bin/python apt_key'"'"''
failed: [10.140.155.162] => (item={'repo': u'deb
http://10.140.132.215:8082/ docker main', 'key_url':
u'http://10.140.132.215:8082/Release.pgp'}) => {"failed": true, "item":
{"key_url": "http://10.140.132.215:8082/Release.pgp", "repo": "deb
http://10.140.132.215:8082/ docker main"}, "parsed": false}
BECOME-SUCCESS-jsmmpfgpouhdlwmcvgmwvbhidafftccn
OpenSSH_6.2p2, OSSLShim 0.9.8r 8 Dec 2011
debug1: Reading configuration data /Users/mspreitz/.ssh/config
debug1: /Users/mspreitz/.ssh/config line 8: Applying options for 10.*
debug1: Reading configuration data /etc/ssh_config
debug1: /etc/ssh_config line 20: Applying options for *
debug1: /etc/ssh_config line 102: Applying options for *
debug1: auto-mux: Trying existing master
debug2: fd 3 setting O_NONBLOCK
debug2: mux_client_hello_exchange: master version 4
debug3: mux_client_forwards: request forwardings: 0 local, 0 remote
debug3: mux_client_request_session: entering
debug3: mux_client_request_alive: entering
debug3: mux_client_request_alive: done pid = 9658
debug3: mux_client_request_session: session request sent
debug1: mux_client_request_session: master session id: 2
debug3: mux_client_read_packet: read header failed: Broken pipe
debug2: Received exit status from master 0
Shared connection to 10.140.155.162 closed.
In desperation I captured all packets on the VPN network interface on my
Mac during that run, then looked at it with Wireshark. I see one
long-lived connection for each of the three managed machines, plus one
short-lived connection for 10.140.155.162.
I see no end to the long-lived connections. Just to make sure it was not
due to a missing buffer flush in the packet capture, I did a `ping` of one
managed machine after `ansible-playbook` finished; I see the `ping` in the
captured packets. So how can Ansible suffer from truncated replies when
the connections stay open? How can the connections stay open after
`ansible-playbook` exits?
The short-lived connection is even stranger. It comes late in the trace.
The client closes it soon after the completion of the three-packet
handshake that opens the connection. The client sends no TCP payload at
all, not even the "client hello" of TLS. The server sends a "server
hello", which prompts the client to respond with a RST. What the heck is
going on with that?
Thanks for any clues,
Mike
--
You received this message because you are subscribed to the Google Groups
"Ansible Project" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/ansible-project/2420a36a-a5ec-4dbc-b65a-5185793c4c0a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.