= hirsute verification =
# Start by showing we can still reproduce the problem w/o the -proposed packages:
ubuntu@avoton02:~$ sudo iptables -A INPUT -p tcp -s 91.189.88.136 -m string --string maas.io --algo bm -j DROP
ubuntu@avoton02:~$ python3 ./repro.py & sleep 60
[1] 3386
# 60 seconds have passed, still hung:
ubuntu@avoton02:~$ sudo strace -p 3386
strace: Process 3386 attached
read(3, ^Cstrace: Process 3386 detached
 <detached ...>

ubuntu@avoton02:~$ fg
python3 ./repro.py
^CTraceback (most recent call last):
  File "/home/ubuntu/./repro.py", line 6, in <module>
    r = RequestsUrlReader(url)
  File "/usr/lib/python3/dist-packages/simplestreams/contentsource.py", line 381, in __init__
    self.req = requests.get(url, stream=True, auth=auth, headers=headers)
  File "/usr/lib/python3/dist-packages/requests/api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 382, in _make_request
    self._validate_conn(conn)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 1012, in _validate_conn
    conn.connect()
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 411, in connect
    self.sock = ssl_wrap_socket(
  File "/usr/lib/python3/dist-packages/urllib3/util/ssl_.py", line 428, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(
  File "/usr/lib/python3/dist-packages/urllib3/util/ssl_.py", line 472, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/usr/lib/python3.9/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/usr/lib/python3.9/ssl.py", line 1040, in _create
    self.do_handshake()
  File "/usr/lib/python3.9/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
KeyboardInterrupt
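
The manual "background it, sleep 60, check it is still there" step above can also be scripted. A minimal sketch using only the Python standard library; the helper name `still_running_after` is made up for illustration and is not part of simplestreams or of the test plan:

```python
import subprocess
import sys


def still_running_after(cmd, wait_seconds):
    """Return True if cmd has not exited within wait_seconds.

    subprocess.run() kills the child when the timeout expires, so no
    stray process is left behind after the check.
    """
    try:
        subprocess.run(
            cmd,
            timeout=wait_seconds,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return True
    # The command exited on its own (successfully or with an error).
    return False


if __name__ == "__main__":
    # 60s matches the wait used in the manual run above.
    hung = still_running_after([sys.executable, "./repro.py"], 60)
    print("hung" if hung else "exited")
```

With the unfixed package this would be expected to report "hung"; with the fix, repro.py should exit with the timeout exception well inside the 60s window.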

# Now upgrade and demonstrate the problem is fixed

ubuntu@avoton02:~$ sudo apt install python3-simplestreams simplestreams -y
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following packages will be upgraded:
  python3-simplestreams simplestreams
2 upgraded, 0 newly installed, 0 to remove and 68 not upgraded.
Need to get 31.8 kB/37.8 kB of archives.
After this operation, 0 B of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu hirsute-proposed/main amd64 python3-simplestreams all 0.1.0-30-g3cc8988a-0ubuntu1.21.04.1 [31.8 kB]
Fetched 31.8 kB in 0s (119 kB/s)
(Reading database ... 79414 files and directories currently installed.)
Preparing to unpack .../python3-simplestreams_0.1.0-30-g3cc8988a-0ubuntu1.21.04.1_all.deb ...
Unpacking python3-simplestreams (0.1.0-30-g3cc8988a-0ubuntu1.21.04.1) over (0.1.0-30-g3cc8988a-0ubuntu1) ...
Preparing to unpack .../simplestreams_0.1.0-30-g3cc8988a-0ubuntu1.21.04.1_all.deb ...
Unpacking simplestreams (0.1.0-30-g3cc8988a-0ubuntu1.21.04.1) over (0.1.0-30-g3cc8988a-0ubuntu1) ...
Setting up python3-simplestreams (0.1.0-30-g3cc8988a-0ubuntu1.21.04.1) ...
Setting up simplestreams (0.1.0-30-g3cc8988a-0ubuntu1.21.04.1) ...
Scanning processes...
Scanning processor microcode...
Scanning linux images...

Running kernel seems to be up-to-date.

The processor microcode seems to be up-to-date.

No services need to be restarted.

No containers need to be restarted.

No user sessions are running outdated binaries.
ubuntu@avoton02:~$ python3 ./repro.py & sleep 60
[1] 3605
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 382, in _make_request
    self._validate_conn(conn)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 1012, in _validate_conn
    conn.connect()
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 411, in connect
    self.sock = ssl_wrap_socket(
  File "/usr/lib/python3/dist-packages/urllib3/util/ssl_.py", line 428, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(
  File "/usr/lib/python3/dist-packages/urllib3/util/ssl_.py", line 472, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/usr/lib/python3.9/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/usr/lib/python3.9/ssl.py", line 1040, in _create
    self.do_handshake()
  File "/usr/lib/python3.9/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
socket.timeout: _ssl.c:1106: The handshake operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 755, in urlopen
    retries = retries.increment(
  File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 531, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/lib/python3/dist-packages/six.py", line 703, in reraise
    raise value
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 385, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=conn.timeout)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 336, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='images.maas.io', port=443): Read timed out. (read timeout=10)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/./repro.py", line 6, in <module>
    r = RequestsUrlReader(url)
  File "/usr/lib/python3/dist-packages/simplestreams/contentsource.py", line 382, in __init__
    self.req = requests.get(
  File "/usr/lib/python3/dist-packages/requests/api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 529, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='images.maas.io', port=443): Read timed out. (read timeout=10)

[1]+  Exit 1                  python3 ./repro.py
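
For anyone who wants to see the same "handshake operation timed out" error without touching iptables, here is a rough local approximation. It assumes a loopback listener that accepts TCP connections in the kernel backlog but never answers the TLS ClientHello, which is similar in effect to the DROP rule above. The function names are made up for this sketch; this is not the simplestreams code, just the stdlib pieces that sit at the bottom of the traceback:

```python
import socket
import ssl


def silent_listener():
    """Return a loopback socket that completes TCP connects but never replies.

    accept() is never called, so the kernel finishes the TCP handshake and
    the client's TLS ClientHello simply goes unanswered.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    srv.listen(1)
    return srv


def try_handshake(port, timeout):
    """Attempt a TLS handshake against 127.0.0.1:port with a socket timeout."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # no real certificate is involved here
    ctx.verify_mode = ssl.CERT_NONE
    raw = socket.create_connection(("127.0.0.1", port), timeout=timeout)
    try:
        # wrap_socket() performs the handshake immediately on a connected socket.
        with ctx.wrap_socket(raw):
            return "handshake completed"
    except socket.timeout as exc:  # alias of TimeoutError on Python >= 3.10
        return "timed out: {}".format(exc)
    finally:
        raw.close()


if __name__ == "__main__":
    listener = silent_listener()
    print(try_handshake(listener.getsockname()[1], timeout=1))
    listener.close()
```

Passing `timeout=None` instead makes the handshake block indefinitely, which corresponds to the pre-fix hang in do_handshake() shown in the first traceback.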


** Description changed:

  [Impact]
  
  The bug is about simplestreams possibly getting stuck waiting forever
  for an HTTP response that never comes, e.g. because of networking
  issues. This can potentially affect any package depending on
  simplestreams, but specifically it was reported affecting MAAS, where it
  causes server deployments to time out.
  
  [Test Plan]
+ Install an iptables rule to block SSL handshaking w/ the MAAS simplestreams repo:
  
- Ideally this should be tested by building a MAAS snap with the
- simplestreams package including the fix, verifying that it works as
- expected.
+ -------------------------
+ $ sudo iptables -A INPUT -p tcp -s 91.189.88.136 -m string --string maas.io --algo bm -j DROP
+ -------------------------
+ 
+ Run the reproducer described below, and verify that it hangs
+ indefinitely (I recommend waiting 60s):
+ 
+ -------------------------
+ $ cat repro.py
+ #!/usr/bin/env python3
+ 
+ from simplestreams.contentsource import RequestsUrlReader
+ 
+ url = "https://images.maas.io/ephemeral-v3/stable/streams/v1/index.sjson";
+ r = RequestsUrlReader(url)
+ -------------------------
+ 
+ With the fix applied, verify that it times out in ~10s.
  
  [Regression Potential]
  
- Very little. Scenarios where it takes more than 10s for a remote server
- to provide simplestreams with the data it requested are unlikely, but
- can't be fully excluded.
+ Scenarios where it takes more than 10s to initiate a connection are
+ unlikely, but possible. Code that does not properly handle a timeout
+ exception in these situations may begin to fail.
  
  [Original Description]
  
  = How to determine you are seeing this problem =
  Does your MAAS server seem to get "hung up", where deployments suddenly start failing w/ lots of connection timeouts to the MAAS server?
  
  Get a list of pids of your regiond processes:
  $ ps -ef | grep regiond
  
  Run strace on each one to see if one is stuck in a connect() or recv() call:
  $ sudo strace -p $pid
  recv(...
  
  (normally you should see a lot of epoll_ctl() calls go by if not hung)
  
  If one is hung, use lsof to see what it is connected to:
  $ sudo lsof -i -a -p $pid
  
  If you see an open connection to your images server, then this may be
  your problem. Running sudo kill -9 on the hung pid will cause it to
  respawn and recover.
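
On the regression note in the description: callers that never saw an exception from this call before may now get a timeout. One way a caller could cope is a small retry wrapper. This is only a sketch under assumptions: `call_with_retries` is a hypothetical helper, not something simplestreams or MAAS provides, and the exception types to retry on are left to the caller. For code going through RequestsUrlReader that would presumably be `requests.exceptions.ReadTimeout` (the type in the traceback above) or its parent `requests.exceptions.Timeout`.

```python
import time


def call_with_retries(fetch, retry_on, attempts=3, delay=1.0):
    """Call fetch() up to `attempts` times, retrying only on `retry_on`.

    retry_on is an exception class or tuple of classes. The last failure
    is re-raised unchanged so the caller still sees the real error.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(delay)  # brief pause before the next attempt


if __name__ == "__main__":
    outcomes = iter([TimeoutError("simulated"), "index data"])

    def flaky_fetch():
        # Stand-in for a network fetch: fails once, then succeeds.
        item = next(outcomes)
        if isinstance(item, Exception):
            raise item
        return item

    print(call_with_retries(flaky_fetch, (TimeoutError,), delay=0))
```

Whether retrying is appropriate at all depends on the caller; the point is only that the timeout now has to be handled somewhere.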

** Tags removed: verification-needed-hirsute
** Tags added: verification-done-hirsute

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1908452

Title:
  MAAS stops working and deployment fails after `Loading ephemeral` step

To manage notifications about this bug go to:
https://bugs.launchpad.net/maas/+bug/1908452/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
