= hirsute verification =

# Start by showing we can still reproduce the problem w/o the -proposed packages:
ubuntu@avoton02:~$ sudo iptables -A INPUT -p tcp -s 91.189.88.136 -m string --string maas.io --algo bm -j DROP
ubuntu@avoton02:~$ python3 ./repro.py & sleep 60
[1] 3386
# 60 seconds have passed, still hung:
ubuntu@avoton02:~$ sudo strace -p 3386
strace: Process 3386 attached
read(3, ^Cstrace: Process 3386 detached
 <detached ...>
ubuntu@avoton02:~$ fg
python3 ./repro.py
^CTraceback (most recent call last):
  File "/home/ubuntu/./repro.py", line 6, in <module>
    r = RequestsUrlReader(url)
  File "/usr/lib/python3/dist-packages/simplestreams/contentsource.py", line 381, in __init__
    self.req = requests.get(url, stream=True, auth=auth, headers=headers)
  File "/usr/lib/python3/dist-packages/requests/api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 382, in _make_request
    self._validate_conn(conn)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 1012, in _validate_conn
    conn.connect()
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 411, in connect
    self.sock = ssl_wrap_socket(
  File "/usr/lib/python3/dist-packages/urllib3/util/ssl_.py", line 428, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(
  File "/usr/lib/python3/dist-packages/urllib3/util/ssl_.py", line 472, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/usr/lib/python3.9/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/usr/lib/python3.9/ssl.py", line 1040, in _create
    self.do_handshake()
  File "/usr/lib/python3.9/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
KeyboardInterrupt

# Now upgrade and demonstrate the problem is fixed
ubuntu@avoton02:~$ sudo apt install python3-simplestreams simplestreams -y
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following packages will be upgraded:
  python3-simplestreams simplestreams
2 upgraded, 0 newly installed, 0 to remove and 68 not upgraded.
Need to get 31.8 kB/37.8 kB of archives.
After this operation, 0 B of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu hirsute-proposed/main amd64 python3-simplestreams all 0.1.0-30-g3cc8988a-0ubuntu1.21.04.1 [31.8 kB]
Fetched 31.8 kB in 0s (119 kB/s)
(Reading database ... 79414 files and directories currently installed.)
Preparing to unpack .../python3-simplestreams_0.1.0-30-g3cc8988a-0ubuntu1.21.04.1_all.deb ...
Unpacking python3-simplestreams (0.1.0-30-g3cc8988a-0ubuntu1.21.04.1) over (0.1.0-30-g3cc8988a-0ubuntu1) ...
Preparing to unpack .../simplestreams_0.1.0-30-g3cc8988a-0ubuntu1.21.04.1_all.deb ...
Unpacking simplestreams (0.1.0-30-g3cc8988a-0ubuntu1.21.04.1) over (0.1.0-30-g3cc8988a-0ubuntu1) ...
Setting up python3-simplestreams (0.1.0-30-g3cc8988a-0ubuntu1.21.04.1) ...
Setting up simplestreams (0.1.0-30-g3cc8988a-0ubuntu1.21.04.1) ...
Scanning processes...
Scanning processor microcode...
Scanning linux images...

Running kernel seems to be up-to-date.

The processor microcode seems to be up-to-date.

No services need to be restarted.

No containers need to be restarted.

No user sessions are running outdated binaries.
ubuntu@avoton02:~$ python3 ./repro.py & sleep 60
[1] 3605
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 382, in _make_request
    self._validate_conn(conn)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 1012, in _validate_conn
    conn.connect()
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 411, in connect
    self.sock = ssl_wrap_socket(
  File "/usr/lib/python3/dist-packages/urllib3/util/ssl_.py", line 428, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(
  File "/usr/lib/python3/dist-packages/urllib3/util/ssl_.py", line 472, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/usr/lib/python3.9/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/usr/lib/python3.9/ssl.py", line 1040, in _create
    self.do_handshake()
  File "/usr/lib/python3.9/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
socket.timeout: _ssl.c:1106: The handshake operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 755, in urlopen
    retries = retries.increment(
  File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 531, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/lib/python3/dist-packages/six.py", line 703, in reraise
    raise value
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 385, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=conn.timeout)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 336, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='images.maas.io', port=443): Read timed out. (read timeout=10)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/./repro.py", line 6, in <module>
    r = RequestsUrlReader(url)
  File "/usr/lib/python3/dist-packages/simplestreams/contentsource.py", line 382, in __init__
    self.req = requests.get(
  File "/usr/lib/python3/dist-packages/requests/api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 529, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='images.maas.io', port=443): Read timed out. (read timeout=10)
[1]+  Exit 1                  python3 ./repro.py

** Description changed:

  [Impact]

  The bug is about simplestreams possibly getting stuck waiting forever
  for an HTTP response that never comes, e.g. because of networking
  issues.

  This can potentially affect any package depending on simplestreams, but
  specifically it was reported affecting MAAS, where it causes server
  deployments to time out.

  [Test Plan]

+ Install an iptables rule to block SSL handshaking w/ the MAAS simplestreams repo:
- Ideally this should be tested by building a MAAS snap with the
- simplestreams package including the fix, verifying that is works as
- expected.
+ -------------------------
+ $ sudo iptables -A INPUT -p tcp -s 91.189.88.136 -m string --string maas.io --algo bm -j DROP
+ -------------------------
+
+ Run the reproducer described below, and verify that it hangs
+ indefinitely (I recommend waiting 60s):
+
+ -------------------------
+ $ cat repro.py
+ #!/usr/bin/env python3
+
+ from simplestreams.contentsource import RequestsUrlReader
+
+ url = "https://images.maas.io/ephemeral-v3/stable/streams/v1/index.sjson"
+ r = RequestsUrlReader(url)
+ -------------------------
+
+ With the fix applied, verify that it does time out in ~10s.

  [Regression Potential]

- Very little. Scenarios where it takes more than 10s for a remote server
- to provide simplestreams with the data it requested are unlikely, but
- can't be fully excluded.
+ Scenarios where it takes more than 10s to initiate a connection are
+ unlikely, but possible. Code that does not properly handle a timeout
+ exception in these situations may begin to fail.

  [Original Description]

  = How to determine you are seeing this problem =
  Does your MAAS server seem to get "hung up", where deployments suddenly
  start failing w/ lots of connection timeouts to the MAAS server?

  Get a list of pids of your regiond processes:
  $ ps -ef | grep regiond

  Run strace on each one to see if one is stuck in a connect() or recv() call:
  $ sudo strace -p $pid
  recv(...

  (normally you should see a lot of epoll_ctl() calls go by if not hung)

  If one is hung, use lsof to see what it is connected to:
  sudo lsof -i -a -p $pid

  If you see an open connection to your images server, then this may be
  your problem. sudo kill -9 of the hung pid will cause it to respawn and
  recover.

** Tags removed: verification-needed-hirsute
** Tags added: verification-done-hirsute

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1908452

Title:
  MAAS stops working and deployment fails after `Loading ephemeral` step
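
For anyone who wants to see the hang-versus-timeout behaviour from the verification transcript without installing an iptables rule, here is a rough standard-library sketch. It is not the simplestreams fix itself: start_silent_listener and try_handshake are hypothetical helper names, and the localhost listener only imitates a peer whose TLS handshake packets are dropped. It also shows the kind of timeout handling the [Regression Potential] note asks callers to have.

```python
import socket
import ssl


def start_silent_listener():
    """Open a localhost TCP listener that never accepts or replies.

    The kernel completes the TCP handshake from its listen backlog, so a
    client can connect, but its TLS ClientHello is never answered. This
    imitates the dropped handshake packets in the test plan.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    srv.listen(1)
    return srv


def try_handshake(host, port, timeout):
    """Attempt a TLS handshake; return "ok" or "timeout".

    Hypothetical helper, not simplestreams code. With timeout=None the
    call would block indefinitely against a silent peer.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # certificate checks are irrelevant here:
    ctx.verify_mode = ssl.CERT_NONE  # the peer never sends a certificate
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock):
                return "ok"
    except socket.timeout:  # an alias of TimeoutError on Python 3.10+
        return "timeout"


if __name__ == "__main__":
    listener = start_silent_listener()
    port = listener.getsockname()[1]
    print(try_handshake("127.0.0.1", port, timeout=1.0))
    listener.close()
```

The 1-second timeout is only there to keep the demonstration short; the transcript above shows the packaged fix reporting "read timeout=10". Callers that go through requests would see requests.exceptions.ReadTimeout instead of socket.timeout, as in the post-upgrade traceback.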