On 29/08/2025 10.31, Daniel P. Berrangé wrote:
We treat most HTTP errors as non-fatal when fetching assets,
but forgot to handle network-level errors. This adds catching
of URLError so that we retry on failure, and will ultimately
trigger graceful skipping in the pre-cache task.

Signed-off-by: Daniel P. Berrangé <[email protected]>
---
  tests/functional/qemu_test/asset.py | 9 ++++++++-
  1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/tests/functional/qemu_test/asset.py b/tests/functional/qemu_test/asset.py
index ae2bec3ea5..36f64fe2f6 100644
--- a/tests/functional/qemu_test/asset.py
+++ b/tests/functional/qemu_test/asset.py
@@ -15,7 +15,7 @@
  from time import sleep
  from pathlib import Path
  from shutil import copyfileobj
-from urllib.error import HTTPError
+from urllib.error import HTTPError, URLError
 
 class AssetError(Exception):
      def __init__(self, asset, msg, transient=False):
@@ -171,6 +171,13 @@ def fetch(self):
                      raise AssetError(self, "Unable to download %s: "
                                       "HTTP error %d" % (self.url, e.code))
                  continue
+            except URLError as e:
+                # This is typically a network/service level error
+                # e.g. <urlopen error [Errno 110] Connection timed out>
+                tmp_cache_file.unlink()
+                self.log.error("Unable to download %s: URL error %s",
+                               self.url, e)
+                continue

Hmm, I don't think we should retry on each and every URLError. For example, if you have a typo in the server name, you get a "Name or service not known" URLError, and it does not make sense to retry in that case.
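
Just to illustrate: the underlying cause is available in e.reason, so the two cases could be told apart along these lines (untested sketch, and the exact set of exception types to treat as transient is my guess, not something from your patch):

    import socket

    def is_transient_url_error(e):
        # A DNS lookup failure ("Name or service not known") shows up
        # as a socket.gaierror in e.reason -- that's a typo'd or dead
        # host name, so retrying won't help
        if isinstance(e.reason, socket.gaierror):
            return False
        # Timed-out or refused connections (the Errno 110 case from
        # your comment maps to TimeoutError) might be a passing
        # outage, so those could arguably count as transient
        return isinstance(e.reason, (TimeoutError, ConnectionRefusedError))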

Also, if the server is down, it takes urllib a minute or two to give up, so retrying multiple times in that case can easily extend the test time by more than 5 minutes (as you can see from the timestamps in the example in your cover letter), which is unfortunate, too (considering that there is also a timeout setting for the GitLab CI jobs).
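
(Side note: the long hang could at least be bounded by passing a timeout to urlopen. A rough sketch, reusing self.url and tmp_cache_file from the fetch() context; I haven't checked what value would suit the bigger assets:)

    from shutil import copyfileobj
    from urllib.request import urlopen

    # Cap each attempt instead of waiting for the OS-level connect
    # timeout (ETIMEDOUT can take ~2 minutes to trigger)
    with urlopen(self.url, timeout=30) as resp:
        with open(tmp_cache_file, "wb") as dst:
            copyfileobj(resp, dst)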

I think I'd rather do something like this instead, without retrying the download, just marking certain errors as transient:

       except URLError as e:
           tmp_cache_file.unlink()
           msg = str(e.reason)
           self.log.error("Unable to download %s: URL error %s",
                          self.url, msg)
           # No retry: fail right away, and only mark the error as
           # transient (so the pre-cache task can skip gracefully)
           # when the network itself was unreachable
           raise AssetError(self, msg,
                            transient=("Network is unreachable" in msg))
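
That way a real outage still ends up in the graceful skipping in the pre-cache task, while a typo'd URL fails fast instead of burning retries. On the consumer side I'd expect something roughly like this (assuming AssetError keeps the flag as e.transient and that a unittest-style skipTest() is reachable there; I did not check the actual pre-cache call site):

    try:
        asset.fetch()
    except AssetError as e:
        if e.transient:
            # passing network trouble -> skip instead of failing
            self.skipTest(str(e))
        raise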

WDYT?

 Thomas

