On Fri, Aug 29, 2025 at 11:00:12AM +0200, Thomas Huth wrote:
> On 29/08/2025 10.31, Daniel P. Berrangé wrote:
> > We treat most HTTP errors as non-fatal when fetching assets,
> > but forgot to handle network level errors. This adds catching
> > of URLError so that we retry on failure, and will ultimately
> > trigger graceful skipping in the pre-cache task.
> >
> > Signed-off-by: Daniel P. Berrangé <[email protected]>
> > ---
> >  tests/functional/qemu_test/asset.py | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/tests/functional/qemu_test/asset.py b/tests/functional/qemu_test/asset.py
> > index ae2bec3ea5..36f64fe2f6 100644
> > --- a/tests/functional/qemu_test/asset.py
> > +++ b/tests/functional/qemu_test/asset.py
> > @@ -15,7 +15,7 @@
> >  from time import sleep
> >  from pathlib import Path
> >  from shutil import copyfileobj
> > -from urllib.error import HTTPError
> > +from urllib.error import HTTPError, URLError
> >
> >  class AssetError(Exception):
> >      def __init__(self, asset, msg, transient=False):
> > @@ -171,6 +171,13 @@ def fetch(self):
> >                      raise AssetError(self, "Unable to download %s: "
> >                                       "HTTP error %d" % (self.url, e.code))
> >                  continue
> > +            except URLError as e:
> > +                # This is typically a network/service level error
> > +                # eg urlopen error [Errno 110] Connection timed out>
> > +                tmp_cache_file.unlink()
> > +                self.log.error("Unable to download %s: URL error %s",
> > +                               self.url, e)
> > +                continue
>
> Hmm, I don't think we should retry on each and every URLError. For example
> if you have a typo in the server name, you get a "Name or service not known"
> URLError, and it does not make sense to retry in that case.
>
> Also, in case of the server being down, it takes a minute or two 'til the
> urllib gives up, so if you retry multiple times in that case, you can easily
> extend the test time by > 5 minutes (as you can see by the timestamps in the
> example in your cover letter), which is unfortunate, too (considering that
> there is also a timeout setting for the gitlab CI jobs).
>
> I think I'd maybe rather do something like this instead, without retrying
> the download, just marking certain errors as transient:
>
>     except URLError as e:
>         tmp_cache_file.unlink()
>         msg = str(e.reason)
>         self.log.error("Unable to download %s: URL error %s",
>                        self.url, msg)
>         raise AssetError(self, msg,
>                          transient=("Network is unreachable" in msg))
>
> WDYT?
Ok, yes, we can do that instead.

With regards,
Daniel

-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
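[Editor's note: the classification Thomas suggests above can be sketched standalone. This is not the QEMU code; the `is_transient` helper name is invented here for illustration, and it relies only on the documented fact that `urllib.error.URLError` wraps the underlying `OSError` in its `.reason` attribute.]

```python
import socket
from urllib.error import URLError

def is_transient(err: URLError) -> bool:
    """Retry only network-level failures that may clear up on their own."""
    msg = str(err.reason)
    # "Network is unreachable" (ENETUNREACH) may succeed later;
    # "Name or service not known" (a typo'd host) never will.
    return "Network is unreachable" in msg

# URLError carries the underlying OSError in .reason:
dns_err = URLError(socket.gaierror(-2, "Name or service not known"))
net_err = URLError(OSError(101, "Network is unreachable"))

assert not is_transient(dns_err)  # bad host name: do not retry
assert is_transient(net_err)      # network outage: worth retrying
```

This keeps the per-download failure immediate (no multi-minute retry loops eating into the gitlab CI job timeout) while still letting the pre-cache task skip gracefully on genuinely transient conditions.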
