On Fri, Aug 29, 2025 at 11:00:12AM +0200, Thomas Huth wrote:
> On 29/08/2025 10.31, Daniel P. Berrangé wrote:
> > We treat most HTTP errors as non-fatal when fetching assets,
> > but forgot to handle network level errors. This adds catching
> > of URLError so that we retry on failure, and will ultimately
> > trigger graceful skipping in the pre-cache task.
> > 
> > Signed-off-by: Daniel P. Berrangé <[email protected]>
> > ---
> >   tests/functional/qemu_test/asset.py | 9 ++++++++-
> >   1 file changed, 8 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tests/functional/qemu_test/asset.py b/tests/functional/qemu_test/asset.py
> > index ae2bec3ea5..36f64fe2f6 100644
> > --- a/tests/functional/qemu_test/asset.py
> > +++ b/tests/functional/qemu_test/asset.py
> > @@ -15,7 +15,7 @@
> >   from time import sleep
> >   from pathlib import Path
> >   from shutil import copyfileobj
> > -from urllib.error import HTTPError
> > +from urllib.error import HTTPError, URLError
> >   class AssetError(Exception):
> >       def __init__(self, asset, msg, transient=False):
> > @@ -171,6 +171,13 @@ def fetch(self):
> >                       raise AssetError(self, "Unable to download %s: "
> >                                        "HTTP error %d" % (self.url, e.code))
> >                   continue
> > +            except URLError as e:
> > +                # This is typically a network/service level error,
> > +                # e.g. <urlopen error [Errno 110] Connection timed out>
> > +                tmp_cache_file.unlink()
> > +                self.log.error("Unable to download %s: URL error %s",
> > +                               self.url, e)
> > +                continue
> 
> Hmm, I don't think we should retry on each and every URLError. For example,
> if you have a typo in the server name, you get a "Name or service not known"
> URLError, and it does not make sense to retry in that case.
> 
> Also, in case of the server being down, it takes a minute or two until
> urllib gives up, so retrying multiple times in that case can easily
> extend the test time by more than five minutes (as the timestamps in the
> example in your cover letter show), which is unfortunate, too (considering
> that there is also a timeout setting for the gitlab CI jobs).
> 
> I think I'd rather do something like this instead: don't retry the
> download, just mark certain errors as transient:
> 
>        except URLError as e:
>            tmp_cache_file.unlink()
>            msg = str(e.reason)
>            self.log.error("Unable to download %s: URL error %s",
>                           self.url, msg)
>            raise AssetError(self, msg,
>                             transient=("Network is unreachable" in msg))
> 
> WDYT?

Ok, yes, we can do that instead.
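For reference, here is a minimal standalone sketch of that approach. The
AssetError shape mirrors the one in the diff above; classify_url_error is a
hypothetical helper name for illustration, not code from the tree:

```python
from urllib.error import URLError

class AssetError(Exception):
    # Mirrors tests/functional/qemu_test/asset.py: 'transient' marks
    # failures that the pre-cache task can skip gracefully rather than
    # treating as a hard test failure.
    def __init__(self, asset, msg, transient=False):
        super().__init__(msg)
        self.asset = asset
        self.transient = transient

def classify_url_error(asset, e):
    # Only "Network is unreachable" is considered transient; DNS typos
    # ("Name or service not known") stay fatal, so they are reported
    # immediately instead of being retried pointlessly.
    msg = str(e.reason)
    return AssetError(asset, msg,
                      transient=("Network is unreachable" in msg))

# Simulated failures (no real network access needed):
err = classify_url_error("some-asset",
                         URLError(OSError(101, "Network is unreachable")))
print(err.transient)   # True

err2 = classify_url_error("some-asset",
                          URLError(OSError(-2, "Name or service not known")))
print(err2.transient)  # False
```

The key point is that URLError wraps the underlying OSError in e.reason, so
string-matching on str(e.reason) is enough to separate the two cases.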


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

