So to summarize my concerns:
a) time-to-ssh will be impacted if the metadata service returns one or more 503 responses. I guess the argument is that, without this service providing the instance details, we might not even be able to ssh in (because no keys were provisioned). Even so, if there is a script out there collecting this metric by just checking whether port 22 is reachable, it will simply see this potential extra delay.
b) as long as the metadata service keeps returning 503, the boot will halt. There is no limit on the number of retries. Users waiting for such an instance will just see that it is never ready, and will not understand why. cloud-init will be logging this for sure, but where would users see what is going on if the instance does not finish booting? Is this logging visible via the cloud's API somewhere in this scenario?
c) lastly, I wonder if there is a risk of a regression that involves systems without metadata services. For example, somewhere we don't really expect a metadata service to be around, but where we were trying it anyway and ignoring the 503 we might have gotten. Now we will wait forever on that, and I say forever because that "cloud" really wasn't supposed to provide a metadata service, and all was working fine because we were ignoring the error. Until now. Could this theoretical scenario exist? What about MAAS boots, or LXD?
d) Regarding "this lack of try-again behavior causes problems in AWS where they expect cloud-init to retry when receiving a 503.", I see in this link[1] that AWS recommends a retry in the case of the IMDSv2 endpoint: "503 – The request could not be completed. Retry the request." There is no indication or recommendation about how long or how many times it should be retried, only that frequent queries might be throttled. So I guess it's left as an implementation choice on our end. A potential infinite loop is always concerning.
e) Do all clouds recommend the retry strategy when the metadata service returns a 503? And by "clouds" I really mean anywhere cloud-init is used to configure things during boot (LXD, MAAS, local VMs, real clouds, etc.).
[1] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html#instance-metadata-returns
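To make the concern in (b) and (d) a bit more concrete, here is a rough sketch of what a bounded retry could look like. This is not cloud-init's code, and nothing here comes from the AWS docs: the function name, the `MetadataUnavailable` exception, and the timing defaults (`max_wait`, `base_delay`, `max_delay`) are all made up for illustration. The only point is that a 503 gets retried with back-off, but the loop gives up with a clear error once an overall deadline has passed.

```python
import time
from typing import Callable, Tuple


class MetadataUnavailable(Exception):
    """Hypothetical error: the service kept answering 503 past the deadline."""


def fetch_with_bounded_retry(
    fetch: Callable[[], Tuple[int, bytes]],
    max_wait: float = 120.0,   # assumed overall budget in seconds
    base_delay: float = 1.0,   # assumed first back-off interval
    max_delay: float = 16.0,   # assumed cap on a single back-off interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bytes:
    """Call fetch() until it returns 200, retrying only on 503.

    fetch is any callable returning (http_status, body); how the request
    is actually made is left to the caller.
    """
    deadline = clock() + max_wait
    delay = base_delay
    attempts = 0
    while True:
        attempts += 1
        status, body = fetch()
        if status == 200:
            return body
        if status != 503:
            # Anything other than 503 is not retried in this sketch.
            raise RuntimeError(f"unexpected status {status}")
        remaining = deadline - clock()
        if remaining <= 0:
            # Give up loudly instead of blocking boot forever.
            raise MetadataUnavailable(
                f"still 503 after {attempts} attempts and {max_wait}s"
            )
        # Exponential back-off, never sleeping past the deadline.
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay)
```

Whether a deadline like this is acceptable on AWS, and what the right budget would be, is exactly the open question: the AWS page only says to retry.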
https://bugs.launchpad.net/bugs/2094858
Title: Cloud-init fails on AWS if IMDSv2 returns a 503 error.