So to summarize my concerns:

a) time-to-ssh will be impacted if there is one or more 503 responses
from the metadata service. I guess the argument is that, without this
service providing the instance details, we might not even be able to ssh
in (because no keys were provisioned). Even so, if there is a script
collecting this metric out there somewhere, and just checking if port 22
is reachable, it will just see this potential extra delay.

b) as long as the metadata service is returning 503 continuously, the
boot will halt. There is no limit for the number of retries. Users
waiting for such an instance will just see that it is never ready, and
not understand why. cloud-init will be logging this for sure, but where
would users see what is going on if the instance does not finish boot?
Is this logging visible via the cloud's API somewhere in this scenario?

c) lastly, I wonder if there is a risk of a regression that involves
systems without metadata services. For example, somewhere where we don't
really expect a metadata service to be around, but we were trying it
anyway, and ignoring the 503 we might have gotten. Now we will wait
forever on that, and I say forever because that "cloud" really wasn't
supposed to provide a metadata service, and all was working fine because
we were ignoring the error. Until now. Could this theoretical scenario
exist? What about MAAS boots, or LXD?

d) Regarding "This lack of try-again behavior causes problems in AWS
where they expect cloud-init to retry when receiving a 503.", I see in
this link[1] that AWS recommends a retry in the case of the IMDSv2
endpoint:

  503 – The request could not be completed. Retry the request.

There is no indication or recommendation about how long or how many
times it should be retried, only that frequent queries might be
throttled. So I guess it's left as an implementation choice on our end.
A potential infinite loop is always concerning.

e) Do all clouds recommend the retry strategy when the metadata service
returns a 503? And by "clouds" I really mean anywhere cloud-init is used
to configure things during boot (lxd, maas, local VMs, real clouds,
etc).
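
To make (b) and (d) a bit more concrete, here is a rough sketch of what
a bounded retry could look like. This is not cloud-init's actual code:
the function name, the exception, the 120 second default and the
backoff values are all made up for illustration, and "fetch" stands in
for whatever performs the real HTTP request. It only shows that a 503
can be retried with backoff while still giving up after a deadline.

```python
import time
from typing import Callable, Tuple


class MetadataUnavailable(Exception):
    """Hypothetical error: the service kept answering 503 past the deadline."""


def fetch_with_bounded_retry(
    fetch: Callable[[], Tuple[int, bytes]],
    max_wait: float = 120.0,   # assumed overall budget, not an AWS recommendation
    base_delay: float = 1.0,
    max_delay: float = 16.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[int, bytes]:
    """Call fetch() until it returns something other than 503 or time runs out."""
    deadline = clock() + max_wait
    delay = base_delay
    attempts = 0
    while True:
        attempts += 1
        status, body = fetch()
        if status != 503:
            # Success or a different error: hand it back to the caller.
            return status, body
        remaining = deadline - clock()
        if remaining <= 0:
            raise MetadataUnavailable(
                f"still 503 after {attempts} attempts and {max_wait}s"
            )
        # Exponential backoff, capped, and never sleeping past the deadline.
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay)
```

With something like this, an instance whose metadata service never
recovers would fail with a clear error after max_wait instead of
hanging in boot forever. Whether failing or continuing without metadata
is the right outcome is a separate question.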


1. https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html#instance-metadata-returns

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2094858

Title:
  Cloud-init fails on AWS if IMDSv2 returns a 503 error.
