Hi Robie,

The user story here is about improving the time it takes to boot a
Bionic instance on Oracle Cloud in a specific bare metal shape, called
BM.GPU4.8. This is a pretty large instance, with 18x Ethernet controller
[0200]: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
[15b3:1019]:

https://www.oracle.com/uk/cloud/compute/gpu.html

Comparing the time it takes to boot Bionic with a CentOS instance we can
see a big delta both in kernel and userspace:

CentOS:
Startup finished in 4min 3.548s (firmware) + 2min 23.525s (loader) + 3min 
17.984s (kernel) + 1min 37.825s (initrd) + 38.723s (userspace) = 12min 1.608s
multi-user.target reached after 38.714s in userspace

Ubuntu:
Startup finished in 4min 621ms (firmware) + 2min 23.174s (loader) + 7min 
34.767s (kernel) + 5min 48.219s (userspace) = 19min 46.782s
graphical.target reached after 5min 47.854s in userspace

The userspace difference is a cloud-init problem that is being handled
by other launchpad bugs. Here we are trying to focus on the kernel
difference, which further was narrowed down to be related to klibc and
it is what we are trying to address with this bug. Some details:

Looking into details, here are the dmesg output from CentOS and Ubuntu,
for kernel comparison:

Ubuntu: https://pastebin.canonical.com/p/X7GFVjdV3b/
CentOS: https://pastebin.canonical.com/p/gNhFwJjyjw/

Looking there, one area that looks promising is around mlx5_core driver,
where Ubuntu spends 218 seconds, while CentOS spends 44 seconds

Khalid did a nice job investigating and found that the problem came from
ipconfig from initramfs. Here are the details:


The delay from the kernel side is due to the fact that every interface (18 in 
total) takes over 10 seconds each to initialize.

The interfaces are brought up and used for DHCP as part of the iscsi
initialization, during the initramfs stage. Specifically the script
```/usr/share/initramfs-tools/scripts/local-top/iscsi``` which is
included in the initramfs and is installed as part of open-iscsi

To initialize the network interfaces this script calls other helper
functions that are part of the initramfs, specifically
"configure_networking" in ```/usr/share/initramfs-
tools/scripts/functions``` which is installed as part of initramfs-
tools-core.

The configure_networking function initializes networking by looping over
all interfaces and issuing this call:

```
ipconfig -t 2 <iface>
```

where it is calling the ipconfig utility (part of klibc-utils) which
also gets bundled in the initramfs. The -t 2 specifies a timeout of 2
seconds.


This ipconfig tool attempts to perform DHCP on the given interface, within the 
specified timeout (always 2 seconds for each interface)

Normally, ipconfig does not perform only one DHCP attempt - it performs
multiple DHCP attempts, each attempt waiting for an exponentially
increasing amount of time (1s, 2s, 4s, ..etc) up until the total amount
of time has passed that is equal the user-specified timeout (2 in this
case, so it should wait 1 second, followed by 1 second, for a total of 2
seconds).

However, it appears ipconfig has a bug. In case it encounters an error
sending a DHCP request on an interface (which is the case for 17 out of
the 18 interfaces on this instance), it attempts to "try again later"
and sets a timer of 10 seconds for the next attempt - ignoring
completely the user-specified value (2 seconds). I believe this is a bug
in ipconfig and have a working fix to limit the delay to the user-
specified timeout so each interface actually takes a maximum of 2
seconds for initialization, even in error cases.

The above is for bionic, which has the reported problem. In focal, even
though the kernel and klibc-utils are identical to bionic, the version
of initramfs-tools-core is different. In the focal version,
configure_networking() has been reworked to avoid using ipconfig
entirely. Instead, it uses ```dhclient```. And instead of doing that 18
times, dhclient is called only once with the full list of interfaces, as
such:

```
dhclient -v -1 -cf /etc/dhcp/dhclient.conf.2 -pf /tmp/dhclient.pid.2 -4 
enp12s0f0np0 enp12s0f1np1 enp138s0f0np0 enp138s0f1np1 enp148s0f0np0 
enp148s0f1np1 enp195s0f0np0 enp195s0f1np1 enp209s0f0np0 enp209s0f1np1 
enp22s0f0np0 enp22s0f1np1 enp45s0f0np0 enp45s0f1np1 enp72s0f0np0 enp72s0f1np1 
enp76s0f0np0 enp76s0f1np1
```

dhclient does DHCP on all those interfaces at the same time,
asynchronously, and it daemonizes (goes in the background) as soon as it
gets a DHCP response which happens relatively quickly (on interface
enp45s0f0np0 usually) and so the boot-up continues *much* faster
compared to bionic (180+ seconds are reduced to just 5-10 seconds).

I'm sure there might be other use cases, but this is what we have been
reported and are trying to work with. I hope this helps clarifying.

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to klibc in Ubuntu.
https://bugs.launchpad.net/bugs/1947099

Title:
  ipconfig does not honour user-requested timeouts in some cases

Status in klibc package in Ubuntu:
  New
Status in klibc source package in Bionic:
  Incomplete

Bug description:
  
  [Impact]
  In some cases, ipconfig can take a longer time than the user-specified 
timeouts, causing unexpected delays.

  [Test Plan]
  Any situation where ipconfig encounters an error sending the DHCP packet, it 
will automatically set a delay of 10 seconds, which could be longer than the 
user-specified timeout. It can be reproduced by creating a dummy interface and 
attempting to run ipconfig on it with a timeout value of less than 10:

  # ip link add eth1 type dummy
  # date; /usr/lib/klibc/bin/ipconfig -t 2 eth1; date
  Thu Nov 18 04:46:13 EST 2021
  IP-Config: eth1 hardware address ae:e0:f5:9d:7e:00 mtu 1500 DHCP RARP
  IP-Config: no response after 2 secs - giving up
  Thu Nov 18 04:46:23 EST 2021

  ^ Notice above, ipconfig thinks that it waited 2 seconds, but the
  timestamps show an actual delay of 10 seconds.

  [Where problems could occur]
  Please see reproduction steps above. We are seeing this in production too 
(see comment #2).

  [Other Info]
  A patch to fix the issue is being proposed here. It is a safe fix - it only 
checks before going into sleep that the timeout never exceeds the 
user-requested value.

  [Original Description]

  In some cases, ipconfig can take longer than the user-specified
  timeouts, causing unexpected delays.

  in main.c, in function loop(), the process can go into
  process_timeout_event() (or process_receive_event() ) and if it
  encounters an error situation, will set an attempt to "try again
  later" at time equal now + 10 seconds by setting

  s->expire = now + 10;

  This can happen at any time during the main event loop, which can end
  up extending the user-specified timeout if "now + 10" is greater than
  "start_time + user-specified-timeout".

  I believe a patch like the following is needed to avoid this problem:

  --- a/usr/kinit/ipconfig/main.c
  +++ b/usr/kinit/ipconfig/main.c
  @@ -437,6 +437,13 @@ static int loop(void)

                          if (timeout > s->expire - now.tv_sec)
                                  timeout = s->expire - now.tv_sec;
  +
  +                       /* Compensate for already-lost time */
  +                       gettimeofday(&now, NULL);
  +                       if (now.tv_sec + timeout > start + loop_timeout) {
  +                               timeout = loop_timeout - (now.tv_sec - start);
  +                               printf("Lowered timeout to match user request 
= (%d s) \n", timeout);
  +                       }
                  }

  I believe the current behaviour is buggy. This is confirmed when the
  following line is executed:

                          if (loop_timeout >= 0 &&
                              now.tv_sec - start >= loop_timeout) {
                                  printf("IP-Config: no response after %d "
                                         "secs - giving up\n", loop_timeout);
                                  rc = -1;
                                  goto bail;
                          }

  'loop_timeout' is the user-specified time-out. With a value of 2, in
  case of error, this line prints:

  IP-Config: no response after 2 secs - giving up

  So it thinks that it waited 2 seconds - however, in reality it had
  actually waited for 10 seconds.

  The suggested code-change ensures that the timeout that is actually
  used never exceeds the user-specified timeout.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/klibc/+bug/1947099/+subscriptions


-- 
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp

Reply via email to