Thank you for the detailed additional information. I think I understand the situation properly now, together with the need for the fix.
> That means that if something is currently accepting DHCP offers after the specified timeout has expired, then that is buggy behaviour. We sometimes run into the case where we're concerned about users being regressed by the very fix for the buggy behaviour itself. I think we need to take a pragmatic view in these cases. But unless there's some specific scenario we're concerned about, I think it's OK to not worry about the general case. As you say, if there's a user who is specifying a 2 second timeout but then relying on a DHCP reply coming after 3 seconds, then we will regress them but I think this is reasonable under the circumstances. I think it's sufficient to have considered and documented this case carefully, as you now have done (thanks!) We also need to be careful to ensure that the code that we ship in the archive isn't itself already making such an assumption. But I think you've sufficiently covered that already. It might be worth verifying that this is exactly the result of the change being landed here though. During SRU verification, in addition to the test plan you've already specified, would it be practical and/or useful to verify that, with a timeout of 2s, a DHCP reply sent after 1.5s works, but a DHCP reply sent after 2.5s does not? This would help us ensure that we are introducing the exact change we intend to introduce. I would also like you to verify your actual failing user story please - that your boot time does actually speed up with that hardware. With those two requests added to your Test Plan I am in principle fine to proceed. I still need to review your patch in detail but I've unfortunately run out of time for that today. I'd also like Łukasz's ack please. If another SRU team member looks again before I do, feel free to take over and accept without my further input if you feel that's appropriate. -- You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to klibc in Ubuntu. https://bugs.launchpad.net/bugs/1947099 Title: ipconfig does not honour user-requested timeouts in some cases Status in klibc package in Ubuntu: New Status in klibc source package in Bionic: Incomplete Bug description: [Impact] In some cases, ipconfig can take a longer time than the user-specified timeouts, causing unexpected delays. [Test Plan] Any situation where ipconfig encounters an error sending the DHCP packet, it will automatically set a delay of 10 seconds, which could be longer than the user-specified timeout. It can be reproduced by creating a dummy interface and attempting to run ipconfig on it with a timeout value of less than 10: # ip link add eth1 type dummy # date; /usr/lib/klibc/bin/ipconfig -t 2 eth1; date Thu Nov 18 04:46:13 EST 2021 IP-Config: eth1 hardware address ae:e0:f5:9d:7e:00 mtu 1500 DHCP RARP IP-Config: no response after 2 secs - giving up Thu Nov 18 04:46:23 EST 2021 ^ Notice above, ipconfig thinks that it waited 2 seconds, but the timestamps show an actual delay of 10 seconds. [Where problems could occur] Please see reproduction steps above. We are seeing this in production too (see comment #2). [Other Info] A patch to fix the issue is being proposed here. It is a safe fix - it only checks before going into sleep that the timeout never exceeds the user-requested value. [Original Description] In some cases, ipconfig can take longer than the user-specified timeouts, causing unexpected delays. in main.c, in function loop(), the process can go into process_timeout_event() (or process_receive_event() ) and if it encounters an error situation, will set an attempt to "try again later" at time equal now + 10 seconds by setting s->expire = now + 10; This can happen at any time during the main event loop, which can end up extending the user-specified timeout if "now + 10" is greater than "start_time + user-specified-timeout". I believe a patch like the following is needed to avoid this problem: --- a/usr/kinit/ipconfig/main.c +++ b/usr/kinit/ipconfig/main.c @@ -437,6 +437,13 @@ static int loop(void) if (timeout > s->expire - now.tv_sec) timeout = s->expire - now.tv_sec; + + /* Compensate for already-lost time */ + gettimeofday(&now, NULL); + if (now.tv_sec + timeout > start + loop_timeout) { + timeout = loop_timeout - (now.tv_sec - start); + printf("Lowered timeout to match user request = (%d s) \n", timeout); + } } I believe the current behaviour is buggy. This is confirmed when the following line is executed: if (loop_timeout >= 0 && now.tv_sec - start >= loop_timeout) { printf("IP-Config: no response after %d " "secs - giving up\n", loop_timeout); rc = -1; goto bail; } 'loop_timeout' is the user-specified time-out. With a value of 2, in case of error, this line prints: IP-Config: no response after 2 secs - giving up So it thinks that it waited 2 seconds - however, in reality it had actually waited for 10 seconds. The suggested code-change ensures that the timeout that is actually used never exceeds the user-specified timeout. [ Regression potential ] This change ensures that user-specified timeouts are never exceeded, which is a problem that appears to happen only in case of interface errors. It may be that someone is relying on current behaviour where they receive DHCP offers after their specified timeout (but within the 10-second error timeout). However, 1) that is buggy behaviour and should be exposed. Such a user would need to update their specified timeout to make it long enough to receive the DHCP offer (setting the timeout to 10 would keep the existing behaviour). 2) I think it is unlikely that such a scenario exists at all. The 10-second timeout problem happens when there are problems with the interface that prevent it from even sending out the DHCP request. I think it is very unlikely (or even, impossible) that DHCP offers would be received on a dead interface. Based on the above points, I consider the regression potential to be very low for this change. I do not expect anyone who is currently using ipconfig successfully to notice this change. I believe the only difference introduced by this is the reduction of delays caused by dead or problematic network interfaces. Those error delays are shortened such that they never exceeed user-specified timeouts. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/klibc/+bug/1947099/+subscriptions -- Mailing list: https://launchpad.net/~touch-packages Post to : touch-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~touch-packages More help : https://help.launchpad.net/ListHelp