No problem Massimiliano, many thanks for your assistance.

I had a lab of 25 identical machines running 22.04. These machines
seemed to work with kernel 6.8.0-52 or lower, and it seemed the
following kernel update contained the e1000e driver update in the link
above.
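
To sort the lab machines into "before" and "after" the suspected kernel change, something like the following Python sketch could help. It assumes Ubuntu-style release strings (for example 6.8.0-60-generic, as printed by uname -r). It treats 6.8.0-52 as the last known-good build only because that is what we observed here; the boundary is not confirmed. parse_ubuntu_kernel and newer_than_last_known_good are hypothetical helper names.

```python
from __future__ import annotations

import platform
import re

# Assumption: 6.8.0-52 is only the last kernel we *observed* working in one
# lab; it is not a confirmed regression boundary.
LAST_KNOWN_GOOD = (6, 8, 0, 52)


def parse_ubuntu_kernel(release: str) -> tuple[int, ...]:
    """Turn an Ubuntu release string such as '6.8.0-60-generic' into (6, 8, 0, 60)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def newer_than_last_known_good(release: str) -> bool:
    """True if the kernel sorts after the last build seen working in the lab."""
    return parse_ubuntu_kernel(release) > LAST_KNOWN_GOOD


if __name__ == "__main__":
    release = platform.release()  # same string as `uname -r` on Linux
    try:
        verdict = newer_than_last_known_good(release)
    except ValueError:
        print(f"{release}: not an Ubuntu-style kernel release string")
    else:
        print(f"{release}: newer than 6.8.0-52 -> {verdict}")
```

Tuple comparison keeps the ABI number (the -52 / -60 part) significant, which a plain string comparison would get wrong for values such as -100.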

Across these 25 machines, I tried advice from various forums:

* Disabling TSO via ethtool hasn't helped
* Disabling WOL (Wake-on-LAN) in the BIOS and via ethtool hasn't helped
* Disabling Active State Power Management (pcie_aspm=off on the GRUB kernel command line) hasn't helped
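
For reference, the first two workarounds amount to ethtool calls along the lines below, and the third can be confirmed by checking /proc/cmdline after a reboot. This is a rough Python sketch, not a record of the exact commands we ran. The interface name enp0s31f6 comes from the kernel log in the bug description, and workaround_commands, aspm_disabled and apply are hypothetical helpers. By default it only prints the commands (dry_run=True); running them for real needs root.

```python
from __future__ import annotations

import shlex
import subprocess

# Interface name taken from the kernel log in this report; adjust per machine.
IFACE = "enp0s31f6"


def workaround_commands(iface: str) -> list[list[str]]:
    """ethtool invocations matching the TSO and WOL workarounds listed above."""
    return [
        ["ethtool", "-K", iface, "tso", "off"],  # disable TCP segmentation offload
        ["ethtool", "-s", iface, "wol", "d"],    # disable Wake-on-LAN
    ]


def aspm_disabled(cmdline: str) -> bool:
    """True if the kernel was booted with pcie_aspm=off (set via GRUB)."""
    return "pcie_aspm=off" in cmdline.split()


def apply(iface: str, dry_run: bool = True) -> None:
    """Print the commands, or run them when dry_run is False (requires root)."""
    for cmd in workaround_commands(iface):
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    apply(IFACE, dry_run=True)
    try:
        with open("/proc/cmdline") as f:
            print("pcie_aspm=off active:", aspm_disabled(f.read()))
    except OSError:
        print("/proc/cmdline not readable (not a Linux host?)")
```

Settings made with ethtool -K and ethtool -s do not persist across a reboot, so for a multi-hour test they would need to be reapplied at boot.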

Unfortunately, it only seemed to be resolved by moving to Network
Manager.

The network side is complaining about the DHCP negotiation; there
seems to be something Network Manager is able to do that Netplan is
struggling with.
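
The switch to Network Manager can be expressed as a netplan renderer change. The Python sketch below generates a minimal DHCP-only netplan YAML under that assumption. render_netplan is a hypothetical helper, the interface name is taken from the kernel log in the bug description, and the 802.1x settings our network needs are deliberately left out.

```python
from __future__ import annotations

import textwrap

VALID_RENDERERS = ("networkd", "NetworkManager")


def render_netplan(iface: str, renderer: str = "NetworkManager") -> str:
    """Minimal DHCP-only netplan config handing `iface` to the chosen renderer.

    Hypothetical helper: it omits the 802.1x settings this site needs,
    which would have to be added for the real network.
    """
    if renderer not in VALID_RENDERERS:
        raise ValueError(f"renderer must be one of {VALID_RENDERERS}")
    return textwrap.dedent(f"""\
        network:
          version: 2
          renderer: {renderer}
          ethernets:
            {iface}:
              dhcp4: true
        """)


if __name__ == "__main__":
    print(render_netplan("enp0s31f6"), end="")
```

The output would go in a file under /etc/netplan/, followed by sudo netplan apply. Passing renderer="networkd" produces the equivalent config for the setup that shows the problem, which makes the two easy to compare side by side.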

We might be a bit of a corner case here, given we are using a
software-defined network and 802.1x. We have seen weirdness in other
areas too: VMs not able to bridge virtual adapters, and machines not
being accessible until they call out to the network.

What adds further confusion is that a few machines appear to have the
same OS/kernel/NIC firmware and yet appear to be OK. Another system
that we upgraded to 24.04 with the latest kernel still seemed to have
the issue.

I'll look at running apport-collect. Is it best to run this after the
error has happened?

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2115044

Title:
  Netplan and Intel e1000 Driver / I219-V Adapter

Status in linux package in Ubuntu:
  Incomplete
Status in netplan.io package in Ubuntu:
  Triaged

Bug description:
  Since the following change earlier in the year, I have been seeing
  issues with Intel's I219-V adapter. It's hard to say whether the
  problem is specifically Netplan or down to another change in the
  e1000e driver a month or so before. The main reason I am posting this
  here is that when machines were switched to Network Manager, the
  problem seemed to go away. (I replicated this work-around on ~10
  machines.)

  
https://changelogs.ubuntu.com/changelogs/pool/main/n/netplan.io/netplan.io_0.106.1-7ubuntu0.22.04.4/changelog

    * SECURITY REGRESSION: failure on systems without dbus
      - debian/netplan.io.postinst: Don't call the generator if no networkd
        configuration file exists. (LP: #2071333)

  I have had quite a few machines on our network losing networking
  after a period of time. The organisation's network is software-defined
  (Cisco) and uses 802.1x to authenticate machines to various subnets.

  Machines get an IP address at boot but lose connectivity after 3-6
  hours. The syslog reports a constant stream of the following message.
  The Cisco logs seem to report "unable to obtain an IP address from
  DHCP". The machines seem to believe they still have the same IP
  address but are unable to communicate.

  [60689.477031] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
                   TDH                  <13>
                   TDT                  <15>
                   next_to_use          <15>
                   next_to_clean        <11>
                 buffer_info[next_to_clean]:
                   time_stamp           <100e65bc0>
                   next_to_watch        <14>
                   jiffies              <100e65d58>
                   next_to_watch.status <0>
                 MAC Status             <40080083>
                 PHY Status             <796d>
                 PHY 1000BASE-T Status  <3800>
                 PHY Extended Status    <3000>
                 PCI Status             <10>
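
To count how often this happens per machine, a small parser for the hang report can help. This Python sketch assumes the layout shown above (a header line followed by indented "name <hex>" lines, as dmesg prints it) and that the values are hexadecimal. parse_hangs is a hypothetical helper; other log formats, such as journalctl output, may need the regular expressions adjusted.

```python
from __future__ import annotations

import re
import sys
from collections import Counter

# Header line printed by e1000e, e.g.
# "[60689.477031] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:"
HANG_RE = re.compile(
    r"e1000e (?P<pci>[0-9a-f:.]+) (?P<iface>\S+): Detected Hardware Unit Hang"
)
# Indented "name   <hexvalue>" lines that follow the header.
FIELD_RE = re.compile(
    r"^\s+(?P<name>[A-Za-z0-9_. -]+?)\s+<(?P<value>[0-9a-fA-F]+)>\s*$"
)


def parse_hangs(log_text: str) -> list[dict]:
    """Return one dict per hang report, with the register dump as integers."""
    hangs: list[dict] = []
    current = None
    for line in log_text.splitlines():
        header = HANG_RE.search(line)
        if header:
            current = {"pci": header["pci"], "iface": header["iface"], "fields": {}}
            hangs.append(current)
            continue
        if current is None:
            continue
        field = FIELD_RE.match(line)
        if field:
            # Assumption: the driver prints these values in hex.
            current["fields"][field["name"]] = int(field["value"], 16)
        elif not line[:1].isspace():
            current = None  # an unindented (or empty) line ends the dump
    return hangs


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print(f"usage: {sys.argv[0]} SAVED_KERNEL_LOG")
    else:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            hangs = parse_hangs(f.read())
        print(Counter(h["iface"] for h in hangs))
```

Indented lines without a value, such as buffer_info[next_to_clean]:, are skipped without ending the current report.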

  There is a similar bug reported in the following post, although the
  e1000e driver seems to have had quite a few phases where it fails for
  people. It seems to be a long-lived NIC that has been installed over
  many years, so there are quite a few firmware versions to support over
  its lifetime.

  https://bugzilla.kernel.org/show_bug.cgi?id=118721

  There was also an e1000e driver update in the kernel package, around
  the same time as the Netplan change.

  https://changelogs.ubuntu.com/changelogs/pool/main/l/linux-hwe-6.8/linux-hwe-6.8_6.8.0-60.63~22.04.1/changelog

  * Noble update: upstream stable patchset 2025-02-03 (LP: #2097301)
   - e1000e: change I219 (19) devices to ADP

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2115044/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp