On Tue, 17 Dec 2024 16:48:04 +0200 Vincas Dargis <[email protected]> wrote:
> But in the end, net.ifnames=0 workaround helps to avoid "unavailable" state
> at least.
> I guess that's kinda proves NetworkManager issue due to device renames? Rare
> race condition?

I am having this issue on Debian Trixie using NetworkManager with iwd
and a TL-WN822N v2 (ath9k_htc) USB wireless adapter. After reading your
report, I'm getting déjà vu of a similar issue from long ago.

A long, long time ago, the free and open firmware for the AR7010 and
AR9271 USB wireless NICs from Qualcomm Atheros was introduced into
Debian. It was an incredible achievement, but reports popped up from
users of all sorts of graphical distros saying the adapter just wouldn't
work right. NetworkManager could list the SSIDs but, just like I'm
seeing today, the signal strength for every access point appeared to be
"null" and it was not possible to join any of them.

This stumped many, many people, until some genius found out that
disabling "MAC address randomization" (a privacy feature that makes up a
MAC address on the fly and uses it) somehow worked around the problem.
This even helped people using a few other wireless USB chipsets
(Realtek?) from about the same time period. NetworkManager is oriented
towards mobile and desktop users, so it would enable MAC randomization
by default even when the kernel and wireless stack wouldn't otherwise.

A couple of distros put together hacks to apply that workaround
automatically. If I recall correctly, Debian used a udev rule (in
firmware-ath9k-htc or wpa_supplicant) to automagically recognize
wireless adapters reported to be problematic and disable this setting
for them. The mystery persisted, but we could be content with it.

Much later, someone ran into this problem with a potentially new
wireless chipset (Realtek?), and it was very odd. This person was
probably a Linux kernel hacker: the vendor (Realtek?) was formally
requested to investigate, presumably because this person was stumped and
the closed-source firmware of this new chipset meant that help from
insiders was now called for.

Lo and behold, geniuses cracked the mystery: it was an off-by-one error
in the code path (wpa_supplicant or the kernel?) that was responsible
for doing MAC address changes. Basically, the function would make a copy
of the string holding the interface name, but if the name used the
absolute maximum number of characters allowed for an interface name
(15?), it would prematurely truncate the string and cause all else to
fail. Apparently very few devices would have such long interface names
but, for whatever reason, these select chipsets were common culprits at
the time.

Thus, a reasonable question is whether the interface having a very long
name is causing iwd some trouble that would be hard to reproduce.
However, I think this log from my machine gives better clues:

Dec 26 17:44:10 penny NetworkManager[1011]: <info> [1766789050.8301] manager: (wlan0): new 802.11 Wi-Fi device (/org/freedesktop/NetworkManager/Devices/9)
Dec 26 17:44:10 penny NetworkManager[1011]: <info> [1766789050.8432] rfkill3: found Wi-Fi radio killswitch (at /sys/devices/pci0000:00/0000:00:14.0/usb3/3-1/3-1:1.0/ieee80211/phy2/rfkill3) (driver ath9k_htc)
Dec 26 17:44:11 penny NetworkManager[1011]: <info> [1766789051.1473] manager: (wlan2): new 802.11 Wi-Fi device (/org/freedesktop/NetworkManager/Devices/10)
Dec 26 17:44:11 penny NetworkManager[1011]: <error> [1766789051.1633] iwd-manager[0x561e4eea8280]: if_nametoindex failed for Name wlan2 for Device at /net/connman/iwd/2/10: 19
Dec 26 17:44:11 penny NetworkManager[1011]: <info> [1766789051.1635] device (wlan2): interface index 10 renamed iface from 'wlan2' to 'wlx90f652092824'

Do you see those last two lines? There is a race, on the order of less
than a thousandth of a second, between the wireless interface being
renamed away from wlan2 and the iwd-manager code complaining that
if_nametoindex() failed for that same name, which is being removed. I'm
not knowledgeable enough to say what is renaming the interface (and
whether it should be doing that), but there is indeed some missing
coordination here.

However, I think I found a workaround!

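A quick aside on that log before the workaround. I believe the trailing
19 on the error line is errno 19, ENODEV ("No such device"), which is
what you'd expect if the name wlan2 had already stopped existing by the
time it was looked up. The following is only a rough Python sketch of
the idea that a name lookup can lose this kind of race while the
interface index keeps working; the helper names (ifindex_or_none,
current_name, resolve) are made up for illustration, and this is not
what NetworkManager or iwd actually do internally.

```python
import socket


def ifindex_or_none(name):
    """Look up an interface index by name; None if that name does not exist right now."""
    try:
        return socket.if_nametoindex(name)
    except OSError:
        # Catch OSError broadly rather than inspecting exc.errno,
        # which may not be populated by this call.
        return None


def current_name(index):
    """Look up the current name for an interface index; None if the index is gone."""
    try:
        return socket.if_indextoname(index)
    except OSError:
        return None


def resolve(name, index):
    """Prefer the name, but fall back to the index if the name vanished (e.g. a rename).

    Returns an (index, name) tuple, or None if neither lookup succeeds.
    """
    found = ifindex_or_none(name)
    if found is not None:
        return found, name
    renamed = current_name(index)
    if renamed is not None:
        return index, renamed
    return None
```

Going by the "interface index 10 renamed" line in my log, I'd expect
something like resolve("wlan2", 10) to come back with the new
wlx90f652092824 name once the rename has landed, on the assumption that
the index stays put across a rename.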
As the README.Debian states, iwd can be automagically started on the fly
using D-Bus activation (as NetworkManager likes to use it), or the
service can just be enabled manually to always start on boot
unconditionally. Running 'systemctl restart NetworkManager' on its own
never seemed to help me, presumably because it would let iwd shut down,
and thus both NetworkManager and iwd would be back at the "starting
line" to get into a race again.

To give iwd a head start for just this boot, I tried this:

  sudo systemctl --runtime enable iwd.service

(If you want this hack to *not* be temporary for this boot only, you may
wish to omit the --runtime parameter and see how your luck fares.
There's probably still a race condition, but hopefully it'll be more
deterministic now.)

After making sure that iwd stays alive in its own right (regardless of
whether it's been solicited by NetworkManager or not), I restart
NetworkManager in the normal way:

  sudo systemctl restart NetworkManager.service

And voilà! Now NetworkManager is smart enough to show meaningful signal
strength, join access points, and just work beautifully.

If this issue is still present upstream, a way to reproduce it can
probably be made using mac80211_hwsim to spoof a wireless NIC.

Thanks for your report. I'll be keeping my eyes peeled for solutions.
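
P.S. For anyone curious what the old off-by-one I described might have
looked like: I don't have the original code in front of me, so the
following is purely a hypothetical illustration of that class of bug,
written in Python for brevity rather than C. The only hard fact in it is
that Linux defines IFNAMSIZ as 16, i.e. 15 visible characters plus the
terminating NUL; the function names and the exact form of the mistake
are my own invention.

```python
IFNAMSIZ = 16  # Linux <net/if.h>: 15 visible characters plus the terminating NUL


def copy_ifname_buggy(name):
    """Hypothetical buggy copy: leaves room for the terminator twice, so only 14 chars fit."""
    raw = name.encode("ascii")
    return raw[: IFNAMSIZ - 2].decode("ascii")


def copy_ifname_fixed(name):
    """Correct copy: keeps up to IFNAMSIZ - 1 characters."""
    raw = name.encode("ascii")
    return raw[: IFNAMSIZ - 1].decode("ascii")


for name in ("wlan0", "wlx90f652092824"):
    survived_buggy = copy_ifname_buggy(name) == name
    survived_fixed = copy_ifname_fixed(name) == name
    print(name, len(name), survived_buggy, survived_fixed)
```

Short names like wlan0 pass through both versions untouched, which is
why such a bug could hide for so long. A MAC-based name like
wlx90f652092824 is exactly 15 characters, so the buggy copy drops its
last character and every later lookup refers to an interface that does
not exist.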

