On Tue, 17 Dec 2024 16:48:04 +0200 Vincas Dargis wrote:
> But in the end, net.ifnames=0 workaround helps to avoid "unavailable" state 
> at least.
> I guess that's kinda proves NetworkManager issue due to device renames? Rare 
> race condition?

I am having this issue on Debian Trixie using Network Manager with iwd and a 
TL-WN822N v2 (ath9k_htc) USB wireless adapter. After reading your report, I'm 
getting déjà vu of a similar issue long ago.

A long, long time ago, the free and open firmware for AR7010 and AR9271 USB 
wireless NICs from Qualcomm Atheros was introduced into Debian. It was an 
incredible achievement, but issues popped up from users of all sorts of 
graphical distros saying the adapter just wouldn't work right. The SSIDs could 
be listed by Network Manager, but just like I'm seeing today, the signal 
strength for all access points appears to be "null" and it's not possible to 
successfully join any of them. This stumped many, many people, until some 
genius found out that disabling "MAC address randomization" (a privacy feature 
that makes up a MAC address on the fly and uses it) somehow worked around the 
problem. This even helped users of a few other wireless USB chipsets 
(Realtek?) from about the same time period. Network Manager is oriented towards 
mobile and desktop users, so it would enable MAC randomization by default even 
when the kernel and wireless stack wouldn't otherwise.

A couple of distros put together hacks to apply this workaround. If I recall 
correctly, I think Debian used a udev rule (in firmware-ath9k-htc or 
wpa_supplicant) to automagically recognize wireless adapters reported to be 
problematic and disable this setting for them. The mystery persisted, but we 
could be content with it.

Much later, someone ran into this problem with a potentially new wireless 
chipset (Realtek?), and it was very odd. This person was probably a Linux 
kernel hacker, and the vendor (Realtek?) was formally asked to investigate, 
presumably because this person was stumped and the closed-source firmware of 
this new chipset meant that help from insiders was now called for. Lo and 
behold, geniuses cracked the mystery: it was an off-by-one error in the code 
path (wpa_supplicant or the kernel?) responsible for changing the MAC 
address. Basically, the function would make a copy of the string holding the 
interface name, but if the name used the absolute maximum number of 
characters allowed (15?), it would truncate the string prematurely and 
everything after that would fail. Apparently very few devices would have such 
long interface names but, for whatever reason, these select chipsets were 
common culprits at the time.
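
To make the off-by-one concrete, here is a minimal Python sketch of the kind of 
bug I am describing. It is not the actual wpa_supplicant or kernel code (I 
still don't know where the bug lived); copy_ifname() and the "buggy" buffer 
size are made up purely for illustration. The one thing taken from Linux itself 
is IFNAMSIZ, which is 16 including the trailing NUL, so a name can be at most 
15 characters.

```python
IFNAMSIZ = 16  # Linux: size of an interface-name buffer, including the NUL byte


def copy_ifname(name: str, bufsize: int) -> str:
    """Mimic a C-style bounded copy into a buffer of `bufsize` bytes.

    Hypothetical helper: at most bufsize - 1 bytes of the name fit,
    because one byte is reserved for the terminating NUL.
    """
    raw = name.encode("ascii")
    return raw[: bufsize - 1].decode("ascii")


long_name = "wlx90f652092824"  # "wlx" + 12 hex digits = 15 characters
short_name = "wlan2"

# Correct bound: the full 15-character name survives.
correct = copy_ifname(long_name, IFNAMSIZ)

# Assumed off-by-one (buffer treated as one byte too small): last character lost.
buggy = copy_ifname(long_name, IFNAMSIZ - 1)

print(correct)  # full name
print(buggy)    # name with its final character cut off

# Short names are unaffected, which is why such a bug can hide for years.
print(copy_ifname(short_name, IFNAMSIZ - 1))
```

A truncated name refers to an interface that does not exist, so every later 
lookup by name would fail, while anyone using short wlanN names would never 
notice.
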

So a reasonable question is whether the interface's very long name is causing 
iwd some trouble that would be hard to reproduce. However, I think this log on 
my machine gives better clues:

Dec 26 17:44:10 penny NetworkManager[1011]: <info>  [1766789050.8301] manager: (wlan0): new 802.11 Wi-Fi device (/org/freedesktop/NetworkManager/Devices/9)
Dec 26 17:44:10 penny NetworkManager[1011]: <info>  [1766789050.8432] rfkill3: found Wi-Fi radio killswitch (at /sys/devices/pci0000:00/0000:00:14.0/usb3/3-1/3-1:1.0/ieee80211/phy2/rfkill3) (driver ath9k_htc)
Dec 26 17:44:11 penny NetworkManager[1011]: <info>  [1766789051.1473] manager: (wlan2): new 802.11 Wi-Fi device (/org/freedesktop/NetworkManager/Devices/10)
Dec 26 17:44:11 penny NetworkManager[1011]: <error> [1766789051.1633] iwd-manager[0x561e4eea8280]: if_nametoindex failed for Name wlan2 for Device at /net/connman/iwd/2/10: 19
Dec 26 17:44:11 penny NetworkManager[1011]: <info>  [1766789051.1635] device (wlan2): interface index 10 renamed iface from 'wlan2' to 'wlx90f652092824'

Do you see those last two lines? There is a race, on the order of less than a 
thousandth of a second, between the wireless interface being renamed away from 
wlan2 and Network Manager's iwd backend complaining that if_nametoindex() does 
not work for that same name, which iwd still reports for the device.
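
For anyone who wants to poke at this, here is a small Python sketch of why a 
lookup by name is fragile across a rename while the interface index is not. It 
only uses the standard socket module; index_for() and name_for() are my own 
hypothetical helpers, and "nosuchif0" is just a name I assume does not exist on 
your machine. My guess is that the trailing 19 in the error line is errno 
ENODEV ("No such device"), which is what you would expect if wlan2 had already 
been renamed by the time it was looked up.

```python
import errno
import os
import socket


def index_for(ifname):
    """Return the kernel index for ifname, or None if no such name exists."""
    try:
        return socket.if_nametoindex(ifname)
    except OSError:
        return None


def name_for(index):
    """Return the current name of interface `index`, or None if it is gone."""
    try:
        return socket.if_indextoname(index)
    except OSError:
        return None


# errno value that I assume the log line is reporting.
print(errno.ENODEV, os.strerror(errno.ENODEV))

# A stale or never-existing name cannot be resolved.
print(index_for("nosuchif0"))

# The index stays the same across a rename, so resolving index -> name
# always yields whatever the interface is called right now.
for index, name in socket.if_nameindex():
    print(index, name, name_for(index))
```

In the log above the index (10) is the same before and after the rename, so a 
consumer that tracks the index instead of the name handed over D-Bus would not 
care which name it saw first.
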

I'm not knowledgeable enough to say what is renaming the interface (and whether 
it should be doing that), but there is clearly some missing coordination here. 
However, I think I found a workaround!

As the README.Debian states, iwd can be automagically started on-the-fly using 
D-Bus activation (as Network Manager likes to use it), or the service can just 
be enabled manually to always start on boot unconditionally. Running 'systemctl 
restart NetworkManager' on its own seemed to never help me, presumably because 
it would let iwd shut down, and thus both Network Manager and iwd would be back 
at the "starting line" to get into a race again. To give iwd a head start for 
just this boot, I tried this:
sudo systemctl --runtime enable iwd.service

(If you want this hack to *not* be temporary for this boot only, you may wish 
to omit the --runtime parameter and see how your luck fares. There's probably 
still a race condition but hopefully it'll be more deterministic now.)

After making sure that iwd stays alive in its own right (regardless of whether 
it's been solicited by Network Manager or not), now I restart Network Manager 
in the normal way:
sudo systemctl restart NetworkManager.service

And voilà! Now Network Manager is smart enough to show meaningful signal 
strength, join access points, and just work beautifully. If this issue is still 
present upstream, a way to reproduce can probably be made using mac80211_hwsim 
to spoof a wireless NIC.

Thanks for your report. I'll be keeping my eyes peeled for solutions.
