On Fri, 09 Aug 2019 at 02:04:25 +0200, Adam Borowski wrote:
> But... if this ID must not be exposed on the network, why does it need to be
> unique?

At the risk of stating the obvious, that defeats the object of having a
unique ID: anything that stores per-machine state/configuration keyed by
the machine ID will think all your machines and OS installations are the
same.

To use an analogy with other strings that might get used as a unique
identifier, it's the same as if you set the hostname of all your machines
to spartacus, or changed all your MAC addresses to d4:1d:8c:98:f0:0b, or
somehow changed the hardware serial numbers and UUIDs in
/sys/class/dmi/id to a constant value.

The recommendation in machine-id(5) to use
sd_id128_get_machine_app_specific() (which is implemented as HMAC with
the app ID as the key), combined with the machine ID being unique,
results in a family of stable per-machine identifiers that are unique to
a machine and constant, but are not obviously related to each other. So
for example, if Chromium and PulseAudio both used
sd_id128_get_machine_app_specific(), an attacker who knows the Chromium
app-specific machine ID would not be able to tell whether the PulseAudio
app-specific machine ID belongs to the same machine or not.

In particular, this is what systemd-networkd does for DHCP: it uses a
keyed hash (HMAC) of the machine-id(5), so the same machine-id(5) gives
you the same DHCP ID (and hence probably the same IP address), and
different machine IDs give you different DHCP IDs, but the actual value
of the machine ID is not sent in the DHCP transaction.

For at least some code that does not follow the recommendation to use a
HMAC, the reason is likely to be compatibility with data stored by older
versions of itself: the code is older than the recommendation, and if it
switched to using sd_id128_get_machine_app_specific() or equivalent now,
it would lose its ability to associate stored state/configuration with
the same machine for which it was stored, causing apparent data loss.

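To make the keyed-hash idea concrete, here is a minimal Python sketch. It
is not the libsystemd implementation: the function name app_specific_id,
the example machine IDs and the two application IDs are all made up for
illustration, and it simply follows the description above in using the
app ID as the HMAC key. The real sd_id128_get_machine_app_specific() may
differ in details such as which input is the key and how the result is
formatted.

```python
import hashlib
import hmac
import uuid


def app_specific_id(machine_id: str, app_id: str) -> str:
    """Derive a stable per-application ID from a machine ID (illustrative).

    machine_id: 32 hex characters, in the format of /etc/machine-id.
    app_id: a fixed UUID chosen once by the application (hypothetical here).
    """
    machine = bytes.fromhex(machine_id)
    key = uuid.UUID(app_id).bytes
    # Assumption: app ID as key, machine ID as message, SHA-256 as the hash.
    digest = hmac.new(key, machine, hashlib.sha256).digest()
    # Truncate to 128 bits so the result has the same size as a machine ID.
    return digest[:16].hex()


# Hypothetical values, not real machine or application IDs.
MACHINE_A = "0123456789abcdef0123456789abcdef"
MACHINE_B = "fedcba9876543210fedcba9876543210"
APP_BROWSER = "c4a7d3f0-1b2e-4c5d-8e9f-0a1b2c3d4e5f"
APP_AUDIO = "7e1f9a64-52c3-4b8d-9a0e-6f5d4c3b2a19"

if __name__ == "__main__":
    # Same machine, two applications: two IDs with no visible relationship.
    print(app_specific_id(MACHINE_A, APP_BROWSER))
    print(app_specific_id(MACHINE_A, APP_AUDIO))
    # Different machine, same application: a different ID again.
    print(app_specific_id(MACHINE_B, APP_BROWSER))
```

Each derived ID is stable for a given (machine, application) pair, but
this only helps if the machine IDs themselves differ: with a constant
machine ID, every machine would derive the same ID for each application.
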
I think it's also important to distinguish between the machine ID being
exposed on the network in a way that can be seen by untrusted
eavesdroppers (like if it was used for DHCP without using a HMAC), and
being exposed to other parties in a way that already involves trusting
them (like sharing it with the other machines sharing an NFS-mounted
home directory alongside confidential personal files, or including
/etc/machine-id in a system backup alongside stored secrets elsewhere in
/etc). If an attacker can read (or write!) your home directory or your
backups, that attacker being able to "fingerprint" your machine ID is
the least of your concerns.

Some mental models for the machine ID that are reasonably close:

- it's like the hostname (except opaque, so users don't want to change
  it to a more aesthetically appealing value and then expect things to
  still work the same)
- it's the same as the MAC address, if all machines had exactly one
  network interface (which they don't, so the MAC address is unsuitable)
- it's the same as the motherboard serial number, if all machines had
  one (which they don't, so this is unsuitable)
- it's the same as the disk serial number, if all machines had exactly
  one disk (which they don't, so this is unsuitable)

Regards,
    smcv