-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On Wed, 6 Mar 2019 18:15:41 +0100
Helmut Grohne <hel...@subdivi.de> wrote:

> This suggests that iptables' ECN mask is wrong. It should be using
> 0xfc rather than 0x3f.
Yes, I'm convinced the mask is wrong. However, fixing it would change the behaviour of already deployed firewalls. It's a thorny situation.

Here's why I conclude the mask is wrong. Let's take RFC 1349[1], which added the Minimize-Cost bit to ToS. First off, realise that this bit conflicts with ECN; that's just part of the can of worms that is ToS, and it complicates a solution to this issue even more.

RFC 1349 chapter 3 defines the Type of Service octet as:

   The Type of Service octet consists of three fields:

         0     1     2     3     4     5     6     7
      +-----+-----+-----+-----+-----+-----+-----+-----+
      |                 |                       |     |
      |   PRECEDENCE    |          TOS          | MBZ |
      |                 |                       |     |
      +-----+-----+-----+-----+-----+-----+-----+-----+

Note how the bit numbering is the reverse of, for example, the usual x86 diagrams: here bit 0 is the most significant bit, on the left. I think this is where the wrong mask stems from: its author was accustomed to diagrams with the opposite bit numbering and ended up confused.

Now let's look at chapter 4, which defines this TOS field in bits 3 through 6:

   1000   --   minimize delay
   0100   --   maximize throughput
   0010   --   maximize reliability
   0001   --   minimize monetary cost
   0000   --   normal service

And let's compare that to:

   # iptables -m tos --help
   iptables v1.6.0

   [...]

   tos match options:
   [!] --tos value[/mask]    Match Type of Service/Priority field value
   [!] --tos symbol          Match TOS field (IPv4 only) by symbol
                             Accepted symbolic names for value are:
                             (0x10) 16 Minimize-Delay
                             (0x08)  8 Maximize-Throughput
                             (0x04)  4 Maximize-Reliability
                             (0x02)  2 Minimize-Cost
                             (0x00)  0 Normal-Service

Take a good look at the hexadecimals corresponding to the symbolic names. They match the octet from RFC 1349 only if you read its bit numbering with bit 0 as the most significant bit, so that the least significant bit is on the right: Minimize-Cost, in bit 6, gets the lowest non-zero value, 0x02. Note that these hexadecimals are correct; it is only the mask that is wrong.

This is on stretch/oldstable, but the help is no different on buster/stable. I'll continue with a stretch system, though. We can make it more concrete.
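A small Python sketch makes the bit-numbering point concrete. It is an illustration only, not code from iptables or the RFC; the names rfc_bit_value, rfc_bits_mask and TOS_FIELD_BITS are made up for this example. It assumes the RFC 1349 convention shown in the diagram, where bit 0 is the most significant bit, and converts bit positions into numerical values.

```python
# Illustration only: converts RFC 1349 bit positions to numerical values.
# RFC 1349 numbers the ToS octet's bits 0..7 from the most significant
# bit (left) to the least significant bit (right).


def rfc_bit_value(bit: int) -> int:
    """Value of RFC 1349 bit `bit` (0 = most significant, 7 = least)."""
    if not 0 <= bit <= 7:
        raise ValueError("the ToS octet has bits 0 through 7")
    return 1 << (7 - bit)


def rfc_bits_mask(first: int, last: int) -> int:
    """Mask covering RFC 1349 bits `first` through `last`, inclusive."""
    return sum(rfc_bit_value(b) for b in range(first, last + 1))


# RFC 1349 chapter 4: each TOS value sets one of bits 3..6
# (Normal-Service sets none, so it is 0x00 and is left out here).
TOS_FIELD_BITS = {
    "Minimize-Delay": 3,        # 1000
    "Maximize-Throughput": 4,   # 0100
    "Maximize-Reliability": 5,  # 0010
    "Minimize-Cost": 6,         # 0001
}

if __name__ == "__main__":
    for name, bit in TOS_FIELD_BITS.items():
        print(f"{name:<22} {rfc_bit_value(bit):#04x}")
    print("PRECEDENCE, bits 0-2:", hex(rfc_bits_mask(0, 2)))  # 0xe0
    print("TOS, bits 3-6:       ", hex(rfc_bits_mask(3, 6)))  # 0x1e
    print("bits 0-5:            ", hex(rfc_bits_mask(0, 5)))  # 0xfc
    print("bits 2-7:            ", hex(rfc_bits_mask(2, 7)))  # 0x3f
```

The four TOS values come out as 0x10, 0x08, 0x04 and 0x02, exactly the hexadecimals in the iptables help. Note also that 0x3f covers bits 2 through 7 while 0xfc covers bits 0 through 5: the same six-bit mask with the bit order reversed, which fits the suspected mix-up.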
Let's create an iptables rule with numerical values that matches DSCP CS6. That corresponds to IP Precedence 6, numerical value 0xc0: in the terms of RFC 1349, bits 0 and 1 are set in the PRECEDENCE portion of the ToS octet.

   # iptables -I INPUT -m tos --tos 0xc0 -j NFLOG --nflog-group 2

Ping it:

   $ ping -Q 0xc0 -c 1 10.0.1.1

And take a look at the packet in the PCAP log file of that nflog, with Wireshark:

   Internet Protocol Version 4, Src: 10.0.1.133, Dst: 10.0.1.1
       0100 .... = Version: 4
       .... 0101 = Header Length: 20 bytes (5)
       Differentiated Services Field: 0xc0 (DSCP: CS6, ECN: Not-ECT)
           1100 00.. = Differentiated Services Codepoint: Class Selector 6 (48)
           .... ..00 = Explicit Congestion Notification: Not ECN-Capable Transport (0)
       Total Length: 84
       Identification: 0x5029 (20521)
       Flags: 0x4000, Don't fragment
       Time to live: 64
       Protocol: ICMP (1)
       Header checksum: 0xd33a [validation disabled]
       [Header checksum status: Unverified]
       Source: 10.0.1.133
       Destination: 10.0.1.1

This proves:

- That -m tos --tos 0xc0 matches a packet that has 0xc0 in the DS field (because this is the only rule in the firewall logging to that nflog group).
- That 0xc0 means DSCP CS6. I trust Wireshark's analysis here; it has been correct in other instances where I looked at the DSCP field of packets generated by several systems.

So the mask 0x3f is a mistake: it covers the low six bits, RFC 1349 bits 2 through 7, so it ignores the two most significant precedence bits while including the two ECN bits.

But changing the mask to 0xfc will break -m tos --tos Minimize-Cost, because that bit is actually one of the ECN bits. I tested it, and --tos 0x02/0xfc predictably did not match ping -Q 0x02: 0x02 & 0xfc is 0, so it can never equal 0x02.

Changing the mask to 0xff causes less breakage, but still changes the behaviour of existing deployments... :-(

Perhaps the best solution is to deprecate the symbolic --tos arguments, urging everyone to use the numerical format exclusively. Put this in NEWS, and perhaps in the Release Notes, so people hopefully notice. And then maybe someday DSCP and ECN will be less broken.
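The masked comparison behind all of this can be modelled in a few lines of Python. This is a sketch, not iptables or kernel source: it assumes the tos match tests (tos & mask) == value, with mask 0x3f for the symbolic names, and the helper names (tos_matches, decode_ds_field, explicit_form) are hypothetical. It decodes the DS field of the ping above the way Wireshark does, and shows how the 0x3f, 0xfc and 0xff masks behave.

```python
# Sketch only, not iptables or kernel source. Assumes the tos match tests
# (packet_tos & mask) == value, and that symbolic names use mask 0x3f.
# All names below are hypothetical, chosen for illustration.
from typing import Tuple

SYMBOLIC_MASK = 0x3F
SYMBOLIC = {
    "Minimize-Delay": 0x10,
    "Maximize-Throughput": 0x08,
    "Maximize-Reliability": 0x04,
    "Minimize-Cost": 0x02,
    "Normal-Service": 0x00,
}


def tos_matches(packet_tos: int, value: int, mask: int) -> bool:
    """Masked comparison of the packet's ToS/DS octet against a rule."""
    return (packet_tos & mask) == value


def decode_ds_field(octet: int) -> Tuple[int, int, int]:
    """Split the octet into (DSCP, ECN, IP Precedence)."""
    return octet >> 2, octet & 0x03, octet >> 5


def explicit_form(name: str) -> str:
    """Numerical --tos argument equivalent to a symbolic name."""
    return f"{SYMBOLIC[name]:#04x}/{SYMBOLIC_MASK:#04x}"


if __name__ == "__main__":
    # The ping above: DS field 0xc0 is DSCP 48 (CS6), Not-ECT, Precedence 6.
    print(decode_ds_field(0xC0))  # (48, 0, 6)

    delay = SYMBOLIC["Minimize-Delay"]
    # Mask 0x3f: an ECN bit spoils the match (0x11 = Minimize-Delay + ECT(1)) ...
    print(tos_matches(0x11, delay, SYMBOLIC_MASK))  # False
    # ... while the two top precedence bits are ignored (0xd0 = 0xc0 | 0x10).
    print(tos_matches(0xD0, delay, SYMBOLIC_MASK))  # True

    # Mask 0xfc: Minimize-Cost (0x02) lies in the ECN bits and never matches.
    cost = SYMBOLIC["Minimize-Cost"]
    print(any(tos_matches(t, cost, 0xFC) for t in range(256)))  # False

    # Mask 0xff: exact match, so the 0xd0 packet no longer matches.
    print(tos_matches(0xD0, delay, 0xFF))  # False

    print(explicit_form("Minimize-Delay"))  # 0x10/0x3f
```

The explicit_form helper illustrates the deprecation idea: under the stated assumption, --tos Minimize-Delay behaves like --tos 0x10/0x3f, so moving to the numerical format need not change behaviour by itself.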
Firewalls currently using symbolic --tos arguments already mishandle ECN and IP Precedence as well, not just DSCP: the ECN bits take part in the match, while the two most significant precedence bits are ignored.

HTH,

Peter.

[1] <https://tools.ietf.org/html/rfc1349>

-----BEGIN PGP SIGNATURE-----

iQEzBAEBCAAdFiEEZQCNwiCq4qJXTWzVlp4Bj95s3KEFAl1L3sYACgkQlp4Bj95s
3KFaAggAiffHqD+4Y7xTxyfXJbJSn0TN5cxd+42ffnbgkqlNKonzkxWEr0Hd9EOA
ab9aDzSPGpjeFy1Hzuj9z0SrBUp30zWL3WqRxJTifxkIg9AXrWn9xuG1VgH4t1T+
HTpSrjn/Y5NsaUOBdzkWvumoIOY7NQqHxh2mLOPIg9AGJ4XVGbn5PEi2YHM2OVok
9QaB5wPMeNcVeUv4719vMFkZ+VoycremY7F23OMSBiY+vbQI6AntBOf3sJLDb2qQ
Vjoh7EvR2F28+w+DyK77vqAwbMa/+VGgF+ld5BDCxv0w8h/Q434MdkcTJ8CefSNA
4cx1+SP6nSbOuPLirzhez1VANayTsA==
=eTRc
-----END PGP SIGNATURE-----