Hi David,
this patch adds a dormant flag to network devices, RFC2863 operstate derived
from these flags and possibility for userspace interaction. It allows drivers
to signal that a device is unusable for user traffic without disabling
queueing (and therefore the possibility for protocol establishment traffic to
flow) and a userspace supplicant (WPA, 802.1X) to mark a device unusable
without changes to the driver.
It is the result of our long discussion. However I must admit that it
represents what Jamal and I agreed on with compromises towards Krzysztof, but
Thomas and Krzysztof still disagree with some parts. Anyway I think it should
be applied.
Patch is against 2.6.14, applies also against 2.6.15-rc5 with some lines
offset.
Jamal:
> My only other comment was on array operstates; is there any harm
> in really adding the strings to name those states nulled right now?
Done. Also removed one ASSERT_RTNL() and report IF_OPER_DOWN via netlink when
device is admin down.
Signed-Off-By: Stefan Rompf <[EMAIL PROTECTED]>
diff -X dontdiff -uNrp linux-2.6.14/Documentation/networking/operstates.txt linux-2.6.14-rfc2863/Documentation/networking/operstates.txt
--- linux-2.6.14/Documentation/networking/operstates.txt 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.14-rfc2863/Documentation/networking/operstates.txt 2005-12-05 18:14:49.000000000 +0100
@@ -0,0 +1,161 @@
+
+1. Introduction
+
+Linux distinguishes between administrative and operational state of an
+interface. Admininstrative state is the result of "ip link set dev
+<dev> up or down" and reflects whether the administrator wants to use
+the device for traffic.
+
+However, an interface is not usable just because the admin enabled it
+- ethernet requires to be plugged into the switch and, depending on
+a site's networking policy and configuration, an 802.1X authentication
+to be performed before user data can be transferred. Operational state
+shows the ability of an interface to transmit this user data.
+
+Thanks to 802.1X, userspace must be granted the possibility to
+influence operational state. To accommodate this, operational state is
+split into two parts: Two flags that can be set by the driver only, and
+a RFC2863 compatible state that is derived from these flags, a policy,
+and changeable from userspace under certain rules.
+
+
+2. Querying from userspace
+
+Both admin and operational state can be queried via the netlink
+operation RTM_GETLINK. It is also possible to subscribe to RTMGRP_LINK
+to be notified of updates. This is important for setting from userspace.
+
+These values contain interface state:
+
+ifinfomsg::if_flags & IFF_UP:
+ Interface is admin up
+ifinfomsg::if_flags & IFF_RUNNING:
+ Interface is in RFC2863 operational state UP or UNKNOWN. This is for
+ backward compatibility, routing daemons, dhcp clients can use this
+ flag to determine whether they should use the interface.
+ifinfomsg::if_flags & IFF_LOWER_UP:
+ Driver has signaled netif_carrier_on()
+ifinfomsg::if_flags & IFF_DORMANT:
+ Driver has signaled netif_dormant_on()
+
+These interface flags can also be queried without netlink using the
+SIOCGIFFLAGS ioctl.
+
+TLV IFLA_OPERSTATE
+
+contains RFC2863 state of the interface in numeric representation:
+
+IF_OPER_UNKNOWN (0):
+ Interface is in unknown state, neither driver nor userspace has set
+ operational state. Interface must be considered for user data as
+ setting operational state has not been implemented in every driver.
+IF_OPER_NOTPRESENT (1):
+ Unused in current kernel (notpresent interfaces normally disappear),
+ just a numerical placeholder.
+IF_OPER_DOWN (2):
+ Interface is unable to transfer data on L1, f.e. ethernet is not
+ plugged or interface is ADMIN down.
+IF_OPER_LOWERLAYERDOWN (3):
+ Interfaces stacked on an interface that is IF_OPER_DOWN show this
+ state (f.e. VLAN).
+IF_OPER_TESTING (4):
+ Unused in current kernel.
+IF_OPER_DORMANT (5):
+ Interface is L1 up, but waiting for an external event, f.e. for a
+ protocol to establish. (802.1X)
+IF_OPER_UP (6):
+ Interface is operational up and can be used.
+
+This TLV can also be queried via sysfs.
+
+TLV IFLA_LINKMODE
+
+contains link policy. This is needed for userspace interaction
+described below.
+
+This TLV can also be queried via sysfs.
+
+
+3. Kernel driver API
+
+Kernel drivers have access to two flags that map to IFF_LOWER_UP and
+IFF_DORMANT. These flags can be set from everywhere, even from
+interrupts. It is guaranteed that only the driver has write access,
+however, if different layers of the driver manipulate the same flag,
+the driver has to provide the synchronisation needed.
+
+__LINK_STATE_NOCARRIER, maps to !IFF_LOWER_UP:
+
+The driver uses netif_carrier_on() to clear and netif_carrier_off() to
+set this flag. On netif_carrier_off(), the scheduler stops sending
+packets. The name 'carrier' and the inversion are historical, think of
+it as lower layer.
+
+netif_carrier_ok() can be used to query that bit.
+
+__LINK_STATE_DORMANT, maps to IFF_DORMANT:
+
+Set by the driver to express that the device cannot yet be used
+because some driver controlled protocol establishment has to
+complete. Corresponding functions are netif_dormant_on() to set the
+flag, netif_dormant_off() to clear it and netif_dormant() to query.
+
+On device allocation, networking core sets the flags equivalent to
+netif_carrier_ok() and !netif_dormant().
+
+
+Whenever the driver CHANGES one of these flags, a workqueue event is
+scheduled to translate the flag combination to IFLA_OPERSTATE as
+follows:
+
+!netif_carrier_ok():
+ IF_OPER_LOWERLAYERDOWN if the interface is stacked, IF_OPER_DOWN
+ otherwise. Kernel can recognise stacked interfaces because their
+ ifindex != iflink.
+
+netif_carrier_ok() && netif_dormant():
+ IF_OPER_DORMANT
+
+netif_carrier_ok() && !netif_dormant():
+ IF_OPER_UP if userspace interaction is disabled. Otherwise
+ IF_OPER_DORMANT with the possibility for userspace to initiate the
+ IF_OPER_UP transition afterwards.
+
+
+4. Setting from userspace
+
+Applications have to use the netlink interface to influence the
+RFC2863 operational state of an interface. Setting IFLA_LINKMODE to 1
+via RTM_SETLINK instructs the kernel that an interface should go to
+IF_OPER_DORMANT instead of IF_OPER_UP when the combination
+netif_carrier_ok() && !netif_dormant() is set by the
+driver. Afterwards, the userspace application can set IFLA_OPERSTATE
+to IF_OPER_DORMANT or IF_OPER_UP as long as the driver does not set
+netif_carrier_off() or netif_dormant_on(). Changes made by userspace
+are multicasted on the netlink group RTMGRP_LINK.
+
+So basically a 802.1X supplicant interacts with the kernel like this:
+
+-subscribe to RTMGRP_LINK
+-set IFLA_LINKMODE to 1 via RTM_SETLINK
+-query RTM_GETLINK once to get initial state
+-if initial flags are not (IFF_LOWER_UP && !IFF_DORMANT), wait until
+ netlink multicast signals this state
+-do 802.1X, eventually abort if flags go down again
+-send RTM_SETLINK to set operstate to IF_OPER_UP if authentication
+ succeeds, IF_OPER_DORMANT otherwise
+-see how operstate and IFF_RUNNING is echoed via netlink multicast
+-set interface back to IF_OPER_DORMANT if 802.1X reauthentication
+ fails
+-restart if kernel changes IFF_LOWER_UP or IFF_DORMANT flag
+
+if supplicant goes down, bring back IFLA_LINKMODE to 0 and
+IFLA_OPERSTATE to a sane value.
+
+A routing daemon or dhcp client just needs to care for IFF_RUNNING or
+waiting for operstate to go IF_OPER_UP/IF_OPER_UNKNOWN before
+considering the interface / querying a DHCP address.
+
+
+For technical questions and/or comments please e-mail to Stefan Rompf
+(stefan at loplof.de).
diff -X dontdiff -uNrp linux-2.6.14/include/linux/if.h linux-2.6.14-rfc2863/include/linux/if.h
--- linux-2.6.14/include/linux/if.h 2005-11-02 11:07:32.000000000 +0100
+++ linux-2.6.14-rfc2863/include/linux/if.h 2005-11-30 21:15:24.000000000 +0100
@@ -33,7 +33,7 @@
#define IFF_LOOPBACK 0x8 /* is a loopback net */
#define IFF_POINTOPOINT 0x10 /* interface is has p-p link */
#define IFF_NOTRAILERS 0x20 /* avoid use of trailers */
-#define IFF_RUNNING 0x40 /* interface running and carrier ok */
+#define IFF_RUNNING 0x40 /* interface RFC2863 OPER_UP */
#define IFF_NOARP 0x80 /* no ARP protocol */
#define IFF_PROMISC 0x100 /* receive all packets */
#define IFF_ALLMULTI 0x200 /* receive all multicast packets*/
@@ -43,12 +43,16 @@
#define IFF_MULTICAST 0x1000 /* Supports multicast */
-#define IFF_VOLATILE (IFF_LOOPBACK|IFF_POINTOPOINT|IFF_BROADCAST|IFF_MASTER|IFF_SLAVE|IFF_RUNNING)
-
#define IFF_PORTSEL 0x2000 /* can set media type */
#define IFF_AUTOMEDIA 0x4000 /* auto media select active */
#define IFF_DYNAMIC 0x8000 /* dialup device with changing addresses*/
+#define IFF_LOWER_UP 0x10000 /* driver signals L1 up */
+#define IFF_DORMANT 0x20000 /* driver signals dormant */
+
+#define IFF_VOLATILE (IFF_LOOPBACK|IFF_POINTOPOINT|IFF_BROADCAST|\
+ IFF_MASTER|IFF_SLAVE|IFF_RUNNING|IFF_LOWER_UP|IFF_DORMANT)
+
/* Private (from user) interface flags (netdevice->priv_flags). */
#define IFF_802_1Q_VLAN 0x1 /* 802.1Q VLAN device. */
#define IFF_EBRIDGE 0x2 /* Ethernet bridging device. */
@@ -80,6 +84,22 @@
#define IF_PROTO_FR_ETH_PVC 0x200B
#define IF_PROTO_RAW 0x200C /* RAW Socket */
+/* RFC 2863 operational status */
+enum {
+ IF_OPER_UNKNOWN,
+ IF_OPER_NOTPRESENT,
+ IF_OPER_DOWN,
+ IF_OPER_LOWERLAYERDOWN,
+ IF_OPER_TESTING,
+ IF_OPER_DORMANT,
+ IF_OPER_UP,
+};
+
+/* link modes */
+enum {
+ IF_LINK_MODE_DEFAULT,
+ IF_LINK_MODE_DORMANT, /* limit upward transition to dormant */
+};
/*
* Device mapping structure. I'd just gone off and designed a
diff -X dontdiff -uNrp linux-2.6.14/include/linux/netdevice.h linux-2.6.14-rfc2863/include/linux/netdevice.h
--- linux-2.6.14/include/linux/netdevice.h 2005-11-02 11:08:10.000000000 +0100
+++ linux-2.6.14-rfc2863/include/linux/netdevice.h 2005-11-30 21:01:43.000000000 +0100
@@ -230,7 +230,8 @@ enum netdev_state_t
__LINK_STATE_SCHED,
__LINK_STATE_NOCARRIER,
__LINK_STATE_RX_SCHED,
- __LINK_STATE_LINKWATCH_PENDING
+ __LINK_STATE_LINKWATCH_PENDING,
+ __LINK_STATE_DORMANT,
};
@@ -334,11 +335,14 @@ struct net_device
*/
- unsigned short flags; /* interface flags (a la BSD) */
+ unsigned int flags; /* interface flags (a la BSD) */
unsigned short gflags;
unsigned short priv_flags; /* Like 'flags' but invisible to userspace. */
unsigned short padded; /* How much padding added by alloc_netdev() */
+ unsigned char operstate; /* RFC2863 operstate */
+ unsigned char link_mode; /* mapping policy to operstate */
+
unsigned mtu; /* interface MTU value */
unsigned short type; /* interface hardware type */
unsigned short hard_header_len; /* hardware hdr length */
@@ -712,6 +716,10 @@ static inline void dev_put(struct net_de
/* Carrier loss detection, dial on demand. The functions netif_carrier_on
* and _off may be called from IRQ context, but it is caller
* who is responsible for serialization of these calls.
+ *
+ * The name carrier is inappropriate, these functions should really be
+ * called netif_lowerlayer_*() because they represent the state of any
+ * kind of lower layer not just hardware media.
*/
extern void linkwatch_fire_event(struct net_device *dev);
@@ -727,6 +735,29 @@ extern void netif_carrier_on(struct net_
extern void netif_carrier_off(struct net_device *dev);
+static inline void netif_dormant_on(struct net_device *dev)
+{
+ if (!test_and_set_bit(__LINK_STATE_DORMANT, &dev->state))
+ linkwatch_fire_event(dev);
+}
+
+static inline void netif_dormant_off(struct net_device *dev)
+{
+ if (test_and_clear_bit(__LINK_STATE_DORMANT, &dev->state))
+ linkwatch_fire_event(dev);
+}
+
+static inline int netif_dormant(const struct net_device *dev)
+{
+ return test_bit(__LINK_STATE_DORMANT, &dev->state);
+}
+
+
+static inline int netif_oper_up(const struct net_device *dev) {
+ return (dev->operstate == IF_OPER_UP ||
+ dev->operstate == IF_OPER_UNKNOWN /* backward compat */);
+}
+
/* Hot-plugging. */
static inline int netif_device_present(struct net_device *dev)
{
diff -X dontdiff -uNrp linux-2.6.14/include/linux/rtnetlink.h linux-2.6.14-rfc2863/include/linux/rtnetlink.h
--- linux-2.6.14/include/linux/rtnetlink.h 2005-11-02 11:08:11.000000000 +0100
+++ linux-2.6.14-rfc2863/include/linux/rtnetlink.h 2005-11-18 20:14:05.000000000 +0100
@@ -733,6 +733,8 @@ enum
#define IFLA_MAP IFLA_MAP
IFLA_WEIGHT,
#define IFLA_WEIGHT IFLA_WEIGHT
+ IFLA_OPERSTATE,
+ IFLA_LINKMODE,
__IFLA_MAX
};
diff -X dontdiff -uNrp linux-2.6.14/net/core/dev.c linux-2.6.14-rfc2863/net/core/dev.c
--- linux-2.6.14/net/core/dev.c 2005-11-06 17:35:22.000000000 +0100
+++ linux-2.6.14-rfc2863/net/core/dev.c 2005-11-30 21:15:50.000000000 +0100
@@ -2141,12 +2141,20 @@ unsigned dev_get_flags(const struct net_
flags = (dev->flags & ~(IFF_PROMISC |
IFF_ALLMULTI |
- IFF_RUNNING)) |
+ IFF_RUNNING |
+ IFF_LOWER_UP |
+ IFF_DORMANT)) |
(dev->gflags & (IFF_PROMISC |
IFF_ALLMULTI));
- if (netif_running(dev) && netif_carrier_ok(dev))
- flags |= IFF_RUNNING;
+ if (netif_running(dev)) {
+ if (netif_oper_up(dev))
+ flags |= IFF_RUNNING;
+ if (netif_carrier_ok(dev))
+ flags |= IFF_LOWER_UP;
+ if (netif_dormant(dev))
+ flags |= IFF_DORMANT;
+ }
return flags;
}
diff -X dontdiff -uNrp linux-2.6.14/net/core/link_watch.c linux-2.6.14-rfc2863/net/core/link_watch.c
--- linux-2.6.14/net/core/link_watch.c 2005-06-17 21:48:29.000000000 +0200
+++ linux-2.6.14-rfc2863/net/core/link_watch.c 2005-11-30 21:13:53.000000000 +0100
@@ -49,6 +49,34 @@ struct lw_event {
/* Avoid kmalloc() for most systems */
static struct lw_event singleevent;
+static inline unsigned char default_operstate(const struct net_device *dev) {
+ if (!netif_carrier_ok(dev))
+ return dev->ifindex!=dev->iflink?IF_OPER_LOWERLAYERDOWN:IF_OPER_DOWN;
+ if (netif_dormant(dev)) return IF_OPER_DORMANT;
+ return IF_OPER_UP;
+}
+
+
+static void rfc2863_policy(struct net_device *dev) {
+ unsigned char operstate = default_operstate(dev);
+
+ if (operstate == dev->operstate) return;
+
+ switch(dev->link_mode) {
+ case IF_LINK_MODE_DORMANT:
+ if (operstate == IF_OPER_UP) operstate = IF_OPER_DORMANT;
+ break;
+ case IF_LINK_MODE_DEFAULT:
+ default:
+ break;
+ }
+
+ write_lock_bh(&dev_base_lock);
+ dev->operstate = operstate;
+ write_unlock_bh(&dev_base_lock);
+}
+
+
/* Must be called with the rtnl semaphore held */
void linkwatch_run_queue(void)
{
@@ -74,6 +102,7 @@ void linkwatch_run_queue(void)
*/
clear_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state);
+ rfc2863_policy(dev);
if (dev->flags & IFF_UP) {
if (netif_carrier_ok(dev)) {
WARN_ON(dev->qdisc_sleeping == &noop_qdisc);
diff -X dontdiff -uNrp linux-2.6.14/net/core/net-sysfs.c linux-2.6.14-rfc2863/net/core/net-sysfs.c
--- linux-2.6.14/net/core/net-sysfs.c 2005-06-17 21:48:29.000000000 +0200
+++ linux-2.6.14-rfc2863/net/core/net-sysfs.c 2005-12-07 23:29:46.000000000 +0100
@@ -94,6 +94,7 @@ NETDEVICE_ATTR(iflink, fmt_dec);
NETDEVICE_ATTR(ifindex, fmt_dec);
NETDEVICE_ATTR(features, fmt_long_hex);
NETDEVICE_ATTR(type, fmt_dec);
+NETDEVICE_ATTR(link_mode, fmt_dec);
/* use same locking rules as GIFHWADDR ioctl's */
static ssize_t format_addr(char *buf, const unsigned char *addr, int len)
@@ -136,9 +137,44 @@ static ssize_t show_carrier(struct class
return -EINVAL;
}
+static ssize_t show_dormant(struct class_device *dev, char *buf)
+{
+ struct net_device *netdev = to_net_dev(dev);
+ if (netif_running(netdev)) {
+ return sprintf(buf, fmt_dec, !!netif_dormant(netdev));
+ }
+ return -EINVAL;
+}
+
+static const char *operstates[] = {
+ "unknown",
+ "notpresent", /* currently unused */
+ "down",
+ "lowerlayerdown",
+ "testing", /* currently unused */
+ "dormant",
+ "up"
+};
+
+static ssize_t show_operstate(struct class_device *dev, char *buf)
+{
+ const struct net_device *netdev = to_net_dev(dev);
+ unsigned char operstate;
+
+ read_lock(&dev_base_lock);
+ operstate = netdev->operstate;
+ if (!netif_running(netdev)) operstate = IF_OPER_DOWN;
+ read_unlock(&dev_base_lock);
+
+ if (operstate >= sizeof(operstates)) return -EINVAL; /* should not happen */
+ return sprintf(buf, "%s\n", operstates[operstate]);
+}
+
static CLASS_DEVICE_ATTR(address, S_IRUGO, show_address, NULL);
static CLASS_DEVICE_ATTR(broadcast, S_IRUGO, show_broadcast, NULL);
static CLASS_DEVICE_ATTR(carrier, S_IRUGO, show_carrier, NULL);
+static CLASS_DEVICE_ATTR(dormant, S_IRUGO, show_dormant, NULL);
+static CLASS_DEVICE_ATTR(operstate, S_IRUGO, show_operstate, NULL);
/* read-write attributes */
NETDEVICE_SHOW(mtu, fmt_dec);
@@ -212,9 +248,12 @@ static struct class_device_attribute *ne
&class_device_attr_flags,
&class_device_attr_weight,
&class_device_attr_type,
+ &class_device_attr_link_mode,
&class_device_attr_address,
&class_device_attr_broadcast,
&class_device_attr_carrier,
+ &class_device_attr_dormant,
+ &class_device_attr_operstate,
NULL
};
diff -X dontdiff -uNrp linux-2.6.14/net/core/rtnetlink.c linux-2.6.14-rfc2863/net/core/rtnetlink.c
--- linux-2.6.14/net/core/rtnetlink.c 2005-11-02 11:08:12.000000000 +0100
+++ linux-2.6.14-rfc2863/net/core/rtnetlink.c 2005-12-07 23:32:36.000000000 +0100
@@ -178,6 +178,31 @@ rtattr_failure:
}
+static void set_operstate(struct net_device *dev, unsigned char transition) {
+ unsigned char operstate = dev->operstate;
+
+ switch(transition) {
+ case IF_OPER_UP:
+ if ((operstate == IF_OPER_DORMANT ||
+ operstate == IF_OPER_UNKNOWN) &&
+ !netif_dormant(dev))
+ operstate = IF_OPER_UP;
+ break;
+ case IF_OPER_DORMANT:
+ if (operstate == IF_OPER_UP ||
+ operstate == IF_OPER_UNKNOWN)
+ operstate = IF_OPER_DORMANT;
+ break;
+ }
+
+ if (dev->operstate != operstate) {
+ write_lock_bh(&dev_base_lock);
+ dev->operstate = operstate;
+ write_unlock_bh(&dev_base_lock);
+ netdev_state_change(dev);
+ }
+}
+
static int rtnetlink_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
int type, u32 pid, u32 seq, u32 change,
unsigned int flags)
@@ -208,6 +233,13 @@ static int rtnetlink_fill_ifinfo(struct
}
if (1) {
+ u8 operstate = netif_running(dev)?dev->operstate:IF_OPER_DOWN;
+ u8 link_mode = dev->link_mode;
+ RTA_PUT(skb, IFLA_OPERSTATE, sizeof(operstate), &operstate);
+ RTA_PUT(skb, IFLA_LINKMODE, sizeof(link_mode), &link_mode);
+ }
+
+ if (1) {
struct rtnl_link_ifmap map = {
.mem_start = dev->mem_start,
.mem_end = dev->mem_end,
@@ -398,6 +430,22 @@ static int do_setlink(struct sk_buff *sk
dev->weight = *((u32 *) RTA_DATA(ida[IFLA_WEIGHT - 1]));
}
+ if (ida[IFLA_OPERSTATE - 1]) {
+ if (ida[IFLA_OPERSTATE - 1]->rta_len != RTA_LENGTH(sizeof(u8)))
+ goto out;
+
+ set_operstate(dev, *((u8 *) RTA_DATA(ida[IFLA_OPERSTATE - 1])));
+ }
+
+ if (ida[IFLA_LINKMODE - 1]) {
+ if (ida[IFLA_LINKMODE - 1]->rta_len != RTA_LENGTH(sizeof(u8)))
+ goto out;
+
+ write_lock_bh(&dev_base_lock);
+ dev->link_mode = *((u8 *) RTA_DATA(ida[IFLA_LINKMODE - 1]));
+ write_unlock_bh(&dev_base_lock);
+ }
+
if (ifm->ifi_index >= 0 && ida[IFLA_IFNAME - 1]) {
char ifname[IFNAMSIZ];