On Fri, 1 May 2020 13:56:02 +0200 Jesper Dangaard Brouer wrote:
> On Thu, 30 Apr 2020 12:45:49 -0700
> Jakub Kicinski <k...@kernel.org> wrote:
>
> > On Thu, 30 Apr 2020 13:42:22 +0200 Jesper Dangaard Brouer wrote:
> > > Currently if the default qdisc setup/init fails, the device ends up with
> > > qdisc "noop", which causes all TX packets to get dropped.
> > >
> > > With the introduction of sysctl net/core/default_qdisc it is possible
> > > to change the default qdisc to be more advanced, which opens for the
> > > possibility that Qdisc_ops->init() can fail.
> > >
> > > This patch detect these kind of failures, and choose to fallback to
> > > qdisc "noqueue", which is so simple that its init call will not fail.
> > > This allows the interface to continue functioning.
> > >
> > > V2:
> > > As this also captures memory failures, which are transient, the
> > > device is not kept in IFF_NO_QUEUE state. This allows the net_device
> > > to retry to default qdisc assignment.
> > >
> > > Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com>
> >
> > I have mixed feelings about this one, I wonder if I'm the only one.
> > Seems like failure to allocate the default qdisc is pretty critical,
> > the log message may be missed, especially in the boot time noise.
> >
> > I think a WARN_ON() is in order here, I'd personally just replace the
> > netdev_info with a WARN_ON, without the fallback.
>
> It is good that we agree that failure to default qdisc is pretty
> critical. I guess we disagree on whether (1) we keep network
> functioning in a degraded state, (2) drop all packets on net_device
> such that people notice.
>
> This change propose (1) keeping the box functioning. For me it was a
> pretty bad experience, that when I pushed a new kernel over the network
> to my embedded box, then I lost all network connectivity. I
> fortunately had serial console access (as this was not an OpenWRT box
> but a full devel board) so I could debug, but I could no-longer upgrade
> the kernel. I clearly noticed, as the box was not operational, but I
> guess most people would just give up at this point. (Imagine a small
> OpenWRT box config setting default_qdisc to fq_codel, which brick the
> box as it cannot allocate memory).
>
> I hope that people will notice this degrade state, when they start to
> transfer data to the device. Because running 'noqueue' on a physical
> device will result in net_crit_ratelimited() messages below:
>
>  [86971.609318] Virtual device eth0 asks to queue packet!
>  [86971.622183] Virtual device eth0 asks to queue packet!
>  [86971.627510] Virtual device eth0 asks to queue packet!

Both ways have advantages, I guess. I don't feel strongly, but I do think that WARN_ON() is in order here.
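
To make the combination concrete (fall back to "noqueue", but also warn
loudly), here is a rough userspace C sketch. It is not the patch and not
kernel code: every name ending in _sim is made up for illustration, the
fprintf() only stands in for a WARN_ON(), and the -ENOMEM failure is an
assumed example of an init error.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's Qdisc_ops; illustration only. */
struct qdisc_ops_sim {
	const char *id;
	int (*init)(void);	/* 0 on success, negative errno on failure */
};

int init_ok_sim(void)     { return 0; }
int init_enomem_sim(void) { return -ENOMEM; }	/* assumed failure mode */

const struct qdisc_ops_sim noop_sim            = { "noop",     init_ok_sim };
const struct qdisc_ops_sim noqueue_sim         = { "noqueue",  init_ok_sim };
const struct qdisc_ops_sim fq_codel_ok_sim     = { "fq_codel", init_ok_sim };
const struct qdisc_ops_sim fq_codel_enomem_sim = { "fq_codel", init_enomem_sim };

/* Hypothetical stand-in for a net_device; starts out on "noop". */
struct netdev_sim {
	const char *name;
	const struct qdisc_ops_sim *qdisc;
	bool warned;
};

/*
 * Try the configured default qdisc. If its init fails, warn and fall
 * back to "noqueue" so the interface keeps passing traffic. Nothing
 * sticky is recorded, so a later call can retry the default.
 */
void attach_default_qdisc_sim(struct netdev_sim *dev,
			      const struct qdisc_ops_sim *def)
{
	int err = def->init();

	if (!err) {
		dev->qdisc = def;
		return;
	}

	/* Stands in for WARN_ON(); the kernel version would dump a stack. */
	fprintf(stderr,
		"WARNING: %s: default qdisc %s failed to init (%d), using %s\n",
		dev->name, def->id, err, noqueue_sim.id);
	dev->warned = true;
	dev->qdisc = &noqueue_sim;
}
```

Calling attach_default_qdisc_sim() on a device with fq_codel_enomem_sim
leaves it on noqueue_sim with warned set; calling it again with
fq_codel_ok_sim moves it to fq_codel, which mirrors the V2 note about
retrying the default assignment.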