On Thu, 30 Apr 2020 12:45:49 -0700 Jakub Kicinski <k...@kernel.org> wrote:

> On Thu, 30 Apr 2020 13:42:22 +0200 Jesper Dangaard Brouer wrote:
> > Currently if the default qdisc setup/init fails, the device ends up with
> > qdisc "noop", which causes all TX packets to get dropped.
> >
> > With the introduction of sysctl net/core/default_qdisc it is possible
> > to change the default qdisc to be more advanced, which opens for the
> > possibility that Qdisc_ops->init() can fail.
> >
> > This patch detect these kind of failures, and choose to fallback to
> > qdisc "noqueue", which is so simple that its init call will not fail.
> > This allows the interface to continue functioning.
> >
> > V2:
> > As this also captures memory failures, which are transient, the
> > device is not kept in IFF_NO_QUEUE state. This allows the net_device
> > to retry to default qdisc assignment.
> >
> > Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com>
>
> I have mixed feelings about this one, I wonder if I'm the only one.
> Seems like failure to allocate the default qdisc is pretty critical,
> the log message may be missed, especially in the boot time noise.
>
> I think a WARN_ON() is in order here, I'd personally just replace the
> netdev_info with a WARN_ON, without the fallback.

It is good that we agree that failure to set up the default qdisc is
pretty critical. I guess we disagree on whether we should (1) keep the
network functioning in a degraded state, or (2) drop all packets on the
net_device such that people notice.

This change proposes (1), keeping the box functioning.

For me it was a pretty bad experience: when I pushed a new kernel over
the network to my embedded box, I lost all network connectivity. I
fortunately had serial console access (as this was not an OpenWRT box
but a full devel board), so I could debug, but I could no longer
upgrade the kernel over the network. I clearly noticed, as the box was
not operational, but I guess most people would just give up at this
point.
(Imagine a small OpenWRT box whose config sets default_qdisc to
fq_codel, which bricks the box because it cannot allocate memory.)

I hope that people will notice this degraded state when they start to
transfer data to the device, because running 'noqueue' on a physical
device will result in net_crit_ratelimited() messages like the ones
below:

 [86971.609318] Virtual device eth0 asks to queue packet!
 [86971.622183] Virtual device eth0 asks to queue packet!
 [86971.627510] Virtual device eth0 asks to queue packet!

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer