>>>>> Gosh. Can we also replace this BUG() into something less aggressive ?
>>>>
>>>>
>>>> There are currently 5 of these WARN() + BUG() constructs and 1 BUG()-only
>>>> for the 'default' TPACKET version spread all over af_packet, so probably
>>>> makes sense to rather make all of them less aggressive.
>>>>
>>>>
>>>
>>> Very few consumers actually go looking in the kernel logs to see the
>>> error-warnings and report them back here.
>>>
>>> This severity will get them to report the incident which in this case
>>> got fixed??
>>
>> But BUG_ONs in the datapath can cause outages in real production
>> environments. This should not happen for recoverable failures. For
>> users who cannot be bothered to check their logs, there is sysctl
>> kernel.panic_on_warn.
>
>
> Completely understand(and you should have failover to handle these
> outages).

Not for correlated failures where all systems can hit the same path.
This is especially dangerous when remote packets or untrusted local
users can trigger a BUG-enabled path.

> But then are you ok giving incorrect info to the
> application?

No, we should certainly signal an error. For instance, returning
TP_STATUS_WRONG_FORMAT instead of TP_STATUS_AVAILABLE.

> For this specific bug: it is so basic that you should hit this bug 1st
> time everytime when you are adding support or porting a new header.
> Correct?

Agreed, but that is small consolation if an unprivileged user (say, in a
namespace) finds out that it can trigger the codepath. But I agree that
this particular BUG_ON is one of the easier to reason about.
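
To make the "signal an error instead of BUG()" idea concrete, here is a
rough userspace sketch, not a patch against af_packet. Everything with an
sk_/SK_ prefix is hypothetical and exists only for illustration; the two
status values are assumed to mirror TP_STATUS_AVAILABLE and
TP_STATUS_WRONG_FORMAT from <linux/if_packet.h>. The default branch is
where the kernel code currently does WARN() followed by BUG().

```c
#include <stdio.h>

/* Assumed to mirror the TX-ring status values in <linux/if_packet.h>;
 * redefined locally so the sketch builds without kernel headers. */
#define SK_TP_STATUS_AVAILABLE    0u
#define SK_TP_STATUS_WRONG_FORMAT (1u << 2)

/* Hypothetical stand-ins for the TPACKET ring versions. */
enum sk_tpacket_version { SK_TPACKET_V1, SK_TPACKET_V2, SK_TPACKET_V3 };

/* Hypothetical, heavily simplified frame header: only the status word. */
struct sk_frame {
	unsigned int tp_status;
};

/* Counts how often the unexpected-version path was taken. */
static int sk_warn_count;

/*
 * Hypothetical helper: hand a frame back to the application.
 * Returns 0 on success and -1 if the version is not recognised.
 */
static int sk_complete_frame(struct sk_frame *f, int version)
{
	switch (version) {
	case SK_TPACKET_V1:
	case SK_TPACKET_V2:
	case SK_TPACKET_V3:
		f->tp_status = SK_TP_STATUS_AVAILABLE;
		return 0;
	default:
		/* Warn, but keep running: no BUG() equivalent here. */
		sk_warn_count++;
		fprintf(stderr, "sk: unknown TPACKET version %d\n", version);
		/* Tell the application the frame is bad, not "available". */
		f->tp_status = SK_TP_STATUS_WRONG_FORMAT;
		return -1;
	}
}
```

The point of the shape above is that the failure stays visible twice, once
in the log and once in the status the application reads back, while the
machine keeps forwarding packets. Anyone who still wants a hard stop can
opt in through kernel.panic_on_warn.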