On 27/06/2021 00:22, Etan Kissling wrote: > When using multiple dnsmasq instances Ubus only connects on one of them. > Since 3c93e8eb41952a9c91699386132d6fe83050e9be dnsmasq crashes instead. > This change avoids the crash, leading to a graceful retry + error log. > > Signed-off-by: Etan Kissling <[email protected]> > --- > src/dnsmasq.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/src/dnsmasq.c b/src/dnsmasq.c > index 04582da..2b4291b 100644 > --- a/src/dnsmasq.c > +++ b/src/dnsmasq.c > @@ -449,10 +449,8 @@ int main (int argc, char **argv) > if (option_bool(OPT_UBUS)) > #ifdef HAVE_UBUS > { > - char *err; > daemon->ubus = NULL; > - if ((err = ubus_init())) > - die(_("UBus error: %s"), err, EC_MISC); > + (void) ubus_init(); /* Logging not set up yet. */ > } > #else > die(_("UBus not available: set HAVE_UBUS in src/config.h"), NULL, > EC_BADCONF); >
The gen on my_syslog and die() are: 1) During startup, before logging is set up, it's not allowed to call my_syslog(). It may appear to work, but won't in all cases. The only errors allowed are fatal errors with die(). 2) The crucial points are at lines 627 and 700 in dnsmasq.c, before 627 you can die(). After 627 you can't die(), dnsmasq has returned success return code and forked. After 700 you can log stuff. Between 627 and 700 is in the "you are not expected to understand this" class. If something fails before 627 that you want to log, save the information in variables local to main() and call my_syslog() after line 700. My change made the ubus code work in the same way as DBus. It expects that ubus_init() will return a non-NULL error report if something unexpected and nasty happened. (maybe a configuration that can never work.) If the Ubus connection cannot be made, but that's expected to change then ubus_init() should return NULL, and leave daemon->ubus set to NULL. In that case ubus_init() will be called again, and can either succeed, leave daemon->ubus still as NULL (in which case it will be called again and again....) or return a fatal error, which can by now only be logged. ubus_init() will continue to be called each time through the event loop. Since the only way ubus_init() can return a non-NULL is if ubus_add_object() fails, are we saying that that is not a fatal error and a retry may work? Or is it the case that if it fails once, it will always fail, but you want dnsmasq to run without ubus anyway? If the former, then just don't return the error string when ubus_add_object() fails. If the later, we may need to extend the error-handling mechanism. In any case, I'm minded, for both ubus and dbus, to treat an error return from [ub]bus_init() _after_ dnsmasq has started as a reason to turn off the subsystem, and discontinue all calls to *_init(). The current code looks like an invitation to an endlessly spinning CPU and spammed log. Looking in src/ubus.c, there seems to be a mechanism to reconnect to the ubus, and if that fails, daemon->ubus can end up as NULL again, having been set up correctly beforehand, so that ubus_init() will start to get called again. That code path feels dodgy, and it would be nice to see what actually happens when it's run. Looking again, there's also a code path in check_ubus_listeners() that can delete the ubus connection and set daemon->ubus back to NULL. That will start calls to ubus_init() again. I wonder if that is correct? TL;DR If ubus_add_object() failures are soft, then don't return an error string from ubus_init() when they happen. If ubus_add_object() failures are hard, then the call to die() looks good to me, but if you want dnsmasq to continue without a ubus connection then we need to arrange that ubus_init() is not called repeatedly in that case. There are a couple of paths in the ubus code that set daemon->ubus back to NULL, and I'm not sure things will work sensibly if they are ever taken. Cheers, Simon. _______________________________________________ Dnsmasq-discuss mailing list [email protected] https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss
