Hi Alex,

Thank you for this report. To summarize:

* this appears to be a bug in systemd, or maybe systemd-shim
* the systemd init.d script handler is lying and corrupting systemd state

On Mon, 28 Sep 2015 14:26:00 +1300 Alex King wrote:
>
> For example, with squid running, add a nonsense line into the
> configuration. Reload with "systemctl reload squid3". Now "systemctl
> status squid3" shows:
>
> ● squid3.service - LSB: Squid HTTP Proxy version 3.x
>    Loaded: loaded (/etc/init.d/squid3)
>    Active: active (exited) since Mon 2015-09-28 13:31:37 NZDT; 12min ago
>   Process: 25937 ExecReload=/etc/init.d/squid3 reload (code=exited, status=0/SUCCESS)

systemd is lying. The init script contains this to exit with an error
on squid.conf errors:

    res=`$DAEMON -k parse -f $CONFIG 2>&1 | grep -o "FATAL .*"`
    if test -n "$res"; then
        log_failure_msg "$res"
        exit 3
    ...

On most OSes, a shell script calling exit N with a non-0 value means
failure. Apparently systemd is different.

>
> Sep 28 13:42:52 juliet (squid-1)[25955]: Bungled /etc/squid3/squid.conf line 658: acl nonsense nonsense nonsense
>
> and:
>
> echo $?
> 0
>

Which leaves us all wondering which process "$?" is actually reporting
on. I suspect it is reporting the exit status of the systemctl binary,
or possibly of whatever tool was used to record the log_failure_msg
error to syslog. Certainly not Squid or the init script, which are
producing non-0 values.

>
> systemctl knows squid has exited, but it reports that it is active, which
> might be correct for a one-shot process, but not for a daemon like squid.

Squid is not just a daemon. Squid is a daemon manager. That is a
critical detail that I will get to later...

Still. The Squid master process is not even getting to the point of
starting (or restarting) the squid process. The init script is exiting
at the config file check before that.

>
> When squid has exited, systemctl should report it as inactive.

Quite. Especially so since the error code is being presented.
If it were a real exit 0 situation we could forgive them.

To make matters even worse, the init script packaged with Squid is
explicitly and very carefully written so that any running service is
not affected by such errors. The existing service is left running with
the old config while the errors are logged.

systemd decides to do its own thing again here. And this is where Squid
being a daemon manager bites back.

It would not be so bad if systemd were using SIGHUP to properly inform
the daemon that it needs to exit, which would get relayed to the real
Squid master process. But for unexplained reasons it just outright
sends SIGKILL, aborting the network services mid-flight:

* client transactions are dropped on the spot
* filesystem transactions are dropped on the spot

The Squid master process (daemon manager) is correctly delivered an
unexpected abort signal by the kernel. So it promptly restarts the
daemon ... using the known bad config file. Which of course aborts due
to the config error. You can probably see several "(squid3-1): exit 1"
messages in your syslog from that.

> I assume having a proper unit file for systemd would fix this. And failing
> that, modification of the init script might do so?

Sadly no. As I mention above, the init script is already doing the
right thing AFAIK.

Using a unit file just prevents us from being able to use the
squid -k parse protection against bad configurations. And it would make
the above nasty situation become the new norm, even if the current bug
in systemd/systemd-shim is fixed.

As for ansible, always use squid -k parse (or squid3 -k parse) to
verify squid.conf before rolling it out. Or run "squid -k check" after
touching the config as a means of doing both the parse check and
signalling the running daemon/worker process when it parses
successfully.

Amos
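
P.S. A minimal sketch of the pre-rollout check suggested above, for use
from a deployment script. The function name check_squid_conf and the
SQUID_BIN variable are hypothetical, chosen only for illustration. The
check mirrors the init script's approach of scanning the "-k parse"
output for FATAL lines, but matches "FATAL" more loosely than the
pattern quoted earlier; treat that as an assumption and adjust it to
what your Squid build actually prints.

```shell
#!/bin/sh
# Hypothetical helper: validate a candidate squid.conf before installing it.
# SQUID_BIN is an illustrative override; it defaults to "squid3" and can be
# set to "squid" where the binary goes by that name.

check_squid_conf() {
    conf="$1"
    squid_bin="${SQUID_BIN:-squid3}"

    # Parse the candidate file without touching the running service and
    # collect any FATAL lines, the same idea as the init script's check.
    # "|| true" keeps a no-match grep from tripping "set -e" in callers.
    res=$("$squid_bin" -k parse -f "$conf" 2>&1 | grep -o "FATAL.*" || true)

    if test -n "$res"; then
        # Report the parse failure and return the init script's error code.
        echo "$res" >&2
        return 3
    fi
    return 0
}
```

A rollout step could then run something like
`check_squid_conf ./squid.conf.new && cp ./squid.conf.new /etc/squid3/squid.conf`
(the ./squid.conf.new path is illustrative), so that a bungled file
never replaces the working configuration.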