On Tue, 11.04.17 13:41, Samuel Williams ([email protected]) wrote:
> I had an accident last night. I tried to delete a lot of rows from a > production database in one transaction. I killed the transaction, and > I didn't realise it was still rolling back an hour later when I tried > to reboot the system for updates. > > I might be wrong about exactly who is doing what, but I'll do my best > to explain what happened, and then give a few notes about how I think > the process could be improved. I don't specifically know if it's > systemctl or the service file, or mysqld causing the issues. > > systemctl tried to shutdown mysqld, but due to the rollback, it > wouldn't exit with anything less than kill -9. So, systemctl entered > the failed state, however, in addition to that, it appears it closed > the log output of the mysqld process so I couldn't see any of the > output from the daemon. IIRC mysql actually uses its own log-file, and neither logs to stderr nor to syslog? Note that systemd don't actually close stdout/stderr or syslog at all, this is always done by the logging client, not the recipient (journald). > Eventually, after checking all our backups, I decided to issue kill -9 > to mysqld. I then decided to try restarting the daemon using > systemctl. It did start up, the log output showed the crash recovery > procedure, but because it entered into the rollback recovery, > systemctl never considered that the process had finished starting up, > and then tried to kill it again, which failed (only kill -9 would work > in this case). Again, log output was closed. Again, we really don't do that. Logging is actually pretty independent from service management when it comes to keeping the connections open: all systemd does is set up the connection, fork off the process and that's it. The forked processes can keep the connections open as long as they wish, or even pass them on to other processes if they like... That said, there's one long-standing race: if a process logs through syslog(), and very quickly after that exits we under some circumstances have trouble matching up the log output with the originating service. That's because the SCM_CREDENTIAL data we get for the logged messages might reference a PID which already disappeared from /proc/$PID which we hence cannot read the cgroup membership info from. Hence: maybe the logs are there, but the filtering didn't work the way you expect? > I issued kill -9 again, and started up the process manually `sudo -u > mysql mysqld` and let it go through the entire recovery process, and > then rebooted the system. Normalcy restored. > > I think that the following things could be investigated/improved: > > - Log output shouldn't be closed if the daemon can't be killed. I'm > not sure who is doing this - perhaps it's mysqld, or perhaps it's the > service file, or perhaps its systemctl? I doubt that it's systemd. > - If a daemon fails to start up, trying to kill it.. may not be the > best option. It's probably a matter of the systemctl service file > detecting that a rollback is in progress and accepting that as a valid > startup state, but I'm not really sure. In any case, I ended up having > to do this process manually. The timeouts for killing services that don't start up correctly are configurable per service, and by setting them to "infinity" you may even turn them off entirely. This is configurable precisely so that services that can take ages to start-up in real-life can increase or turn off the timeout the way they need. Or in other words: please make sure that your mysql.service carries the right StartTimeoutSec=/StopTimeoutSec= settings, and please contact your unit file vendor to fix this. Lennart -- Lennart Poettering, Red Hat _______________________________________________ systemd-devel mailing list [email protected] https://lists.freedesktop.org/mailman/listinfo/systemd-devel
