On Sat, Feb 10, 2018 at 03:05:16PM +0100, Kai Krakow wrote:
> On Sat, 10 Feb 2018 14:23:34 +0100, Kai Krakow wrote:
>
> > On Sat, 10 Feb 2018 02:16:44 +0200, Uoti Urpala wrote:
> >
> >> On Fri, 2018-02-09 at 12:41 +0100, Lennart Poettering wrote:
> >>> This last log line indicates journald wasn't scheduled for a long
> >>> time, which caused the watchdog to hit, and journald was aborted.
> >>> Consider increasing the watchdog timeout if your system is indeed
> >>> that loaded and that's supposed to be an OK thing...
> >>
> >> BTW I've seen the same behavior on a system with a single active
> >> process that uses enough memory to trigger significant swap use. I
> >> wonder if there has been a regression in the kernel causing
> >> misbehavior when swapping? The problems aren't specific to journald,
> >> the desktop environment can totally freeze too, etc.
> >
> > This problem seems to have been there since kernel 4.9, which was a
> > real PITA in this regard. It has progressively become better since
> > kernel 4.10. Since then, the kernel seems to try to prevent swapping
> > at any cost, even at the cost of much higher latency and of pushing
> > all cache out of RAM.
> >
> > The result is processes stuck for easily 30 seconds and more during
> > memory pressure. Sometimes I see the kernel loudly complaining in
> > dmesg about high wait times for allocating RAM, especially from the
> > btrfs module. Thus, the biggest problem may be that kernel threads
> > themselves get stuck in memory allocations and become victims of the
> > high latency.
> >
> > Currently I'm running my user session in a slice capped at 80% of
> > RAM, which seems to help: the cache is no longer completely
> > discarded. I also put some potentially high memory users (regarding
> > cache and/or resident mem) into slices with carefully selected
> > memory limits (backup and maintenance services). Slices limited in
> > such a way start swapping before cache is discarded, and everything
> > works better again. Part of this problem may be that I have one
> > process running which mmaps and locks 1G of memory (bees, a btrfs
> > deduplicator).
> >
> > This system has 16G of RAM, which is usually plenty, but I use tmpfs
> > to build packages on Gentoo, and while that worked wonderfully
> > before 4.9, I have to be really careful now. The kernel happily
> > throws away cache instead of swapping early. Setting vm.swappiness
> > differently seems to have no perceivable effect.
> >
> > Software that uses mmap is the first latency victim of this new
> > behavior. As such, systemd-journald also seems to be hit hard by it.
> >
> > After the system recovers from high memory pressure (which can take
> > 10-15 minutes, resulting in a loadavg of 400+), it ends up with some
> > gigabytes of inactive memory in swap, which it will only swap back
> > in during shutdown (which then also takes some minutes).
> >
> > The problem since 4.9 seems to be that the kernel tends to do swap
> > storms instead of constantly swapping out memory at low rates during
> > usage. The swap storms totally thrash the system.
> >
> > Before 4.9, the kernel had no such latency spikes under memory
> > pressure. Swap would usually grow slowly over time, and the system
> > felt sluggish now and then but remained usable wrt latency. I
> > usually ended up with 5-8G of swap usage, and that was no problem.
> > Now, swap only grows significantly during swap storms that leave the
> > system unusable for many minutes, with latencies of 10+ seconds
> > around twice per minute.
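For reference, a minimal sketch of the two mitigations mentioned above
(the drop-in file name, the 10min value, and the user.slice target are
illustrative; MemoryMax= needs cgroup v2, the rough cgroup v1
equivalent is MemoryLimit=):

    # Raise journald's watchdog timeout via a drop-in; the upstream
    # unit sets WatchdogSec=3min by default.
    mkdir -p /etc/systemd/system/systemd-journald.service.d
    printf '[Service]\nWatchdogSec=10min\n' \
        > /etc/systemd/system/systemd-journald.service.d/watchdog.conf
    systemctl daemon-reload
    systemctl restart systemd-journald

    # Cap the user session's memory so the page cache isn't fully
    # evicted; percentage values are relative to physical RAM.
    systemctl set-property user.slice MemoryMax=80%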
> >
> > I had no swap storm yet since the last boot, and swap usage is
> > around 16M now. Before kernel 4.9, this would be much higher
> > already.
>
> After some more research, I found that vm.watermark_scale_factor may
> be the knob I am looking for. I'm going to watch behavior now with a
> higher factor (default = 10, now 200).
>
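For reference, experimenting with that knob looks like this (200 is
just the value picked above; the sysctl scales the gap between
kswapd's watermarks in units of 0.01% of RAM, so the default 10 is
about 0.1% and 200 about 2%, making kswapd start reclaiming earlier
and keep working longer):

    # Try it at runtime (reverts on reboot):
    sysctl -w vm.watermark_scale_factor=200

    # Persist it across reboots (the file name is illustrative):
    echo 'vm.watermark_scale_factor = 200' > /etc/sysctl.d/80-watermarks.conf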
Have you reported this to the kernel maintainers? LKML? While this is
interesting to read on systemd-devel, it's not the right venue. What
you describe sounds like a regression that probably should be improved
upon.

Also, out of curiosity, are you running dmcrypt in this scenario? If
so, is swap on dmcrypt as well?

Regards,
Vito Caputo
_______________________________________________
systemd-devel mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/systemd-devel
