On Monday, 24 March 2025 17:12:08 CET Christian Mack wrote:
> On 21.03.25 at 16:16, Daniel Kollmer ([email protected]) wrote:
> > I currently run version 5.11.2 (@65536ba7a376 202503180650), and we have
> > observed extreme peaks in CPU load and memory use. After a large number
> > of tweaks (I have posted them here before), the problem persists, with
> > SOGo now triggering the oomkiller. In the past we had found through SQL
> > queries that the extreme peak in memory use was caused by a calendar
> > which somehow had managed to have recurring events going back to the
> > 17th century; we guessed it must have been an error in a user entry.
[...]
> We had that too.
> The cause was that the database had become very slow.
> As a result, the workers piled up at times of high access, using up all
> available memory.
> We have a separate database server, which is not only used for SOGo.
> We fixed that slow database.
Absolutely agree: database speed is the key... :)
> While doing so, we prevented oomkiller from killing the sogo main
> process with:
> # echo -100 >/proc/$(cat /var/run/sogo/sogo.pid)/oom_score_adj
>
> This only prevents the oomkiller from killing the main process; any
> worker will eventually get killed.
This only partially holds true: if you do this after starting SOGo, only
the master process gets the 'better' oom score at first, which is
absolutely desirable, while the workers already running keep their old
score. But every new worker process spawned thereafter will get the same
oom score as the master process, because oom_score_adj is a setting that
child processes inherit from their parent.
So, to make this actually useful in the long run, you'd need to take
care of that: either by running a periodic job fixing the scores or by
restarting sogo from time to time (or after every oom kill)...
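
A rough sketch of such a periodic job, in case it helps. It assumes the
workers are direct children of the PID stored in /var/run/sogo/sogo.pid
and that 0 is the score you want for them; the function and variable
names (children_of, reset_worker_scores, PROC_ROOT, PIDFILE,
WORKER_SCORE) are made up for this example and not part of SOGo:

```shell
#!/bin/sh
# Sketch only: put SOGo workers back to a normal oom_score_adj so that just
# the master process keeps the protected score it was given by hand.
# Assumptions: workers are direct children of the PID in $PIDFILE, and
# WORKER_SCORE=0 is the value wanted for them.

PROC_ROOT="${PROC_ROOT:-/proc}"
PIDFILE="${PIDFILE:-/var/run/sogo/sogo.pid}"
WORKER_SCORE="${WORKER_SCORE:-0}"

# Print the PIDs whose parent is $1, using the "PPid:" line of
# /proc/<pid>/status.
children_of() {
    parent="$1"
    for status in "$PROC_ROOT"/[0-9]*/status; do
        [ -r "$status" ] || continue
        ppid=$(awk '/^PPid:/ { print $2; exit }' "$status" 2>/dev/null)
        if [ "$ppid" = "$parent" ]; then
            dir=${status%/status}
            echo "${dir##*/}"
        fi
    done
}

# Write WORKER_SCORE into oom_score_adj of every child of the master.
# A worker may exit between listing and writing, so failures are ignored.
reset_worker_scores() {
    master=$(cat "$PIDFILE") || return 1
    for pid in $(children_of "$master"); do
        echo "$WORKER_SCORE" > "$PROC_ROOT/$pid/oom_score_adj" 2>/dev/null || true
    done
}

# Only act when called with --run, e.g. from cron.
if [ "${1:-}" = "--run" ]; then
    reset_worker_scores
fi
```

Run as root from cron every few minutes with the --run argument, this
leaves the master at whatever score you set and moves the workers back
to the default, so the oomkiller prefers them again.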
all the best,
Adi