On 2/23/21 1:25 PM, Luke Sudbery wrote:
We have suddenly got bad performance from sreport: querying a 1 hour period (in the last 24 hours) for TopUsage went from taking under a minute to timing out after the 15 minute maximum slurmdbd query time, although the SQL query on the DB server continued long after that.

So firstly we were wondering what might have caused that.

But while investigating we decided we should turn on purging of records in slurmdbd.conf, and wanted more detail about when the purge would occur and whether it would lock the database for other Slurm processes. The docs say “The purge takes place at the start of the each purge interval.” But we assume it will also do so on a restart of slurmdbd, so we can manage exactly when that happens. Is that true? And as we have many years and millions of records to purge, we need to know if this will hang all database access, and what kind of outage that is likely to cause.

Does anyone have experience of enabling purging after the fact?

I worked on progressive database purging a while back and documented it in my Slurm Wiki page:

https://wiki.fysik.dtu.dk/niflheim/Slurm_database#setting-database-purge-parameters

Note in particular these recommendations:

A monthly purge operation can be a huge amount of work for a database depending on its size, and you certainly want to cut down the amount of work required during the purges. If you did not use purges before, it is probably a good idea to try out a series of daily purges starting with:

PurgeEventAfter=2000days
PurgeJobAfter=2000days
PurgeResvAfter=2000days
PurgeStepAfter=2000days
PurgeSuspendAfter=2000days

If this works well over a few days, decrease the 2000days value little by little and try again (1800, 1500, etc.) until, after many iterations, you come down to the desired final purge intervals.
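
As a rough illustration of the stepwise approach above, here is a minimal Python sketch that prints the purge settings for each iteration. The function names, the 300-day step, and the 365-day final target are only assumptions for the example and are not taken from the Wiki page; pick values that suit your own site. The script does not touch slurmdbd.conf or restart slurmdbd, it only generates the lines to paste in by hand.

```python
# Hypothetical helper: print slurmdbd.conf purge settings for each step
# of a gradual reduction of the retention period.

PURGE_KEYS = [
    "PurgeEventAfter",
    "PurgeJobAfter",
    "PurgeResvAfter",
    "PurgeStepAfter",
    "PurgeSuspendAfter",
]


def purge_schedule(start_days, target_days, step_days):
    """Return decreasing retention values from start_days down to target_days."""
    if step_days <= 0:
        raise ValueError("step_days must be positive")
    if target_days > start_days:
        raise ValueError("target_days must not exceed start_days")
    # range() stops before target_days, so the final target is appended.
    values = list(range(start_days, target_days, -step_days))
    values.append(target_days)
    return values


def purge_lines(days):
    """Return the slurmdbd.conf purge lines for one retention value."""
    return [f"{key}={days}days" for key in PURGE_KEYS]


if __name__ == "__main__":
    # Assumed example values: start at 2000 days, end at 365 days.
    for days in purge_schedule(2000, 365, 300):
        print(f"# Iteration with {days} days retention")
        print("\n".join(purge_lines(days)))
        print()
```

Each printed block would be used for a few days, and only replaced by the next one once the previous purges have completed without problems.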

I hope this helps.

Best regards,
Ole
