On 2/23/21 1:25 PM, Luke Sudbery wrote:
We have suddenly got bad performance from sreport: querying a 1 hour
period (in the last 24 hours) for TopUsage went from taking under a minute
to timing out after the 15 minutes max slurmdbd query time, although the
SQL query on the DB server continued long after that.
So firstly we were wondering what might have caused that.
But while investigating we decided we should turn on purging records in
slurmdbd.conf, and wanted more detail about when the purge would occur and
would it lock the database for other Slurm processes. Docs say "The purge
takes place at the start of the each purge interval." But we assume it
will also do so on a restart of slurmdbd so we can manage exactly when
that happens. Is that true? And as we have many years and millions of
records to purge we need to know if this will hang all database access,
and what kind of outage that is likely to cause.
Anyone have experience of enabling purging after the fact?
I worked on progressive database purging a while back and documented it in
my Slurm Wiki page:
https://wiki.fysik.dtu.dk/niflheim/Slurm_database#setting-database-purge-parameters
Note in particular these recommendations:
A monthly purge operation can be a huge amount of work for a database
depending on its size, and you certainly want to cut down the amount of
work required during the purges. If you did not use purges before, it is
probably a good idea to try out a series of daily purges starting with:
PurgeEventAfter=2000days
PurgeJobAfter=2000days
PurgeResvAfter=2000days
PurgeStepAfter=2000days
PurgeSuspendAfter=2000days
If this works well over a few days, decrease the purge interval from
2000days little by little and try again (1800, 1500, etc.) until, after
many iterations, you come down to the desired final purge intervals.
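To plan the step-down in advance, a small script can list the values to try
and print the matching slurmdbd.conf lines for each step. This is only a
sketch: the function names, the fixed step of 200 days and the final value
of 365 days are assumptions chosen for illustration, not recommendations,
and the script does not touch slurmdbd.conf or slurmdbd itself.

```python
# Sketch only: purge_schedule() and conf_fragment() are hypothetical helpers,
# not part of Slurm. The step and final values below are illustrative.

# The five purge parameters listed above.
PURGE_KEYS = [
    "PurgeEventAfter",
    "PurgeJobAfter",
    "PurgeResvAfter",
    "PurgeStepAfter",
    "PurgeSuspendAfter",
]


def purge_schedule(start_days, final_days, step_days):
    """Return decreasing retention values from start_days down to final_days."""
    if step_days <= 0:
        raise ValueError("step_days must be positive")
    if final_days > start_days:
        raise ValueError("final_days must not exceed start_days")
    # range() stops before final_days, so append it as the last step.
    values = list(range(start_days, final_days, -step_days))
    values.append(final_days)
    return values


def conf_fragment(days):
    """Return the slurmdbd.conf lines for one step of the schedule."""
    return "\n".join(f"{key}={days}days" for key in PURGE_KEYS)


if __name__ == "__main__":
    # Assumed example: go from 2000 days to 365 days in steps of 200 days.
    for days in purge_schedule(2000, 365, 200):
        print(f"# step: keep {days} days")
        print(conf_fragment(days))
        print()
```

Each printed fragment would be put into slurmdbd.conf by hand, one step at a
time, moving on only after the previous purge has completed without trouble.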
I hope this helps.
Best regards,
Ole