Re: [slurm-users] Fairshare tree after SLURM upgrade

2021-01-29 Thread Lech Nieroda
Also keep in mind that the default fairshare algorithm changed in 19.05 to Fair Tree (PriorityFlags=FAIR_TREE). You’d have to set PriorityFlags to NO_FAIR_TREE in order to revert to the classic algorithm, otherwise your FairShare calculations will be quite different even though the raw usage data remains the same. Kind regards, Lech
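In slurm.conf terms, a minimal sketch of the two settings (weights and other priority parameters omitted):

    # 19.05 default behaviour
    PriorityType=priority/multifactor
    PriorityFlags=FAIR_TREE
    # revert to the classic (pre-Fair-Tree) calculation
    #PriorityFlags=NO_FAIR_TREE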

Re: [slurm-users] submit_plugin.lua: distinguish between batch and interactive usage

2020-12-07 Thread Lech Nieroda
Hello, It’s certainly possible to check whether the job is interactive or not, e.g. if job_desc.script == nil or job_desc.script == '' then slurm.log_info("slurm_job_submit: jobscript is missing, assuming interactive job") else slurm.log_info("slurm_job_submit: jobscript is present, assuming batch job") end
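A minimal, self-contained job_submit.lua sketch along these lines (the log messages and the batch/interactive distinction are illustrative, not the original plugin):

    -- job_submit.lua: tell batch jobs (script present) from interactive ones
    function slurm_job_submit(job_desc, part_list, submit_uid)
        if job_desc.script == nil or job_desc.script == '' then
            slurm.log_info("slurm_job_submit: jobscript is missing, assuming interactive job")
        else
            slurm.log_info("slurm_job_submit: jobscript is present, assuming batch job")
        end
        return slurm.SUCCESS
    end

    function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
        return slurm.SUCCESS
    end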

[slurm-users] Efficiency of the profile influxdb plugin for graphing live job stats

2019-12-13 Thread Lech Nieroda
Hi, I’ve been tinkering with the acct_gather_profile/influxdb plugin a bit in order to visualize the cpu and memory usage of live jobs. Both the influxdb backend and Grafana dashboards seem like a perfect fit for our needs. I’ve run into an issue though and made a crude workaround for it, may
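For reference, a hedged sketch of the configuration such a setup typically needs (parameter names per the slurm.conf and acct_gather.conf man pages; host and database values are placeholders):

    # slurm.conf
    AcctGatherProfileType=acct_gather_profile/influxdb
    JobAcctGatherFrequency=30

    # acct_gather.conf
    ProfileInfluxDBHost=influxdb.example.org:8086
    ProfileInfluxDBDatabase=slurm_profile
    # with Default=ALL every job is profiled; otherwise jobs need srun/sbatch --profile=task
    ProfileInfluxDBDefault=ALL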

Re: [slurm-users] Running job is canceled when starting a new job from queue

2019-10-28 Thread Lech Nieroda
Hello Uwe, when the requested time limit of a job runs out, the job is cancelled with SIGTERM (15), followed by SIGKILL (9) if that should fail, and the job gets the state „TIMEOUT“. However, job 161 gets killed immediately by SIGKILL and gets the state „FAILED“. That sugges
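A quick way to compare the two cases is sacct; something like the following (field names per the sacct man page, job id taken from the thread):

    sacct -j 161 --format=JobID,State,ExitCode,Elapsed,Timelimit,DerivedExitCode
    # a TIMEOUT state indicates the time limit was reached;
    # an immediate SIGKILL with state FAILED points elsewhere (e.g. an external kill or OOM)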

Re: [slurm-users] Upgrading SLURM from 17.02.7 to 18.08.8 - Job ID gets reset

2019-10-18 Thread Lech Nieroda
Hi Florian, You can use the FirstJobId option from slurm.conf to continue the JobIds seamlessly. Kind Regards, Lech > On 18.10.2019 at 11:47, Florian Zillner wrote: > > Hi all, > > we’re using OpenHPC packages to run SLURM. Current OpenHPC Version is 1.3.8 > (SLURM 18.08.8), though we’re
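A sketch of the relevant slurm.conf entry; the value is a placeholder and should simply be set above the highest job id of the old installation:

    # slurm.conf on the upgraded controller (restart slurmctld afterwards)
    FirstJobId=1000000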

Re: [slurm-users] Holding back jobs over QOS limit

2019-08-21 Thread Lech Nieroda
Hello Florian, unless the proposed order of job execution needs to be adhered to at all times, it might be easier and fairer to use the fairshare mechanism. As the name suggests, it was created to provide each user (or account) with a fair share of resources. It takes previous computation time into account.
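To inspect how the fairshare ranking looks and to adjust shares, something along these lines can help (field names per the sshare and sacctmgr man pages; the account name and value are placeholders):

    # per-association shares and accumulated usage
    sshare -a -o Account,User,RawShares,NormShares,RawUsage,EffectvUsage,FairShare

    # grant an account a larger share
    sacctmgr modify account name=chem set fairshare=200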

Re: [slurm-users] SLURM_NTASKS values in interactive and batch jobs

2019-07-04 Thread Lech Nieroda
> On 03.07.2019 at 19:31, Chris Samuel wrote: > > On 3/7/19 8:17 am, Lech Nieroda wrote: > >> Is that the expected behaviour or a bug? > > I'm not seeing that here with 18.08.7 and salloc, I'm only seeing: > > SLURM_NTASKS=5 > > that'

[slurm-users] SLURM_NTASKS values in interactive and batch jobs

2019-07-03 Thread Lech Nieroda
Hi all, there seems to be a discrepancy in the SLURM_NTASKS values depending on the job type. For example, let’s say the job requests 5 tasks (-n 5), is submitted with sbatch, then its job step uses only 1 task (e.g. srun -n 1). In that case you’ll see following values (with every launcher):
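A minimal reproducer along the lines described, submitted with sbatch and also runnable via salloc/srun for comparison (illustrative, not the original test):

    #!/bin/bash
    #SBATCH -n 5
    # value visible to the batch script itself
    echo "batch environment: SLURM_NTASKS=$SLURM_NTASKS"
    # value visible inside a one-task job step
    srun -n 1 bash -c 'echo "step environment: SLURM_NTASKS=$SLURM_NTASKS"'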

Re: [slurm-users] Slurm Jobscript Archiver

2019-06-17 Thread Lech Nieroda
n touch with how everything goes! > > Best, > Chris > — > Christopher Coffey > High-Performance Computing > Northern Arizona University > 928-523-1167 > > > On 6/14/19, 2:22 AM, "slurm-users on behalf of Lech Nieroda" > lech.nier...@uni-koeln.de> w

Re: [slurm-users] Slurm Jobscript Archiver

2019-06-14 Thread Lech Nieroda
Hello Chris, we’ve tried out your archiver and adapted it to our needs, it works quite well. The changes: - we get lots of jobs per day, ca. 3k-5k, so storing them as individual files would waste too many inodes and 4k blocks. Instead everything is written into two log files (job_script.log and
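A hedged sketch of the append-to-one-file idea (the path, separator format and argument handling are illustrative, not the exact adaptation):

    #!/bin/bash
    # append a submitted job script to a single rolling log instead of one file per job
    jobid="$1"; user="$2"; scriptfile="$3"
    {
        printf '==== JobId=%s User=%s Date=%s ====\n' "$jobid" "$user" "$(date)"
        cat "$scriptfile"
        printf '\n'
    } >> /var/log/slurm/job_script.log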

Re: [slurm-users] Increasing job priority based on resources requested.

2019-04-19 Thread Lech Nieroda
Hi, if you want to affect priority, you can create additional partitions that contain nodes of a certain type, like bigmem, ibnet, etc., and set a priority boost of your choosing. Jobs that require certain features or exceed predefined thresholds can then be filtered and assigned to the appropriate partition, as sketched below.
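A sketch of what that can look like in slurm.conf (node and partition names as well as the boost values are placeholders):

    # overlapping partitions, higher PriorityJobFactor for the special resources
    PartitionName=standard Nodes=node[001-100] Default=YES PriorityJobFactor=1
    PartitionName=bigmem   Nodes=node[090-100] PriorityJobFactor=10
    PartitionName=ibnet    Nodes=node[050-089] PriorityJobFactor=10

The filtering itself can be done, for example, in a job_submit plugin that rewrites job_desc.partition based on the requested memory or features.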

Re: [slurm-users] slurmdbd purge not working

2019-04-08 Thread Lech Nieroda
Hello Julien, the innodb engine may stop working if you change parameters such as innodb_log_file_size without rebuilding the database, as the expected values no longer correspond to the encountered ones. Try using the old parameters. In order to debug the archive dump error you might want to ru
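Two hedged pointers for that: restore the previously used InnoDB log file size and watch slurmdbd verbosely while it runs the purge/archive step (the value below is only a placeholder for whatever was configured before):

    # /etc/my.cnf
    [mysqld]
    innodb_log_file_size = 5M   # restore the value the existing ib_logfile* were created with

    # run slurmdbd in the foreground with verbose logging
    slurmdbd -D -vvv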

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2019-04-05 Thread Lech Nieroda
also help others. > > Best regards, > Ole > > On 4/4/19 1:07 PM, Lech Nieroda wrote: >> That’s correct but let’s keep in mind that it only concerns the upgrade >> process and not production runtime which has certain implications. >> The affected database structures ha

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2019-04-04 Thread Lech Nieroda
> Upgrading more than 2 releases isn't supported, so I don't believe the 19.05 > slurmdbd will have the code in it to upgrade tables from earlier than 17.11. I haven’t found any mention of this in the upgrade section of the QuickStart guide (see https://slurm.schedmd.com/quickstart_admin.html#up

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2019-04-04 Thread Lech Nieroda
From what I gather from these discussions so far, > SchedMD is basically saying we support Linux distro X, but not the > MySQL/MariaDB version that comes with that distro. Is that a correct reading > of this situation? > > -- > Prentice > > On 4/3/19 8:04 AM, Lech N

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2019-04-03 Thread Lech Nieroda
B versions 5.5 and newer. > > Best regards, > Ole > > > On 4/3/19 1:17 PM, Lech Nieroda wrote: >> Hi Ole, >>> On 03.04.2019 at 12:53, Ole Holm Nielsen wrote: >>> SchedMD already decided that they won't fix the problem: >> Yes, I g

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2019-04-03 Thread Lech Nieroda
hour and you don’t have tens of millions of jobs then the optimizer has a problem and the patch would help you. Kind regards, Lech > > Best regards, > Ole > > On 4/3/19 12:30 PM, Lech Nieroda wrote: >> Hello Chris, >> I’ve submitted the bug report together with a

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2019-04-03 Thread Lech Nieroda
> > Would you be able to make patches against 18.08 and 19.05? If you submit the > patches to SchedMD, my guess is that they'd be very interested. A site with > a SchedMD support contract (such as our site) could also submit a bug report > including your patch. > >

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2019-04-02 Thread Lech Nieroda
:20, Chris Samuel wrote: > > On Monday, 1 April 2019 7:55:09 AM PDT Lech Nieroda wrote: > >> Further analysis of the query has shown that the mysql optimizer has chosen >> the wrong execution plan. This may depend on the mysql version, ours was >> 5.1.69. > > I sus

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2019-04-01 Thread Lech Nieroda
We’ve run into exactly the same problem, i.e. an extremely long upgrade process to the 17.11.x major release. Luckily, we’ve found a solution. The first approach was to tune various innodb options, like increasing the buffer pool size (8G), the log file size (64M) or the lock wait timeout (900)
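For reference, that first (tuning) approach as a my.cnf sketch with the values mentioned above:

    [mysqld]
    innodb_buffer_pool_size  = 8G
    innodb_log_file_size     = 64M
    innodb_lock_wait_timeout = 900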

[slurm-users] Cores shared between jobs even with OverSubscribe=NO with 17.02.6

2018-08-14 Thread Lech Nieroda
d the cgroup.conf looks like this: CgroupAutomount=yes CgroupMountpoint=/cgroup CgroupReleaseAgentDir="/etc/slurm/cgroup" ConstrainCores=yes ConstrainDevices=yes ConstrainRAMSpace=yes ConstrainSwapSpace=yes ConstrainKmemSpace=yes AllowedSwapSpace=0 Kind regards, Lech -- Lech Nieroda
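For comparison, the slurm.conf pieces that normally have to line up with such a cgroup.conf for exclusive core allocation (a sketch, not the poster's full configuration; node and partition names are placeholders):

    # slurm.conf
    SelectType=select/cons_res
    SelectTypeParameters=CR_Core_Memory
    TaskPlugin=task/cgroup
    ProctrackType=proctrack/cgroup
    PartitionName=batch Nodes=node[001-100] OverSubscribe=NO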

Re: [slurm-users] missing/failed mem_req conversion when upgrading from 15.08.12 to 17.02.6

2018-01-26 Thread Lech Nieroda
After some more digging this turns out to be the same issue as in Bug 4153, which was fixed on September 27th, 2017. If you’ve upgraded to 17.02/17.11 prior to that date, be sure to check your reqmem data. > On 26.01.2018 at 11:59, Lech Nieroda wrote: > > Dear slurm users, > > w

[slurm-users] missing/failed mem_req conversion when upgrading from 15.08.12 to 17.02.6

2018-01-26 Thread Lech Nieroda
the values ‚manually‘, i.e. made a query that selected all entries with 2^31 <= mem_req < 2^63, made a backup, cleared the 2^31 bit, set the 2^63 bit, stored and checked the values. Regards, Lech -- Dipl.-Wirt.-Inf. Lech Nieroda Regionales Rechenzentrum der Universität zu Köln (RRZK)
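Roughly, the manual fix corresponds to SQL of this shape; the table name is a simplified placeholder for the per-cluster job table, and as described above a backup of the affected rows should come first:

    -- back up the affected rows, then clear bit 31 and set bit 63 in mem_req
    UPDATE cluster_job_table
       SET mem_req = (mem_req & ~(1 << 31)) | (1 << 63)
     WHERE mem_req >= (1 << 31)
       AND mem_req <  (1 << 63);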