Also keep in mind that the default prioritization algorithm changes to Fair
Tree in 19.05. You’d have to set PriorityFlags=NO_FAIR_TREE in order to
revert to the classic algorithm, otherwise your fair-share calculations will be
quite different even though the raw usage data remains the same.
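For reference, a minimal slurm.conf fragment that keeps the classic behaviour after the upgrade (a sketch; check your site’s existing PriorityFlags before merging):

```
# slurm.conf -- keep the classic multifactor algorithm after 19.05,
# where Fair Tree becomes the default
PriorityType=priority/multifactor
PriorityFlags=NO_FAIR_TREE
```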
Kind regards,
Hello,
It’s certainly possible to check whether the job is interactive or not, e.g.
if job_desc.script == nil or job_desc.script == '' then
    slurm.log_info("slurm_job_submit: jobscript is missing, assuming interactive job")
else
    slurm.log_info("slurm_job_submit: jobscript is present, assumi
Hi,
I’ve been tinkering with the acct_gather_profile/influxdb plugin a bit in
order to visualize the CPU and memory usage of live jobs.
Both the influxdb backend and Grafana dashboards seem like a perfect fit for
our needs.
I’ve run into an issue though and made a crude workaround for it, may
Hello Uwe,
when the requested time limit of a job runs out, the job is cancelled: it is
terminated with SIGTERM (15), and later with SIGKILL (9) should that fail; the
job then gets the state "TIMEOUT".
However, job 161 gets killed immediately by SIGKILL and gets the state
"FAILED". That sugges
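As an illustration of the difference between the two signals (a plain-Python sketch with default signal dispositions, nothing Slurm-specific):

```python
import signal
import subprocess
import sys
import time

def run_and_signal(sig):
    # Child just sleeps; default signal dispositions apply.
    p = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    time.sleep(0.5)          # let the child start up
    p.send_signal(sig)
    p.wait(timeout=5)
    return p.returncode      # negative signal number on POSIX

print(run_and_signal(signal.SIGTERM))  # -15: orderly termination (the TIMEOUT path)
print(run_and_signal(signal.SIGKILL))  # -9:  immediate kill (what job 161 saw)
```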
Hi Florian,
You can use the FirstJobId option from slurm.conf to continue the JobIds
seamlessly.
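For example (the value is illustrative; pick one above the old cluster’s highest JobId):

```
# slurm.conf -- continue numbering where the previous cluster left off
FirstJobId=100000
```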
Kind Regards,
Lech
> On 18.10.2019 at 11:47, Florian Zillner wrote:
>
> Hi all,
>
> we’re using OpenHPC packages to run SLURM. Current OpenHPC Version is 1.3.8
> (SLURM 18.08.8), though we’re
Hello Florian,
unless the proposed order of job execution needs to be adhered to at all times,
it might be easier and fairer to use the fairshare mechanism.
As the name suggests, it was created to provide each user (or account) with a
fair share of resources. It regards previous computation tim
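For intuition, the classic fair-share factor can be sketched as F = 2^(-usage/shares); this is a simplified illustration, not the actual multifactor plugin code:

```python
def fairshare_factor(effective_usage: float, norm_shares: float) -> float:
    """Simplified classic fair-share factor: F = 2**(-usage/shares).

    effective_usage: the user's decayed fraction of total cluster usage (0..1)
    norm_shares:     the user's normalized share allocation (0..1)
    """
    if norm_shares == 0:
        return 0.0
    return 2 ** (-effective_usage / norm_shares)

# A user who consumed exactly their allotted share gets 0.5,
# an idle user gets the full 1.0, and heavy over-consumers approach 0.
print(fairshare_factor(0.25, 0.25))  # 0.5
print(fairshare_factor(0.0, 0.25))   # 1.0
```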
> On 03.07.2019 at 19:31, Chris Samuel wrote:
>
> On 3/7/19 8:17 am, Lech Nieroda wrote:
>
>> Is that the expected behaviour or a bug?
>
> I'm not seeing that here with 18.08.7 and salloc, I'm only seeing:
>
> SLURM_NTASKS=5
>
> that'
Hi all,
there seems to be a discrepancy in the SLURM_NTASKS values depending on the
job type.
For example, let’s say the job requests 5 tasks (-n 5), is submitted with
sbatch, then its job step uses only 1 task (e.g. srun -n 1). In that case
you’ll see the following values (with every launcher):
n touch with how everything goes!
>
> Best,
> Chris
> —
> Christopher Coffey
> High-Performance Computing
> Northern Arizona University
> 928-523-1167
>
>
> On 6/14/19, 2:22 AM, "slurm-users on behalf of Lech Nieroda"
> lech.nier...@uni-koeln.de> w
Hello Chris,
we’ve tried out your archiver and adapted it to our needs, it works quite well.
The changes:
- we get lots of jobs per day, ca. 3k-5k, so storing them as individual files
would waste too many inodes and 4k blocks. Instead, everything is written into
two log files (job_script.log and
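The single-file approach can be sketched like this (file name and separator format are our local choices, not part of the original archiver):

```python
def append_job_script(log_path: str, job_id: int, script_text: str) -> None:
    """Append one job script to a shared log file instead of creating
    one small file per job, which would burn inodes and 4k blocks."""
    with open(log_path, "a") as log:
        log.write(f"=== JobId={job_id} ===\n")
        log.write(script_text.rstrip("\n") + "\n")

append_job_script("job_script.log", 12345, "#!/bin/bash\nsrun hostname")
```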
Hi,
if you want to affect priority, you can create additional partitions that
contain nodes of a certain type, like bigmem, ibnet, etc., and set a priority
boost of your choosing. Jobs that require certain features or exceed predefined
thresholds can then be filtered and assigned to the appropriate p
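A sketch of such a partition setup in slurm.conf (node names and priority values are made up for illustration):

```
# slurm.conf -- higher-priority partitions for special node types
PartitionName=bigmem Nodes=mem[01-04] PriorityTier=10 State=UP
PartitionName=ibnet  Nodes=ib[01-32]  PriorityTier=5  State=UP
```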
Hello Julien,
the InnoDB engine may stop working if you change parameters such as
innodb_log_file_size without rebuilding the database, as the expected values no
longer correspond to the encountered ones. Try using the old parameters.
In order to debug the archive dump error you might want to ru
also help others.
>
> Best regards,
> Ole
>
> On 4/4/19 1:07 PM, Lech Nieroda wrote:
>> That’s correct but let’s keep in mind that it only concerns the upgrade
>> process and not production runtime which has certain implications.
>> The affected database structures ha
> Upgrading more than 2 releases isn't supported, so I don't believe the 19.05
> slurmdbd will have the code in it to upgrade tables from earlier than 17.11.
I haven’t found any mention of this in the upgrade section of the QuickStart
guide (see https://slurm.schedmd.com/quickstart_admin.html#up
rom what I gather from these discussions so far,
> SchedMD is basically saying we support Linux distro X, but not the
> MySQL/MariaDB version that comes with that distro. Is that a correct reading
> of this situation?
>
> --
> Prentice
>
> On 4/3/19 8:04 AM, Lech N
B versions 5.5 and newer.
>
> Best regards,
> Ole
>
>
> On 4/3/19 1:17 PM, Lech Nieroda wrote:
>> Hi Ole,
>>> On 03.04.2019 at 12:53, Ole Holm Nielsen wrote:
>>> SchedMD already decided that they won't fix the problem:
>> Yes, I g
hour and you don’t have tens of
millions of jobs then the optimizer has a problem and the patch would help you.
Kind regards,
Lech
>
> Best regards,
> Ole
>
> On 4/3/19 12:30 PM, Lech Nieroda wrote:
>> Hello Chris,
>> I’ve submitted the bug report together with a
>
> Would you be able to make patches against 18.08 and 19.05? If you submit the
> patches to SchedMD, my guess is that they'd be very interested. A site with
> a SchedMD support contract (such as our site) could also submit a bug report
> including your patch.
>
>
:20, Chris Samuel wrote:
>
> On Monday, 1 April 2019 7:55:09 AM PDT Lech Nieroda wrote:
>
>> Further analysis of the query has shown that the mysql optimizer has chosen
>> the wrong execution plan. This may depend on the mysql version, ours was
>> 5.1.69.
>
> I sus
We’ve run into exactly the same problem, i.e. an extremely long upgrade process
for the 17.11.x major release. Luckily, we’ve found a solution.
The first approach was to tune various InnoDB options, like increasing the
buffer pool size (8G), the log file size (64M), or the lock wait timeout (900)
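The corresponding my.cnf fragment for that first tuning attempt (values as given above):

```
# my.cnf -- InnoDB tuning tried as a first approach
[mysqld]
innodb_buffer_pool_size  = 8G
innodb_log_file_size     = 64M
innodb_lock_wait_timeout = 900
```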
d the cgroup.conf looks like this:
CgroupAutomount=yes
CgroupMountpoint=/cgroup
CgroupReleaseAgentDir="/etc/slurm/cgroup"
ConstrainCores=yes
ConstrainDevices=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
ConstrainKmemSpace=yes
AllowedSwapSpace=0
Kind regards,
Lech
--
Lech Nieroda
After some more digging this turns out to be the same issue as in Bug 4153 and
was fixed on September 27th 2017.
If you’ve upgraded to 17.02/17.11 prior to this date, be sure to check your
reqmem data.
> On 26.01.2018 at 11:59, Lech Nieroda wrote:
>
> Dear slurm users,
>
> w
he values 'manually', i.e. made a query that selected all entries
with 2^31 <= mem_req < 2^63, made a backup, cleared the 2^31 bit, set the 2^63
bit, stored and checked the values.
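The bit manipulation described above can be sketched in plain Python (an illustration of the transform, not the actual SQL we ran):

```python
def fix_reqmem(mem_req: int) -> int:
    """Clear the stray 2^31 flag bit and set the 2^63 bit instead,
    for entries in the affected 2^31 <= mem_req < 2^63 range."""
    assert (1 << 31) <= mem_req < (1 << 63)
    return (mem_req & ~(1 << 31)) | (1 << 63)

# e.g. an entry carrying the stray flag plus 4096 "real" units:
old = (1 << 31) | 4096
print(fix_reqmem(old) == (1 << 63) | 4096)  # True
```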
Regards,
Lech
--
Dipl.-Wirt.-Inf. Lech Nieroda
Regionales Rechenzentrum der Universität zu Köln (RRZK)