I run into this problem occasionally. In my organization, most accounts are created with tcsh as the default shell, and then users copy my bash submission script example from my online documentation, or copy someone else's submission script written in bash. And then when the job runs, it fails with an error about the module command not being found.

The problem you are describing occurs because the module command is defined differently in bash and tcsh: in bash it's a shell function, but in tcsh it's an alias. Slurm jobs inherit the environment of the shell that submitted the script, but when the submitting shell is tcsh and the script is written in bash, or vice versa, the definition of the 'module' command doesn't survive.
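A quick way to see this for yourself from inside a bash script is to ask bash how it resolves the name. This is only a rough sketch; describe_cmd is a helper name I made up for illustration, and it relies on nothing but the bash builtin 'type -t':

```shell
#!/bin/bash
# describe_cmd is a hypothetical helper, not part of Slurm or Environment Modules.
# It prints how bash resolves a name: function, alias, builtin, file, keyword,
# or "undefined" when bash does not know the name at all.
describe_cmd() {
    local kind
    # 'type -t' prints nothing and returns non-zero for an unknown name.
    kind=$(type -t "$1") || kind="undefined"
    echo "$1: ${kind}"
}

# In a working bash environment this should report "module: function".
# In a job that lost the definition it reports "module: undefined".
describe_cmd module
```

Putting that one call near the top of a failing job script shows right away whether the function made it into the job's shell.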

On RHEL-based systems, if you're using the environment-modules RPM, the module command itself is defined in the files /etc/profile.d/modules.{sh,csh}.
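Sourcing that file explicitly from a bash job script is one way to get the definition back. Here is a minimal sketch, assuming the RPM's default path; source_if_readable is a hypothetical helper name of mine, and a site with a custom install would substitute its own init file:

```shell
#!/bin/bash
# source_if_readable is a hypothetical helper: it sources a file only when the
# file exists and is readable, and returns non-zero otherwise.
source_if_readable() {
    if [ -r "$1" ]; then
        . "$1"
    else
        echo "skipping $1 (not readable)" >&2
        return 1
    fi
}

# Assumed location from the environment-modules RPM; adjust for your site.
source_if_readable /etc/profile.d/modules.sh || true
```

The '|| true' just keeps the script going on systems where the file is absent; a real job script might prefer to exit at that point.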

One easy fix: if someone uses tcsh as their login shell but submits a bash script, they can make the script's interpreter a login shell, which will process /etc/profile.d/*.sh, by adding -l to the interpreter line of the script:

#!/bin/bash -l
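As a rough sketch, a complete submission script using that interpreter line might look like the following. The job name and task count are placeholders I picked for illustration, and the check around 'module list' is only there so the script reports the problem instead of failing partway through:

```shell
#!/bin/bash -l
#SBATCH --job-name=module-test
#SBATCH --ntasks=1

# With -l, bash starts as a login shell and reads /etc/profile, which on
# RHEL-style systems sources the scripts in /etc/profile.d/*.sh.
if [ "$(type -t module)" = "function" ]; then
    module_status="available"
    # Show which modules are loaded in the job's shell.
    module list
else
    module_status="missing"
fi
echo "module command: ${module_status}"
```

If the output says "missing" even with -l, the profile scripts on the compute node are probably not defining the function, and an explicit source of the init file is the fallback.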

I imagine that this would also work for someone who uses bash as their login shell but writes their sbatch script in tcsh, but I've never come across that scenario.

Prentice

On 1/22/21 9:34 AM, Thomas M. Payerle wrote:
On our clusters, we typically find that an explicit source of the initialization dot files is needed IF the default shell of the user submitting the job does _not_ match the shell being used to run the script. I.e., for sundry historical and other reasons, the "default" login shell for users on our cluster is tcsh, so if a user with a login shell of tcsh submits a bash job script, they generally need to do an explicit "source ~/.profile".


On Fri, Jan 22, 2021 at 5:42 AM Gestió Servidors <sysadmin.c...@uab.cat> wrote:

    Hello,

    I use “Environment Modules” (http://modules.sourceforge.net/) in my
    SLURM cluster. In my scripts I need to add an explicit “source
    /soft/modules-3.2.10/Modules/3.2.10/init/bash”. However, in the
    several examples of SLURM scripts I have read, nobody mentions
    that. So, have I forgotten a SLURM parameter to “capture”
    environment variables into the script, or is it a problem with my
    distribution (CentOS-7)?

    Thanks.



--
Tom Payerle
DIT-ACIGS/Mid-Atlantic Crossroads           paye...@umd.edu
5825 University Research Park               (301) 405-6135
University of Maryland
College Park, MD 20740-3831

--
Prentice Bisbal
Lead Software Engineer
Research Computing
Princeton Plasma Physics Laboratory
http://www.pppl.gov
