This page is dedicated to information collected by users of the Neumann HPC-Cluster.
For general information and news provided by Dr. Schulenburg, visit the Neumann-Cluster's homepage.
program | explanation |
---|---|
sbatch | Submits a job script, e.g. sbatch -p big myjob.sh runs myjob.sh in the big queue |
squeue | Shows the current queue |
scancel | Cancels waiting and running jobs, e.g. scancel 32611 cancels the job with ID 32611 |
srun | Runs interactive and parallel jobs, e.g. “srun -p short -N 1 --pty /bin/bash” starts an interactive job |
sinfo | Shows the current status of the queues |
module | Loads pre-configured settings for certain programs, such as compilers and MPI |
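As a rough illustration of how the commands in the table fit together, the sketch below writes a minimal job script and submits it only if SLURM is present. The job name, time limit, and the echo payload are hypothetical placeholders; only the big partition and the file name myjob.sh come from the table.

```shell
# Write a minimal job script (all values are illustrative assumptions).
cat > myjob.sh <<'EOF'
#!/bin/bash
# Hypothetical job name.
#SBATCH --job-name=demo
# Partition (queue) named in the table.
#SBATCH -p big
# Number of nodes.
#SBATCH -N 1
# Hypothetical wall-time limit.
#SBATCH --time=00:10:00

# Replace this line with the actual program call.
echo "Running on $(hostname)"
EOF

# Submit only where SLURM is available; otherwise just report.
if command -v sbatch >/dev/null 2>&1; then
    sbatch myjob.sh        # prints the job ID on success
    squeue -u "$USER"      # lists your own jobs in the queue
else
    echo "sbatch not found; myjob.sh was written but not submitted"
fi
```

A job submitted this way can later be cancelled with scancel followed by the job ID that sbatch printed.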
For advanced usage, have a look at SLURM - Tips & Tricks
- Use the --nice option if you submit many individual jobs at once (man sbatch).
- […] (-tc).
- […] (#SBATCH --mem 120000).
- […] short queue is available.
- The scratch folder should be the working directory and contain most of the data. A lot of disk space and read/write capacity is available there. The home folder, however, is reserved for quick-access configuration files.
- […] home directory and should be submitted from there.
- […] (scp, cp, mv, …)

Sbatch Message Example:
$ sbatch script_name.sh
sbatch: error: Batch script contains DOS line breaks (\r\n)
sbatch: error: instead of expected UNIX line breaks (\n)
$ dos2unix script_name.sh
Alternatively, use
$ sed -i 's/^M//' script_name.sh
The character ^M is a single special character. To type it, press and hold CTRL, then press and release v. Still holding CTRL, press m.
[…] sbatch might not be able to handle arbitrary formats.
More information on this problem here.
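If dos2unix is not installed and typing ^M is awkward, the carriage returns can also be removed with tr. This is a swapped-in alternative to the two commands above, not a method taken from this page. The sketch builds a small demo file with DOS line breaks in a temporary directory, so the file names are placeholders.

```shell
# Work in a temporary directory so no real script is touched.
tmpdir=$(mktemp -d)
demo="$tmpdir/demo_crlf.sh"

# Create a two-line demo script with DOS (CRLF) line endings.
printf '#!/bin/bash\r\necho hello\r\n' > "$demo"

# A literal carriage return, used as the grep pattern.
CR=$(printf '\r')

# Count the lines that contain a carriage return.
cr_before=$(grep -c "$CR" "$demo")

# Delete every carriage return, then replace the original file.
tr -d '\r' < "$demo" > "$demo.unix" && mv "$demo.unix" "$demo"

# grep -c exits non-zero when the count is 0, hence the "|| true".
cr_after=$(grep -c "$CR" "$demo" || true)

echo "lines with CR before: $cr_before, after: $cr_after"
```

For a real job script, the same tr line can be applied to script_name.sh before calling sbatch again.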
Error Message Example:
c013.vc-a.217Received 643681 out-of-context eager message(s) from stray process PID=8216 running on host 172.16.0.14 (LID 0x61, ptype=0x1, subop=0x22, elapsed=35800.987s) (err=49)
[…] c014 in one line
More information on this problem here.
Error Message example:
c014.vc-a.170can't open /dev/ipath, network down (err=26)
starccm+: Rank 0:170: MPI_Init: psm_ep_open() failed
starccm+: Rank 0:170: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:170: MPI_Init: Internal Error: Cannot initialize RDMA protocol
[…] c014 in one line
More information on this problem here.
mpirun: Warning one or more remote shell commands exited with non-zero status, which may indicate a remote access problem.
error: Design STAR-CCM+ simulation completed
Server process ended unexpectedly (return code 255)
mpirun: Warning one or more remote shell commands exited with non-zero status, which may indicate a remote access problem.
mpirun: Warning one or more remote shell commands exited with non-zero status, which may indicate a remote access problem.
- […] (ssh node001) but asks for your password.
- […] ssh-keygen
- […] authorized_keys: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
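The key setup hinted at above could be scripted roughly as follows. This is a sketch under assumptions: it presumes the home directory (and therefore ~/.ssh) is shared between the login node and the compute nodes, and it creates an RSA key with an empty passphrase. The function name and its optional directory argument are hypothetical and exist only so the steps can be tried on a scratch directory first.

```shell
# Sketch: enable key-based SSH login for the current user.
# $1 is an optional SSH directory (defaults to ~/.ssh).
setup_passwordless_ssh() {
    ssh_dir="${1:-$HOME/.ssh}"
    key="$ssh_dir/id_rsa"

    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"

    # Create a key pair with an empty passphrase unless one already exists.
    [ -f "$key" ] || ssh-keygen -q -t rsa -N "" -f "$key"

    # Authorize the public key once (skip if it is already listed).
    touch "$ssh_dir/authorized_keys"
    grep -qxF "$(cat "$key.pub")" "$ssh_dir/authorized_keys" ||
        cat "$key.pub" >> "$ssh_dir/authorized_keys"
    chmod 600 "$ssh_dir/authorized_keys"
}

# Typical use on the cluster (node001 is the node name used on this page):
#   setup_passwordless_ssh
#   ssh -o BatchMode=yes node001 true && echo "passwordless login works"
```

BatchMode=yes makes ssh fail instead of prompting, so the final check only succeeds if no password is required.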
Error Message Example:
/var/spool/slurmd/job11105/slurm_script: line 106: starccm+: command not found
Loaded Modules: starCCM/13.02.013
pwd=/scratch/tmp/
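One way to catch a "command not found" failure early is to check, right after the module is loaded, that the executable is actually on the PATH. The helper below is a hypothetical sketch, not part of the cluster setup; the module name in the comment is the one shown in the log output.

```shell
# Return non-zero with a message if the given command is not on PATH.
require_command() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "error: '$1' not found; check the 'module load' line" >&2
        return 1
    fi
}

# In a job script this would follow the module load, for example:
#   module load starCCM/13.02.013
#   require_command starccm+ || exit 1

# Self-contained demonstration with a command that always exists.
require_command sh && echo "sh is available"
```

With this check the job stops at the first line that matters instead of failing later inside the script.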
- […] (~)
- […] (starccm+)
- […] starccm+. If these fail, or you have a typo in the module load command, then the environment is not properly set up with the information about where the program is located.

Error Message Example:
Checking whether servers are clean (of user processes)
ERROR === On server node001 are user processes running: " 1 nscd 1 ntp"
Finally Running on 0 processes on 0 servers.
Starting local server: /cluster/apps/starCCM/starccm+_12.02.011/STAR-CCM+12.02.011-R8/star/bin/starccm+ -licpath 1999@flex.cd-adapco.com -power -podkey XXXXXXXXXXXXXXXXXXXXXX -collab -np 0 -machinefile /scratch/tmp/seengel/machinefile.11102.txt -server -rsh /usr/bin/ssh /scratch/tmp/seengel/star.sim
Error: Undefined slave list
error: Design STAR-CCM+ simulation completed
Server process ended unexpectedly (return code 1)
sed '/root/d;/munge/d;/dbus/d;/ldap/d;'
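The sed expression above appears to delete the lines of known system accounts from a per-user process listing; that reading is an assumption, since the surrounding script is not shown here. The sketch below runs the filter on made-up sample lines modelled on the error message, and then tries a hypothetical extension that also drops nscd and ntp.

```shell
# Made-up sample of "process count, owner" lines, modelled on the error message.
sample=$(printf '%s\n' '  12 root' '   1 munge' '   1 dbus' '   1 nscd' '   1 ntp')

# Filter quoted on this page: nscd and ntp are not removed.
left_original=$(printf '%s\n' "$sample" | sed '/root/d;/munge/d;/dbus/d;/ldap/d;')

# Hypothetical extension that also drops nscd and ntp.
left_extended=$(printf '%s\n' "$sample" | sed '/root/d;/munge/d;/dbus/d;/ldap/d;/nscd/d;/ntp/d;')

echo "original filter leaves: $left_original"
echo "extended filter leaves: $left_extended"
```

With the original expression the nscd and ntp lines survive, which matches the processes named in the error message; with the extended expression nothing is left. Whether extending the filter is the right fix depends on the script that performs the check.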