Imagine a case where you want to run many similar StarCCM+ simulations.
Normally, you would have to create an equal number of job scripts, such as the single-simulation template already provided. You would then end up using tools such as sed or another editor to replace the changing names or directories. As you can guess, this becomes tedious for a larger number of simulations.
Another option would be the use of a specialized macro, which is able to run multiple sim files sequentially within one job. You can find an explanation in the Steve Portal.
SLURM also offers the possibility to run a sequence of similar jobs, a so-called Job Array.
A job array is requested with the additional argument --array followed by a list of numbers. For example, you could run a sequence of 5 jobs by adding --array=1-5.
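The argument can also be passed on the sbatch command line instead of inside the script; assuming your job script is called job.sh (a placeholder name):

sbatch --array=1-5 job.sh    # submits job.sh as five tasks, numbered 1 through 5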
This will make new environment variables available within a SLURM job.
The most useful of these is the variable SLURM_ARRAY_TASK_ID. It distinguishes the individual runs within a job array.
For example, SLURM_ARRAY_TASK_ID=1 is set for the first job in the array, SLURM_ARRAY_TASK_ID=2 for the second job in the array, and so forth.
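A minimal job script to see this in action (the job name is just an example):

#!/bin/bash
#SBATCH -J arrayTest
#SBATCH --array=1-3
echo "This is array task ${SLURM_ARRAY_TASK_ID}"    # prints 1, 2, or 3, depending on the task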
You have to use this variable to switch between your simulations. For example, you could have star1.sim, star2.sim, star3.sim, and so on. In such a case it is rather easy to use SLURM_ARRAY_TASK_ID.
You would need to set the variable SIMULATIONFILE to SIMULATIONFILE="star${SLURM_ARRAY_TASK_ID}.sim". The SLURM variable is then replaced with the respective number in each task.
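A quick sketch of that expansion inside a job script:

SIMULATIONFILE="star${SLURM_ARRAY_TASK_ID}.sim"
echo $SIMULATIONFILE    # prints star1.sim in task 1, star2.sim in task 2, ...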
But what if your simulations sit in different directories? You would need to incorporate the task ID into the path, too. This could be inconvenient when you want to keep directory names that carry information on the simulation setup, such as coupled_SST_1mioMesh_steady/star.sim.
For such a case, you can fill a text file with the simulation paths and use its line numbers as the sequence for the job array. In the following script, SLURM_ARRAY_TASK_ID is used to read the n-th line of an ASCII file called simdirs.csv. For example, ID=3 would read the third line of simdirs.csv.
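The extraction itself is plain sed; for example:

sed -n "3p" simdirs.csv    # -n suppresses the default output, "3p" prints only line 3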
The argument --array can take a number of different formats. For example:
--array=1-5      ## This will create a sequence of 1,2,3,4,5
--array=1,4,15   ## This will create a list of 1,4,15, especially good if you need to rerun a few cases
--array=1-123:3  ## This will create a sequence with a step size of three, such as: 1,4,7,...
This is an example for simdirs.csv. It contains the relative paths to the sim files:
steady/mesh1/starSteady.sim
steady/mesh2/starSteady.sim
transient/mesh1/starTrans1.sim
transient/mesh2/starTrans2.sim
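One possible way to generate such a file, assuming all sim files live below the root directory:

cd /scratch/tmp/myusername                                   # the ROOTDIR used in the script below
find . -name "*.sim" | sed 's|^\./||' | sort > simdirs.csv   # relative paths, one per line

Remember to match the --array range in the job script to the number of lines in simdirs.csv.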
The following script is mainly based on the single simulation template to be found here. It is extended by the argument --array.
Three lines of the script deserve a closer look:

SIMULATIONFILE=$ROOTDIR/$(sed -n "${SLURM_ARRAY_TASK_ID}p" $ROOTDIR/simdirs.csv)

Here sed prints only the line of simdirs.csv whose line number equals the current SLURM_ARRAY_TASK_ID; prepending $ROOTDIR/ turns that relative path into the absolute path of the sim file.

ROOTDIR is the directory from which all paths in simdirs.csv start. It must also contain simdirs.csv itself.

WORKDIR=${SIMULATIONFILE%/*}

This bash parameter expansion removes the shortest suffix matching /* from SIMULATIONFILE, i.e. it strips the file name and keeps the directory containing the sim file.
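A quick sketch of that expansion, with an example path:

SIMULATIONFILE=/scratch/tmp/myusername/steady/mesh1/starSteady.sim
echo ${SIMULATIONFILE%/*}    # prints /scratch/tmp/myusername/steady/mesh1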
Windows users: please make sure to convert the script with dos2unix on the Linux machine, and read the article on Linebreaks.
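For example, assuming the script below was saved as jobarray.sh (a placeholder name):

dos2unix jobarray.sh    # converts Windows (CRLF) line endings to Unix (LF) in place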
#!/bin/bash
## Version 01/2020
## by Sebastian Engel
##
## Runs an array of jobs = one submission for x jobs
## Tailored to run StarCCM+ simulations
## expects a simdirs.csv containing relative paths starting from ROOTDIR to locate the sim files,
## one line per task.
## The sequence of tasks (which lines in simdirs.csv shall be run) has to be created manually.
##
#################### Job Settings #################################################################
#SBATCH -J myJobArray            # Setting the display name for the submission
#SBATCH -N 1                     # Number of nodes to reserve, -N 2-5 for variable number of requested node count
#SBATCH --ntasks-per-node 16     # typically 16, range: 1..16 (max 16 cores per node)
#SBATCH -t 30:00                 # set walltime in hours, format: hhh:mm:ss, days-hh, days-hhh:mm:ss
#SBATCH -p short                 # Desired Partition
#SBATCH --mem 100G               # Requested Memory. Neumann gives priority bonus under 120G
#SBATCH --signal=B:USR1@180      # Sends a signal 180 seconds before the end of the job to this script,
                                 # to write a stop file for StarCCM
#SBATCH --array=1-4              # Sequence of task IDs

#################### Simulation Settings ##########################################################

## Root directory for the job array to be run. No "/" at the end.
## It should be the directory where simdirs.csv is located.
ROOTDIR=/scratch/tmp/myusername

## Simulation file, selected by TASK ID given by SLURM.
SIMULATIONFILE=$ROOTDIR/$(sed -n "${SLURM_ARRAY_TASK_ID}p" $ROOTDIR/simdirs.csv)

## Work directory. Filtered from the sim file path
WORKDIR=${SIMULATIONFILE%/*}

## Macro file. Must be located in WORKDIR. Leave empty if no macro is used.
MACROFILE="macro.java"

## Personal POD key
PERSONAL_PODKEY="XXXXXXX"

## Decide which version by commenting out the desired version.
#module load starCCM/11.06.011
#module load starCCM/12.02.011
#module load starCCM/13.02.013
module load starCCM/14.04.013

## Application. Can be kept constant if modules are used.
APPLICATION="starccm+"

## Select which options you need. Leave only the required options uncommented.
##
## you are using a macro and a sim file
#USROPT="$SIMULATIONFILE -batch $WORKDIR/$MACROFILE"
## you are using a macro and are creating a new sim file
#USROPT="-new -batch $WORKDIR/$MACROFILE"
## you want to just run the simulation
USROPT="$SIMULATIONFILE -batch run"

#################### Printing some Debug Information ##############################################

## Debug information
/cluster/apps/utils/bin/slurmProlog.sh

#################### Signal Trap ##################################################################

## Catches signal from slurm to write an ABORT file in the WORKDIR.
## This ABORT file will satisfy the stop file criterion in StarCCM.
## Change ABORTFILENAME if you changed the stop file criterion.
ABORTFILENAME="ABORT"

## Location where Starccm is looking for the abort file
ABORTFILELOCATION=$WORKDIR/$ABORTFILENAME

# remove old abort file
rm -rf $ABORTFILELOCATION

# Signal handler
write_abort_file() {
    echo "$(date +%Y-%m-%d_%H:%M:%S) The End-of-Job signal has been trapped."
    echo "Writing abort file..."
    touch $ABORTFILELOCATION
}

# Trapping signal handler
echo "Trapping handler for End-of-Job signal"
trap 'write_abort_file' USR1

#################### Preparing the Simulation #####################################################

## creating machinefile
MACHINEFILE="machinefile.$SLURM_JOBID.txt"
scontrol show hostnames $SLURM_JOB_NODELIST > $WORKDIR/$MACHINEFILE

## Default options plus user options
OPTIONS="$USROPT -mpi openmpi -licpath 1999@flex.cd-adapco.com -power -podkey $PERSONAL_PODKEY -collab -time -rsh /usr/bin/ssh"

## Let StarCCM+ wait for licenses on startup
export STARWAIT=1

#################### Running the simulation #######################################################

## Run application (StarCCM+) in background to allow signal trapping
echo "$(date +%Y-%m-%d_%H:%M:%S) Now, running the simulation ...."

## Command to run application (StarCCM+)
$APPLICATION $OPTIONS -np $SLURM_NPROCS -machinefile $WORKDIR/$MACHINEFILE > $SIMULATIONFILE.$SLURM_JOBID.output.log 2>&1 &

wait

## Final time stamp
echo "Simulation finalized at: $(date +%Y-%m-%d_%H:%M:%S_%s_%Z)"

## Waiting briefly, to give starccm server processes time to quit gracefully.
sleep 120

## Clean-Up
/cluster/apps/utils/bin/slurmEpilog.sh
echo "done."
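Assuming again the script was saved as jobarray.sh, a single submission then creates one task per entry of --array, and the tasks can be monitored as usual:

sbatch jobarray.sh    # one submission spawns tasks 1-4 (see #SBATCH --array=1-4)
squeue -u $USER       # array tasks appear as JOBID_1, JOBID_2, ...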