Slurm
Marlowe uses SLURM, a job scheduling system, to run jobs.
Accounts
Each allocation is given a project ID. This project ID corresponds to a job account on Marlowe.
One of the requirements (for accounting purposes) is for each job to be credited to a job account. If you don’t add a valid account, you will see the following error message when submitting jobs:
srun: error: ACCOUNT ERROR: Did you remember to set your account?
srun: error: Please check the Marlowe SLURM docs for info on how to set a your project account properly
How do I add my project account to SBATCH/SRUN/SALLOC?
It’s simple! There are two ways you can do it: using -A or --account=. Both accomplish the same thing and will allow you to run jobs!
All accounts start with marlowe- and are followed by their project ID. So if your project ID was m223813, your account would be marlowe-m223813.
Here are some examples:
SRUN:
SALLOC:
SBATCH:
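As a minimal sketch of the three cases above, assuming the example project ID m223813 and a hypothetical batch script named job.sh, the commands might look like the ones printed below. The helper name marlowe_account is illustrative only, and real submissions will typically need further site-specific options (partition, GPU count, time limit) that are not shown here.

```shell
#!/usr/bin/env bash
# Build the Slurm account name from a project ID (marlowe-<project ID>).
# marlowe_account is a hypothetical helper, not a Marlowe-provided tool.
marlowe_account() {
    printf 'marlowe-%s\n' "$1"
}

# m223813 is the example project ID used in the text above.
ACCOUNT="$(marlowe_account m223813)"

# Print one command per submission tool; each passes the account with -A.
# job.sh is a hypothetical batch script name.
echo "srun -A ${ACCOUNT} --pty bash"
echo "salloc -A ${ACCOUNT}"
echo "sbatch -A ${ACCOUNT} job.sh"
```

Writing --account=marlowe-m223813 in place of -A marlowe-m223813 is equivalent in all three commands.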
Notice the -A in each of the examples. Without it, you will not be able to submit jobs.
Why can’t I SSH directly into the compute node I have reserved?
Due to the underlying system architecture of the superpod, you cannot SSH into a compute node directly from a new terminal instance.
You do have an option to reconnect to a running job with the following steps:
Step 1: Allocate your resources with salloc as mentioned above.
Step 2: Run srun --jobid=<jobid> --pty bash in another terminal. It will connect to your allocated resource, and you will then be able to work in two terminal sessions.
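The two steps can be sketched as follows. This is an illustration under assumptions: reconnect_cmd is a hypothetical helper that only prints the Step 2 command, and 12345 is a placeholder job ID. Inside the shell started by salloc, Slurm normally exports the job ID as SLURM_JOB_ID; from any other terminal, squeue -u "$USER" lists your jobs and their IDs.

```shell
#!/usr/bin/env bash
# Print the command that opens a second shell inside a running job.
# reconnect_cmd is a hypothetical helper used only for illustration.
reconnect_cmd() {
    local jobid="$1"
    printf 'srun --jobid=%s --pty bash\n' "$jobid"
}

# In the salloc terminal the job ID is available as SLURM_JOB_ID.
# 12345 is a placeholder used when that variable is not set.
JOBID="${SLURM_JOB_ID:-12345}"

# Copy the printed command into the second terminal (Step 2).
reconnect_cmd "$JOBID"
```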
In addition to the above commands, you can also use tmux and screen on the compute nodes.
NOTE: You can have at most two terminal windows connected to a job at one time: one through salloc and one through srun. It’s currently recommended to allocate resources via salloc if you want to use a shell. You cannot connect to an already running job with salloc.
I use srun in an sbatch script, how can I connect to my job?
There are two options: connecting via sattach or replacing srun with mpirun.
The recommended option is to replace srun with mpirun. For the most part, they are completely interchangeable. After replacing srun with mpirun, you can follow the previous instructions starting from Step 2.
Note: You may need to run module load openmpi4/gcc/4.1.5, or add it to your sbatch script, for mpirun to work.
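A minimal sketch of an sbatch script that uses mpirun in place of srun is shown below. The account marlowe-m223813 is the example from earlier, while the job name and ./my_program are hypothetical placeholders; resource options such as partition, node count, and time limit are site-specific and left out. The snippet writes the script to job.sh so it can be inspected before submission.

```shell
#!/usr/bin/env bash
# Write a hypothetical batch script to job.sh.
# The quoted 'EOF' keeps the script text literal (no variable expansion).
cat > job.sh <<'EOF'
#!/bin/bash
#SBATCH -A marlowe-m223813
#SBATCH --job-name=mpirun-example

# Load Open MPI so that mpirun is available inside the job.
module load openmpi4/gcc/4.1.5

# Launch the program with mpirun instead of srun.
# ./my_program is a placeholder for your own executable.
mpirun ./my_program
EOF

echo "Submit with: sbatch job.sh"
```

Once the job is running, a second shell can be opened with srun --jobid=<jobid> --pty bash, as described in Step 2 above.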
The second option is to use sattach. To use sattach, you will need to already have started a shell (using srun --pty bash) in your job. sattach replaces that shell instance entirely.
To connect via sattach, run sattach <jobid>.0 to attach to the pre-existing shell.
As sattach requires a shell to already exist, it is recommended to move srun outside of your sbatch script and use mpirun instead.