Nebula scheduling is based on first-in-first-out (with backfill so that shorter jobs that fit can run as long as they do not delay the start of jobs with higher priority), but with a few additions:
- Jobs that exceed the per-user node limit are held in the queue with the reason AssocGrpNodeLimit until enough of that user's jobs have finished.
- Jobs that exceed the node limit for long jobs are held with the reason QOSGrpNodeLimit until enough long jobs have finished.

There are 12 fat nodes with more memory (384 GiB). To use them, add `-C fat` to `sbatch`, `interactive`, etc.
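As an illustrative sketch (the time limit and script name are placeholders, not part of the documented syntax), requesting a whole fat node could look like this:

```bash
# Request one full fat node (384 GiB) exclusively; myjob.sh is a placeholder script.
sbatch -N 1 --exclusive -C fat -t 02:00:00 myjob.sh
```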
Node sharing is available on Nebula. The idea behind node sharing is that you do not have to allocate a full compute node in order to run a small job using, say, 1 or 2 cores. Thus, if you request a job like `sbatch -n 1 ...`, the job may share the node with other jobs using less than a full node. Jobs using a full node or more will not experience this (e.g. we will not pack two 48-core jobs onto 3 nodes). You can turn off node sharing for otherwise eligible jobs using the `--exclusive` flag.
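A minimal sketch of the two cases (time limit and script name are placeholders):

```bash
# One core; the job may be placed on a node together with other small jobs.
sbatch -n 1 -t 01:00:00 small_job.sh

# Same request, but with node sharing turned off: the node is allocated exclusively.
sbatch -n 1 --exclusive -t 01:00:00 small_job.sh
```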
Warning: if you do not pass `-n`, `-N` or `--exclusive` to commands like `sbatch` and `interactive`, you will get a single core, not a full node.
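If you do want a full node, say so explicitly. For example (the time limit and script name are illustrative only):

```bash
# Explicitly request one full node for exclusive use.
sbatch -N 1 --exclusive -t 04:00:00 fullnode_job.sh
```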
When you allocate less than a full node, you get a proportional share of the node's memory. On a thin node with 96 GiB, that means you get 1.5 GiB per allocated hyperthread, which is the same as 3 GiB per allocated core.

If you need more memory, you need to declare that using an option like `--mem-per-cpu=MEM`, where MEM is the memory in MiB per hyperthread (even if you do not allocate your tasks on the hyperthread level). Example: to run a process that needs approximately 32 GiB on one core, you can use `-n1 --mem-per-cpu=16000`. As you have not turned on hyperthreading, you allocate a whole core, but the memory is still specified per hyperthread (two hyperthreads per core, so 2 × 16000 MiB ≈ 32 GiB).

As a comparison, `-n2 --ntasks-per-core=2 --mem-per-cpu=16000` allocates two hyperthreads (on one core). Together, they will also have approximately 32 GiB of memory to share.
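A hedged sketch of how the first request might look inside a job script (the time limit and program name are placeholders):

```bash
#!/bin/bash
# One core with ~32 GiB of memory: 16000 MiB per hyperthread, two hyperthreads per core.
#SBATCH -n 1
#SBATCH --mem-per-cpu=16000
#SBATCH -t 00:30:00

# my_program is a placeholder for your own executable.
./my_program
```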
Note: you cannot request a fat node on Nebula by passing a `--mem` or `--mem-per-cpu` option too large for thin nodes. You need to use the `-C fat` option discussed above.
Each compute node has a local hard disk with approximately 210 GiB (on thin nodes; 870 GiB on fat nodes) available for user files. The environment variable `$SNIC_TMP` in the job script environment points to a writable directory on the local disk that you can use. A difference on Nebula compared to older systems is that each job has private copies of the following directories used for temporary storage:
- `/scratch/local` (`$SNIC_TMP`)
- `/tmp`
- `/var/tmp`
This means that one job cannot read files written by another job running on the same node. This applies even if the two jobs are your own!

Please note that anything stored on the local disk is deleted when your job ends. If any temporary or output files stored there need to be preserved, copy them to project storage at the end of your job script.
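As an illustrative sketch (the project path, file names and program name are placeholders, not an official template), a job script that works on the node-local disk and copies results back before it finishes could look like this:

```bash
#!/bin/bash
#SBATCH -n 1
#SBATCH -t 01:00:00

# Copy input data to the node-local disk (paths are placeholders).
cp /proj/myproject/indata.dat $SNIC_TMP/

# Run in the local scratch directory; everything here is deleted when the job ends.
cd $SNIC_TMP
./my_program indata.dat > result.out

# Copy the results back to project storage before the job finishes.
cp $SNIC_TMP/result.out /proj/myproject/
```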