Properties | NSC Triolith | LUNARC Aurora |
---|---|---|
Number of cores per compute node | 16 | 20 |
Filesystem | GPFS | Lustre |
Read cache | static, permanent | dynamic |
Table 1. Triolith vs Aurora. NSC Triolith has 16 cores per compute node while LUNARC Aurora has 20. NSC Triolith uses the GPFS filesystem and LUNARC Aurora the Lustre filesystem. GPFS has a static (and permanent) read cache, which on Triolith is kept small because it “removes” RAM from the compute node. Lustre, in contrast, has a dynamic read cache: if the file you read fits into the RAM of the compute node and that RAM is not needed for the data processing, then the second time you read the file you will effectively read it from RAM instead, which is much faster than reading from disk.
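A quick way to see the Lustre read-cache effect is to time the same read twice; a minimal sketch (the filename is from the benchmark dataset below, and the timing method is our addition, not part of the benchmark):

```
# First read comes from disk; if the file fits in free RAM, a second
# read on a Lustre node is served from the page cache and is much faster
time dd if=insu6_1_data_000001.h5 of=/dev/null bs=1M
time dd if=insu6_1_data_000001.h5 of=/dev/null bs=1M
```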
The different core counts and filesystems at Aurora and Triolith have a few interesting consequences when running XDS and its derivatives XDSGUI, XDSAPP, autoPROC and XDSme.
For all XDS benchmark runs listed in Tables 2-5 below we use the 1.3Å EIGER_16M_Nov2015.tar.bz2 dataset from Dectris (900 frames), as before - see XDS benchmarking.
The XDS benchmark runs were done using the now-outdated version 1 of forkxds, which had stricter SLURM allocation requirements:

number of nodes × number of tasks per node = MAXIMUM_NUMBER_OF_JOBS

number of nodes × number of tasks per node × number of cpus per task = MAXIMUM_NUMBER_OF_JOBS × MAXIMUM_NUMBER_OF_PROCESSORS
The current version of forkxds is more forgiving with respect to the SLURM allocation requirement:

MAXIMUM_NUMBER_OF_JOBS × MAXIMUM_NUMBER_OF_PROCESSORS = number of nodes × cores per node = total number of cores
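Inside a running batch job this relation can be sanity-checked from SLURM's environment variables. A minimal sketch (the check itself is our addition; the variables are standard SLURM and are only set when the corresponding sbatch options were given):

```
#!/bin/sh
# Derive the XDS.INP values implied by the current SLURM allocation
# (assumes --ntasks-per-node and --cpus-per-task were passed to sbatch)
JOBS=$(( SLURM_JOB_NUM_NODES * SLURM_NTASKS_PER_NODE ))  # MAXIMUM_NUMBER_OF_JOBS
PROCS=$SLURM_CPUS_PER_TASK                               # MAXIMUM_NUMBER_OF_PROCESSORS
echo "allocation supports MAXIMUM_NUMBER_OF_JOBS=$JOBS"
echo "allocation supports MAXIMUM_NUMBER_OF_PROCESSORS=$PROCS"
```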
The XDS benchmark below was done by first creating XDS.INP:

```
module load generate_XDS.INP
generate_XDS.INP insu6_1_master.h5
```

then adding 2-3 lines to XDS.INP, the first two being:

```
MAXIMUM_NUMBER_OF_JOBS=8
MAXIMUM_NUMBER_OF_PROCESSORS=10
```
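For scripted, non-interactive runs the same two lines can simply be appended; a minimal sketch:

```
# Append the job/processor split to a freshly generated XDS.INP
echo "MAXIMUM_NUMBER_OF_JOBS=8" >> XDS.INP
echo "MAXIMUM_NUMBER_OF_PROCESSORS=10" >> XDS.INP
```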
In the absence of the Dectris Neggia library, H5ToXds is used to read the Eiger HDF5 frames. To use the Neggia library instead of H5ToXds, add

```
LIB=/sw/pkg/presto/software/Neggia/1.0.1-goolf-PReSTO-1.7.20/lib/dectris-neggia.so
```

to XDS.INP at LUNARC Aurora, or

```
LIB=/proj/xray/presto/software/Neggia/1.0.1-goolf-PReSTO-1.7.20/lib/dectris-neggia.so
```

to XDS.INP at NSC Triolith.
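When the same processing script is used at both clusters, the matching LIB= line can be picked at run time; a minimal sketch (the hostname patterns are assumptions, adjust them to your login environment):

```
# Append the Neggia LIB= line matching the current cluster
# (hostname patterns are hypothetical; adjust as needed)
case "$(hostname -f)" in
  *lunarc*) LIBDIR=/sw/pkg/presto/software ;;    # LUNARC Aurora
  *nsc*)    LIBDIR=/proj/xray/presto/software ;; # NSC Triolith
esac
echo "LIB=$LIBDIR/Neggia/1.0.1-goolf-PReSTO-1.7.20/lib/dectris-neggia.so" >> XDS.INP
```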
Then edit xds.script to match the XDS.INP parameters above:

```
#!/bin/sh
#SBATCH -t 0:15:00
# 4 nodes x 2 tasks per node = 8 JOBS (MAXIMUM_NUMBER_OF_JOBS)
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=2
# 10 cpus per task = 10 PROCESSORS (MAXIMUM_NUMBER_OF_PROCESSORS)
#SBATCH --cpus-per-task=10
#SBATCH -A snic2017-1-199
#SBATCH --mail-type=ALL
#SBATCH --mail-user=martin.moche@ki.se
module load XDS
xds_par
```
Then run xds.script by:

```
sbatch xds.script
```
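The INIT, COLSPOT and INTEGRATE wall-clock times reported in Tables 2-5 can be pulled from the corresponding .LP files once the job finishes; a minimal sketch (we assume each step's .LP file reports an "elapsed wall-clock time" line, as recent XDS versions do; adjust the pattern to your XDS version):

```
# Collect per-step wall-clock times after the run
# (the grep pattern is an assumption about the .LP wording)
for step in INIT COLSPOT INTEGRATE; do
  printf '%-10s ' "$step"
  grep -i "wall-clock time" "$step.LP" | tail -1
done
```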
Aurora with Neggia | nodes | cores | JOBS | PROCESSORS | INIT (s) | COLSPOT (s) | INTEGRATE (s) | TOTAL (s) |
---|---|---|---|---|---|---|---|---|
--nodes=1 | 1 | 20 | 1 | 20 | 7.1 | 24.8 | 91.7 | 136 |
--nodes=1 | 1 | 20 | 2 | 10 | 8.0 | 22.7 | 65.2 | 109 |
--nodes=1 | 1 | 20 | 4 | 5 | 11.8 | 21.7 | 63.5 | 111 |
--nodes=2 | 2 | 40 | 2 | 20 | 6.8 | 16.2 | 46.9 | 83 |
--nodes=2 | 2 | 40 | 4 | 10 | 8.2 | 13.0 | 37.4 | 72 |
--nodes=2 | 2 | 40 | 8 | 5 | 12.1 | 12.1 | 33.8 | 72 |
--nodes=4 | 4 | 80 | 4 | 20 | 7.0 | 9.1 | 24.4 | 52 |
--nodes=4 | 4 | 80 | 8 | 10 | 8.3 | 7.7 | 20.9 | 50 |
--nodes=4 | 4 | 80 | 16 | 5 | 12.3 | 7.5 | 84.4 | 118 |
--nodes=8 | 8 | 160 | 8 | 20 | 7.0 | 7.0 | 14.5 | 40 |
--nodes=8 | 8 | 160 | 16 | 10 | 8.1 | 5.3 | 12.2 | 38 |
Table 2. LUNARC Aurora with the Neggia library gave the shortest TOTAL wall-clock times; compare with Tables 3-5.
Aurora with H5ToXds | nodes | cores | JOBS | PROCESSORS | INIT (s) | COLSPOT (s) | INTEGRATE (s) | TOTAL (s) |
---|---|---|---|---|---|---|---|---|
--nodes=1 | 1 | 20 | 1 | 20 | 8.2 | 33.0 | 105.7 | 159 |
--nodes=1 | 1 | 20 | 2 | 10 | 9.7 | 29.8 | 81.7 | 135 |
--nodes=1 | 1 | 20 | 4 | 5 | 15.0 | 28.4 | 78.1 | 136 |
--nodes=2 | 2 | 40 | 2 | 20 | 7.7 | 17.9 | 56.8 | 95 |
--nodes=2 | 2 | 40 | 4 | 10 | 9.6 | 17.6 | 44.1 | 85 |
--nodes=2 | 2 | 40 | 8 | 5 | 9.5 | 16.4 | 43.5 | 83 |
--nodes=4 | 4 | 80 | 4 | 20 | 8.0 | 11.0 | 27.3 | 59 |
--nodes=4 | 4 | 80 | 8 | 10 | 11.4 | 9.4 | 23.6 | 58 |
--nodes=4 | 4 | 80 | 16 | 5 | 15.1 | 9.3 | 25.4 | 64 |
Table 3. At LUNARC Aurora with H5ToXds the TOTAL wall-clock times are longer than with the Neggia library; compare with Table 2.
Triolith with Neggia | nodes | cores | JOBS | PROCESSORS | INIT (s) | COLSPOT (s) | INTEGRATE (s) | TOTAL (s) |
---|---|---|---|---|---|---|---|---|
--nodes=1 | 1 | 16 | 1 | 16 | 29.8 | 41.3 | 306.9 | 393 |
--nodes=1 | 1 | 16 | 2 | 8 | 36.3 | 44.2 | 229.5 | 325 |
--nodes=1 | 1 | 16 | 4 | 4 | 31.2 | 50.0 | 120 | 220 |
--nodes=2 | 2 | 32 | 2 | 16 | 23.4 | 29.5 | 253.2 | 321 |
--nodes=2 | 2 | 32 | 4 | 8 | 33.3 | 37.8 | 110.3 | 197 |
--nodes=2 | 2 | 32 | 8 | 4 | 22.8 | 26.9 | 58.1 | 125 |
--nodes=4 | 4 | 64 | 4 | 16 | 24.6 | 38.3 | 123.6 | 201 |
--nodes=4 | 4 | 64 | 8 | 8 | 21.2 | 23.2 | 59.2 | 119 |
Table 4. NSC Triolith with the Neggia library.
Triolith with H5ToXds | nodes | cores | JOBS | PROCESSORS | INIT (s) | COLSPOT (s) | INTEGRATE (s) | TOTAL (s) |
---|---|---|---|---|---|---|---|---|
--nodes=1 | 1 | 16 | 1 | 16 | 14.6 | 83.4 | 266.1 | 379 |
--nodes=1 | 1 | 16 | 2 | 8 | 17.7 | 58.5 | 189.8 | 282 |
--nodes=1 | 1 | 16 | 4 | 4 | 26.9 | 57.5 | 152.9 | 255 |
--nodes=2 | 2 | 32 | 2 | 16 | 14.4 | 39.4 | 136.5 | 205 |
--nodes=2 | 2 | 32 | 4 | 8 | 18.0 | 32.0 | 94.0 | 162 |
--nodes=2 | 2 | 32 | 8 | 4 | 27.0 | 30.6 | 76.6 | 152 |
--nodes=4 | 4 | 64 | 4 | 16 | 14.9 | 21.6 | 68.6 | 122 |
--nodes=4 | 4 | 64 | 8 | 8 | 17.8 | 19.5 | 47.3 | 102 |
Table 5. NSC Triolith with H5ToXds.
For all XDSAPP benchmarks below we again use the 1.3Å EIGER_16M_Nov2015.tar.bz2 dataset from Dectris (900 frames) and an sbatch script running xdsapp instead of the GUI. In the xdsapp sbatch script:

-j corresponds to MAXIMUM_NUMBER_OF_JOBS in XDS.INP

-c corresponds to MAXIMUM_NUMBER_OF_PROCESSORS in XDS.INP

j × c = number of nodes × cores per node = total number of cores

as explained in detail under XDS multi-node-scripts.
Example script running xdsapp with 4 JOBS and 10 PROCESSORS per job, allocating 40 cores (2 nodes) at LUNARC Aurora:
```
#!/bin/sh
#SBATCH -t 00:20:00
#SBATCH --nodes=2 --exclusive
#SBATCH -A snic2017-1-xxx
#SBATCH --mail-type=ALL
#SBATCH --mail-user=name.surname@lu.se
module load XDSAPP
# -j 4 jobs x -c 10 processors = 40 cores = 2 Aurora nodes
xdsapp --cmd \
  --dir /lunarc/nobackup/users/mochma/test_suite_NSC/bench_xdsapp/2-2-10 \
  -j 4 \
  -c 10 \
  -i /lunarc/nobackup/users/mochma/test_suite_NSC/eiger/empty/2015_11_10/insu6_1_data_000001.h5
```
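To reproduce the 2-node rows of Table 6, the same script can be submitted once per JOBS × PROCESSORS split; a minimal sketch assuming the script above is saved as xdsapp.script and modified to take -j and -c from its arguments (both the filename and the argument handling are our assumptions):

```
# Submit one xdsapp job per way of splitting 40 cores (2 Aurora nodes)
# into JOBS x PROCESSORS; combinations match the 40-core rows of Table 6
for combo in "2 20" "4 10" "8 5"; do
  set -- $combo                   # $1 = JOBS (-j), $2 = PROCESSORS (-c)
  sbatch xdsapp.script "$1" "$2"  # xdsapp.script must pass $1/$2 to -j/-c
done
```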
cores = JOBS × PROCESSORS | NSC Triolith runtime | LUNARC Aurora runtime |
---|---|---|
16 = 1 × 16 | 18 min 17 sec | not tested |
20 = 1 × 20 | not tested | 9 min 12 sec |
16 = 2 × 8 | 12 min 50 sec | not tested |
20 = 2 × 10 | not tested | 7 min 36 sec |
16 = 4 × 4 | 11 min 22 sec | not tested |
20 = 4 × 5 | not tested | 6 min 11 sec |
20 = 5 × 4 | not tested | 6 min 23 sec |
32 = 2 × 16 | 12 min 43 sec | not tested |
40 = 2 × 20 | not tested | 5 min 11 sec |
32 = 4 × 8 | 9 min 2 sec | not tested |
40 = 4 × 10 | not tested | 5 min 15 sec |
32 = 8 × 4 | 7 min 17 sec | not tested |
40 = 8 × 5 | not tested | 4 min 35 sec |
Table 6. The 1.3Å Eiger dataset from Dectris (EIGER_16M_Nov2015.tar.bz2) processed by XDSAPP via an sbatch script. For this dataset XDSAPP makes three XDS runs, and LUNARC Aurora is faster than NSC Triolith by roughly a factor of 2. The Dectris Neggia library was added in XDSAPP version 2.99 and is therefore always used when processing Eiger data with XDSAPP.