We installed MX software in an HPC environment using EasyBuild and called the installation PReSTO. Large HPC clusters, often called supercomputers, essentially consist of a login node and many compute nodes. At NSC Triolith, login and compute nodes have 16 cores each, while at LUNARC Aurora the login node has 40 cores and the compute nodes have 20 cores. A typical HPC workflow involves using the login node to edit an sbatch script and submitting it via a scheduling system to one or several available compute nodes. Both Triolith and Aurora use Slurm for scheduling compute jobs, and the queue is controlled by a fairshare algorithm. The algorithm assigns queue priority according to how much compute time has been spent by project A in comparison with project B, and so on.
When sharing the compute time of a common project, individual users have to adapt their workflow so as not to waste compute time and to get the most out of each session. Questions will arise, such as:
The correct answer to this set of questions depends on the problem at hand and the software in question, and it will change with software upgrades. However, asking for just the right number of cores and amount of compute time for the job at hand is the key to an efficient HPC workflow.
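To get a feeling for what a request costs, the reserved compute time can be estimated as nodes times cores per node times hours. The sketch below is only an illustration: the helper name core_hours is hypothetical, and it assumes that a job is charged for every reserved core whether or not the software keeps it busy.

```shell
#!/bin/sh
# core_hours NODES CORES_PER_NODE HOURS
# Hypothetical helper: prints the core hours a job reserves,
# assuming the project is charged for every reserved core.
core_hours() {
    echo $(( $1 * $2 * $3 ))
}

# One Triolith compute node (16 cores) for 1 hour
core_hours 1 16 1    # prints 16

# Four Aurora compute nodes (20 cores each) for 2 hours
core_hours 4 20 2    # prints 160
```

Under that assumption, a program that uses a single core on an exclusive Triolith node for one hour still reserves 16 core hours, which is why matching the request to the software matters.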
Simply join the academic PReSTO project. Once you are a member, the ThinLinc application from Cendio can be downloaded and installed on any Linux/Mac/Windows computer, giving convenient access to Triolith and Aurora. After a ThinLinc login to Triolith/Aurora you get a Linux desktop environment that gives the impression of being logged in to your own Linux workstation.
Many MX graphics programs are available for interactive work, such as coot for macromolecular model building, pymol/ccp4mg/chimera for making publication pictures and presentation movies, and adxv/albula for inspecting diffraction images.
Graphics software is preferably launched from the PReSTO menu, which uses accelerated GL mode by default.
In some cases you may want to launch coot/pymol from a certain directory and still use the accelerated GL mode. To do so from a terminal window, proceed as follows:
Open a terminal window, go to the directory of interest and run
module load CCP4
coot - at Aurora
vglrun coot - at Triolith
where vglrun in front of coot is required at Triolith for OpenGL acceleration of graphics, which happens automatically at Aurora.
To launch pymol in accelerated GL mode from a terminal window:
module load PyMOL
pymol - at Aurora
vglrun pymol - at Triolith
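The difference between the two clusters can be wrapped in a small shell function. This is a sketch only: gl_command and its cluster labels are hypothetical names, not part of PReSTO, and the function merely prints the command line instead of running it, so it can be tried without a graphics session.

```shell
#!/bin/sh
# gl_command CLUSTER PROGRAM [ARGS...]
# Hypothetical helper: prints the command line that starts a graphics
# program with OpenGL acceleration on the given cluster.
gl_command() {
    cluster=$1
    shift
    case "$cluster" in
        triolith) echo "vglrun $*" ;;   # vglrun is required at Triolith
        aurora)   echo "$*" ;;          # acceleration is automatic at Aurora
        *)        echo "unknown cluster: $cluster" >&2
                  return 1 ;;
    esac
}

gl_command triolith coot    # prints: vglrun coot
gl_command aurora pymol     # prints: pymol
```

The relevant module (CCP4 for coot, PyMOL for pymol) still has to be loaded before the printed command is run.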
Interactive software should be run on the login node, where no compute time is charged to the PReSTO project snic2018-3-251.
Some MX software GUIs are intended to prepare a parallel compute job by highlighting the relevant input parameters, and they present output as images instead of ASCII logfiles. There are three programs of this kind in PReSTO: XDSAPP, XDSGUI and hkl2map.
XDSAPP, XDSGUI and hkl2map should always be run on a compute node. When launched from the PReSTO menu, the selected program is started on a compute node through a custom-made dialogue box. Another advantage of using the PReSTO menu for XDSAPP, XDSGUI and hkl2map is that once the jobs are finished and the GUIs are closed, the remaining compute time estimated in the dialogue box is saved for later use.
To launch parallel MX software on a compute node without the PReSTO menu (not recommended):
interactive --nodes=1 --exclusive -t 1:00:00 -A snic2017-1-xxx
module load XDSAPP
xdsapp
The “interactive” command above reserves a compute node for 1 hour. Once finished, you must close the XDSAPP window and type “exit” in the compute node terminal window to save the remaining compute time.
The Phenix GUI can be run on the login node and submits parallel compute jobs to the HPC job queue via its built-in scheduler. For that reason the PReSTO menu launches the Phenix GUI on the login node, and whenever a parallel job is required the user can send it to the HPC queue. Examples of parallel Phenix routines, called wizards, are:
phaser, autobuild, rosetta_refine, mr_rosetta
The examples here are ccp4i and ccp4i2; however, we are working to enable Slurm scheduling through the "Run on server" option in ccp4i2. Many MX programs in CCP4 are not parallel, such as
refmac5, arpWarp, molrep, chainsaw etc.
To run parallel CCP4 software, for instance:
it is recommended to proceed as follows:
interactive --nodes=1 --exclusive -t 1:00:00 -A snic2017-1-199
module load CCP4
ccp4i2
and set the appropriate keywords in the GUI before pressing “Run”. We are developing a “Run on server” option for Slurm scheduling in ccp4i2.
Many MX programs intended for computing do not have a graphical user interface:
autoPROC, arcimboldo_lite, buster, crystFEL, pipedream (part of buster), rhofit (part of buster), sharp, XDS, XDSME, xia2 --dials, xia2 --3dii
MX software without a GUI is preferably run using an sbatch script for convenient submission to compute nodes. An sbatch script is the most efficient way of using an HPC system, since the allocation ends as soon as the job is finished.
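As a rough illustration of what such a script can look like, the sketch below prints a minimal sbatch script that reuses the resource options of the interactive examples (one exclusive node, a wall time and a project account). The function name write_xds_job is hypothetical, and both the module name XDS and the choice of xds_par as the program to run are assumptions; check "module avail" on your cluster and adjust to your own data processing.

```shell
#!/bin/sh
# write_xds_job PROJECT WALLTIME
# Hypothetical helper: prints a minimal sbatch script to standard output.
# The module name "XDS" is an assumption, not confirmed for PReSTO.
write_xds_job() {
    cat <<EOF
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --exclusive
#SBATCH -t $2
#SBATCH -A $1

module load XDS
xds_par
EOF
}

# Print the script for a one-hour job; replace the project id with your own.
write_xds_job snic2017-1-xxx 1:00:00
```

Redirecting the output to a file in the directory that holds XDS.INP, for example "write_xds_job snic2017-1-xxx 1:00:00 > xds.sh", and submitting it with "sbatch xds.sh" places the job in the queue; "squeue -u $USER" then shows its state.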
Members of the protein crystallography community are invited to share their best "tips and tricks"; together we will be able to make the most of the PReSTO installation.
Oskar Aurelius from Stockholm University sharing his first multi-node CrystFEL script.