Anomalous data processing using cbf files at NSC Tetralith

Many beamlines with Pilatus detectors write CBF files instead of HDF5 containers. The script below is the BioMAX script adapted for Tetralith (32 cores per node) and for CBF input instead of HDF5 containers. To run it, copy the script into a file named cbf.script, make it executable with chmod 755 cbf.script, create the output directory, and execute ./cbf.script /path/file_00001.cbf /path/output-directory 1 3600. The four arguments are the first image, the output directory, and the first and last frame to process. The output directory must already exist, otherwise the script will not run.
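Because the script stops when the output directory is missing, it can help to check the four arguments before submitting anything. The following is only a minimal sketch: the function name preflight and its messages are hypothetical and not part of the BioMAX script.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of cbf.script).
# Returns 0 when the four arguments look usable, 1 otherwise.
preflight() {
    first_image=${1:-}
    outdir=${2:-}
    from=${3:-}
    to=${4:-}

    if [ ! -f "$first_image" ]; then
        echo "first image not found: $first_image" >&2
        return 1
    fi
    if [ ! -d "$outdir" ]; then
        echo "output directory does not exist: $outdir" >&2
        return 1
    fi
    # Both frame numbers must consist of digits only
    for n in "$from" "$to"; do
        case $n in
            ''|*[!0-9]*)
                echo "frame numbers must be plain integers: '$n'" >&2
                return 1
                ;;
        esac
    done
    if [ "$from" -gt "$to" ]; then
        echo "fromrange ($from) is larger than torange ($to)" >&2
        return 1
    fi
    return 0
}
```

With the function defined in the current shell, a call such as preflight /path/file_00001.cbf /path/output-directory 1 3600 && ./cbf.script /path/file_00001.cbf /path/output-directory 1 3600 would submit the jobs only when the check passes.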

cbf.script

#!/bin/sh -eu
#
# Argument list:
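# $1: input data (first cbf file of the dataset)
# $2: output directory (must already exist)
# $3: first frame of the range to process
# $4: last frame of the range to process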

indir=`realpath -es "$1"`
outdir=`realpath -es "$2"`
dirToFile="$(dirname "$indir")"
cbf_file="$(basename "$indir")"
# Turn the frame number of the first image into a '#' template for autoPROC
cbf_bars="$(echo "$cbf_file" | sed -e 's/00001\.cbf$/#####.cbf/' -e 's/0001\.cbf$/####.cbf/' -e 's/001\.cbf$/###.cbf/')"

xdsapp="\
module load XDSAPP/2.99.1-9-PReSTO
xdsapp --cmd --dir=$outdir/xdsapp -j 8 -c 8 -i $indir  --fried=false --range=$3\ $4 --spotrange=$3\ $4
"

autoproc="\
module load autoPROC/20200206-1-PReSTO
process \
    -Id id1,$dirToFile,$cbf_bars,$3,$4 \
    -ANO \
    autoPROC_XdsKeyword_MAXIMUM_NUMBER_OF_JOBS=4 \
    autoPROC_XdsKeyword_MAXIMUM_NUMBER_OF_PROCESSORS=8 \
    autoPROC_XdsKeyword_DATA_RANGE=$3\ $4 \
    autoPROC_XdsKeyword_SPOT_RANGE=$3\ $4 \
    -d $outdir/autoproc 
"

dials="\
module load DIALS/2.1.3-2-PReSTO
cd $outdir/dials
xia2 atom=X \
    pipeline=dials failover=true \
    image=$indir:$3:$4 \
    multiprocessing.mode=serial \
    multiprocessing.njob=1 \
    multiprocessing.nproc=auto
"

# autoproc bails out if its outdir basename exists; don't make it
mkdir "$outdir/xdsapp" "$outdir/dials"

#echo "$xdsapp"
#echo "$autoproc" 
#echo "$dials"

sbatch -N2 --exclusive --ntasks-per-node=32 -A snic2020-5-368 -t 00:10:00 -J XDSAPP   -o "$outdir/xdsapp.out"   --wrap="$xdsapp"
sbatch -N1 --exclusive --ntasks-per-node=32 -A snic2020-5-368 -t 00:30:00 -J autoPROC -o "$outdir/autoproc.out" --wrap="$autoproc"
sbatch -N1 --exclusive --ntasks-per-node=32 -A snic2020-5-368 -t 00:30:00 -J DIALS    -o "$outdir/dials.out"    --wrap="$dials"
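The cbf_bars line in the script only recognises first images whose frame number ends in 001. As an illustration of the same idea, the sketch below replaces whatever run of digits precedes .cbf with the same number of # characters. The function name cbf_template is hypothetical, and the behaviour is intentionally more general than the sed chain in cbf.script.

```shell
#!/bin/sh
# Hypothetical helper (not part of cbf.script): replace the trailing run of
# digits before ".cbf" with an equal number of '#' characters.
cbf_template() {
    prefix=${1%.cbf}      # file name without the extension
    hashes=
    while :; do
        case $prefix in
            *[0-9])
                prefix=${prefix%?}    # drop the last digit
                hashes="#$hashes"     # and add one '#' for it
                ;;
            *)
                break
                ;;
        esac
    done
    printf '%s%s.cbf\n' "$prefix" "$hashes"
}

cbf_template file_00001.cbf    # prints file_#####.cbf
cbf_template run2_0042.cbf     # prints run2_####.cbf
```

Only the digits directly in front of the extension are replaced, so a digit in the dataset name, as in run2, is left alone.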

XDSME with Eiger dataset from BioMAX

#!/bin/sh
#SBATCH -t 00:30:00
#SBATCH -N 1 --exclusive
#SBATCH -A snic2020-5-368
#SBATCH --mail-type=ALL
#SBATCH --mail-user=name.surname@lu.se
module load XDSME/0.6.6-7-PReSTO
xdsme \
-i "ROTATION_AXIS= 0.0 -1.0 0.0" \
/proj/xray/users/x_user/tau/tau1-tau_1_??????.h5
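The wildcard ?????? in the last line matches exactly six characters. Assuming the usual Eiger naming with one _master.h5 file and several _data_NNNNNN.h5 files (an assumption, since the directory contents are not shown here), the pattern selects only the master file. The sketch below demonstrates this with empty placeholder files in a temporary directory.

```shell
#!/bin/sh
# Demonstration only: the file names assume standard Eiger naming and are
# created as empty placeholders in a temporary directory.
demo=$(mktemp -d)
touch "$demo/tau1-tau_1_master.h5" \
      "$demo/tau1-tau_1_data_000001.h5" \
      "$demo/tau1-tau_1_data_000002.h5"

# "master" is six characters long, "data_000001" is eleven,
# so only the master file matches the six-character wildcard.
matches=$(cd "$demo" && ls tau1-tau_1_??????.h5)
echo "$matches"    # prints tau1-tau_1_master.h5

rm -r "$demo"
```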

CrystFEL multinode example at Tetralith

CrystFEL is software for indexing and integrating diffraction patterns that originate from free-electron laser serial femtosecond crystallography (SFX). CrystFEL can also be used for simulating patterns, calculating figures of merit for the data and visualising the results.

Oskar Aurelius from Stockholm University was the first user of CrystFEL at Triolith and shared the script below for multi-node execution of this compute-intensive software package.

#!/bin/bash
# Split a large indexing job into many small tasks and submit using SLURM
# Copyright © 2016-2017 Deutsches Elektronen-Synchrotron DESY,
#                       a research centre of the Helmholtz Association.
# Authors:
#   2016      Steve Aplin <steve.aplin@desy.de>
#   2016-2017 Thomas White <taw@physics.org>
#   2017 Modified by Oskar Aurelius <oskar.aurelius@dbb.su.se>


LAUNCH=TRUE # Launch jobs directly if 'TRUE'. Otherwise just write each script file

MULTI_EVENT_FILES=FALSE #If using multi-event ("CXI") files, should be 'TRUE'
INPUT=files.lst # List of frames to be processed
RUN=trial_run1 # Name of run
GEOM=5HT2B-Liu-2013.geom # Geometry file
STREAMDIR=output # Directory for output. No trailing /
PARAM="--peaks=hdf5 --int-radius=3,4,5 --indexing=mosflm-axes-latt -p 5ht2b.cell" # Parameters for indexamajig

NPROC=16 # Number of CPU cores to use per job
MAX_TIME=01:00:00 # Maximum usage time of one node. hh:mm:ss
SPLIT=1000  # Number of frames per job/node

CRY_V=CrystFEL/0.6.3-PReSTO # Version (module name) of CrystFEL to use
CCP4_V=CCP4/7.0.045-SHELX-ARP-7.6-PReSTO # Version (module name) of CCP4. Needed for mosflm
XDS_V=XDS/20170923-PReSTO # Version (module name) of XDS
L_BIN=/home/x_user/local_bin # Path to extra executables. mosflm

PROPOSAL=snic2017-1-xxx # SNIC proposal for compute time usage
MAIL=name.surname@lu.se  # Email address for SLURM notifications

################################################################################################################

if [[ $MULTI_EVENT_FILES == TRUE ]]; then
   module load $CRY_V
   list_events -i $INPUT -g $GEOM -o events-${RUN}.lst
   if [[ $? != 0 ]]; then
      echo "list_events failed"
      exit 1
   fi
elif [[ $MULTI_EVENT_FILES == FALSE ]]; then
   cp $INPUT events-${RUN}.lst
else
   echo "Have to pick whether MULTI_EVENT_FILES is TRUE or FALSE"
   exit 1
fi

# Count total number of events
wc -l events-${RUN}.lst

# Split the events up, will create files with $SPLIT lines
split -a 3 -d -l $SPLIT events-${RUN}.lst split-events-${RUN}.lst

# Clean up
rm -f events-${RUN}.lst

# Loop over the event list files, and submit a batch job for each of them
for FILE in split-events-${RUN}.lst*; do

    # Stream file is the output of crystfel
    STREAM=`echo $FILE | sed -e "s/split-events-${RUN}.lst/${RUN}.stream/"`

    # Job name
    NAME=`echo $FILE | sed -e "s/split-events-${RUN}.lst/${RUN}-/"`
 
    echo "$NAME: $FILE  --->  $STREAM"

    SLURMFILE="${NAME}.sh"

    echo "#!/bin/sh" > $SLURMFILE
    echo >> $SLURMFILE

    echo "#SBATCH --account   $PROPOSAL" >>$SLURMFILE
    echo >> $SLURMFILE

    echo "#SBATCH --time=$MAX_TIME" >> $SLURMFILE
    echo "#SBATCH --nodes=1 --exclusive" >> $SLURMFILE
    echo >> $SLURMFILE

    echo "#SBATCH --chdir     $PWD" >> $SLURMFILE
    echo "#SBATCH --job-name  $NAME" >> $SLURMFILE
    echo "#SBATCH --output    $NAME.out" >> $SLURMFILE
    echo >> $SLURMFILE

    echo "#SBATCH --mail-type ALL" >> $SLURMFILE
    echo "#SBATCH --mail-user $MAIL" >> $SLURMFILE
    echo >> $SLURMFILE

    echo "module load $CRY_V" >> $SLURMFILE  # Load CrystFEL
    echo "module load $CCP4_V" >> $SLURMFILE  # Load CCP4
    echo "module load $XDS_V" >> $SLURMFILE  # Load XDS
    echo "PATH=\$PATH:$L_BIN" >> $SLURMFILE  # Add path with extra executables
    echo >> $SLURMFILE

    command="indexamajig -i $FILE -o $STREAMDIR/$STREAM"
    command="$command -j $NPROC -g $GEOM"
    command="$command $PARAM"  # Indexing and other parameters added here

    echo $command >> $SLURMFILE

    if [ "$LAUNCH" = TRUE ]; then
      sbatch $SLURMFILE
    fi

done
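The chunking step of the script above can be tried without CrystFEL or SLURM. The sketch below uses a made-up list of 2500 frame names (hypothetical placeholders, not real data) and the script's default values for RUN and SPLIT, then prints which stream file each chunk would map to. It relies on the -d option of GNU split, as the script does.

```shell
#!/bin/bash
# Dry run of the chunking logic only; no jobs are written or submitted.
RUN=trial_run1
SPLIT=1000

work=$(mktemp -d)
cd "$work" || exit 1

# Hypothetical event list with 2500 entries
seq 1 2500 | sed -e 's/^/frame_/' -e 's/$/.h5/' > "events-${RUN}.lst"

# Same split call as in the script: numeric three-digit suffixes, $SPLIT lines per file
split -a 3 -d -l "$SPLIT" "events-${RUN}.lst" "split-events-${RUN}.lst"

nchunks=0
for FILE in split-events-${RUN}.lst*; do
    STREAM=$(echo "$FILE" | sed -e "s/split-events-${RUN}.lst/${RUN}.stream/")
    last_lines=$(grep -c '' "$FILE")    # number of events in this chunk
    nchunks=$((nchunks + 1))
    echo "$FILE ($last_lines events) -> $STREAM"
done

cd / && rm -r "$work"
```

With 2500 events and SPLIT=1000 this gives three chunks with suffixes 000, 001 and 002, the last one holding the remaining 500 events, so three jobs would be submitted.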
