Using Enroot on Berzelius

Enroot is a simple yet powerful tool that turns container images into unprivileged sandboxes. It targets HPC environments and integrates with the Slurm scheduler, but it can also be used as a standalone tool to run containers as an unprivileged user. Enroot is similar to Singularity, with the added benefits that users can read and write inside the container and can appear as the root user within the container environment.

Please read Enroot's GitHub page for more information.

Set up Nvidia credentials

This step is necessary for importing container images from Nvidia NGC.

  • Complete steps 4.1 and 4.3, then save the API key.

  • Add the API key by putting these lines in the credentials file ~/.config/enroot/.credentials

    machine login $oauthtoken password your_api_key
    machine login $oauthtoken password your_api_key

    Please replace your_api_key with your real API key.

  • Set the config path by adding this line to ~/.bashrc

    export ENROOT_CONFIG_PATH=$HOME/.config/enroot

  • Apply the change to your current shell

    source ~/.bashrc
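
As a quick sanity check of the steps above, the following sketch verifies that the credentials file exists and no longer contains the your_api_key placeholder. The function name check_enroot_config is hypothetical (it is not part of Enroot), and restricting the file to mode 600 is a precaution assumed here, not a documented requirement.

```shell
# Hypothetical helper (not an Enroot command): sanity-check the config directory.
# Uses the directory given as $1, else ENROOT_CONFIG_PATH, else ~/.config/enroot.
check_enroot_config() {
    dir="${1:-${ENROOT_CONFIG_PATH:-$HOME/.config/enroot}}"
    cred="$dir/.credentials"
    if [ ! -f "$cred" ]; then
        echo "missing: $cred"
        return 1
    fi
    # Fail if the placeholder from the guide was never replaced.
    if grep -q 'your_api_key' "$cred"; then
        echo "placeholder API key still present in $cred"
        return 1
    fi
    # Assumption: keep the API key readable by the owner only.
    chmod 600 "$cred"
    echo "ok: $cred"
}

# Usage, after editing ~/.bashrc and the credentials file:
#   check_enroot_config
```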

Set path to user container storage

By default, your Enroot containers are saved in your home directory. On Berzelius, your home directory has only 20 GB of disk space, so please store Enroot containers in your project directory.

Create the directories for Enroot container storage, replacing /proj/nsc_testing/xuan with a path in your own project directory.

mkdir -p /proj/nsc_testing/xuan/enroot/cache /proj/nsc_testing/xuan/enroot/data

Add the following lines to your ~/.bashrc

export ENROOT_CACHE_PATH=/proj/nsc_testing/xuan/enroot/cache
export ENROOT_DATA_PATH=/proj/nsc_testing/xuan/enroot/data

Apply the change to your current shell

source ~/.bashrc
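
The storage setup above can also be sketched as one small helper that creates both directories under a given base directory and prints the matching export lines. The name setup_enroot_storage is hypothetical, and the base directory you pass in is your own choice.

```shell
# Hypothetical helper: create Enroot cache and data directories under
# <base>/enroot and print the export lines to add to ~/.bashrc.
setup_enroot_storage() {
    base="$1/enroot"
    mkdir -p "$base/cache" "$base/data" || return 1
    echo "export ENROOT_CACHE_PATH=$base/cache"
    echo "export ENROOT_DATA_PATH=$base/data"
}

# Usage (the path is a placeholder for your own project directory):
#   setup_enroot_storage /proj/your_project/your_user >> ~/.bashrc
```

Printing the export lines instead of editing ~/.bashrc directly lets you review them before appending.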

Import container images

You can import a container image either from Nvidia NGC or from the official PyTorch/TensorFlow Docker Hub repositories. When the import finishes, you will find a .sqsh file in your current working directory.

  • From Nvidia NGC

    enroot import 'docker://'
    enroot import 'docker://'

    For other versions, please see the release notes for PyTorch and TensorFlow.

  • From the official PyTorch/TensorFlow Docker Hub repositories

    enroot import 'docker://pytorch/pytorch:1.12.1-cuda11.3-cudnn8-devel'
    enroot import 'docker://tensorflow/tensorflow:2.11.0-gpu'

    For other versions, please see the Docker tags for PyTorch and TensorFlow.
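
Creating a container requires the name of the imported file, so it can help to predict it. The sketch below assumes that Enroot names the file after the image path and tag with / and : replaced by +, and that an optional registry# prefix is dropped. This matches the nvidia+pytorch+22.09-py3.sqsh file used in this guide, but treat it as an assumption and check your working directory. The helper sqsh_name is hypothetical.

```shell
# Hypothetical helper: guess the .sqsh file name produced by "enroot import".
sqsh_name() {
    image="${1#docker://}"   # drop the URI scheme
    image="${image#*\#}"     # drop an optional "registry#" prefix
    # Replace every "/" and ":" with "+", then append the extension.
    printf '%s.sqsh\n' "$(printf '%s' "$image" | tr '/:' '++')"
}

sqsh_name 'docker://pytorch/pytorch:1.12.1-cuda11.3-cudnn8-devel'
# prints pytorch+pytorch+1.12.1-cuda11.3-cudnn8-devel.sqsh
```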

Create a container

enroot create --name nvidia_pytorch_22.09 nvidia+pytorch+22.09-py3.sqsh

Start a container

You need to be on a compute node where you have access to GPU resources to start a container.

  • As the (fake) root user

    enroot start --root --rw --mount /path/on/host:/path/in/container nvidia_pytorch_22.09  

    This gives you full power to install new software in the container. The --mount flag mounts a directory on the host into the container.

  • As a non-root user

    enroot start --rw --mount /path/on/host:/path/in/container nvidia_pytorch_22.09

  • You can also start a container and execute a command at the same time.

    enroot start --rw --mount /path/on/host:/path/in/container nvidia_pytorch_22.09 sh -c 'python /path/to/' 
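
A mistyped host path in --mount is easy to miss until the container starts. The following minimal sketch builds the --mount argument only if the host directory exists. The helper mount_flag is hypothetical and is not part of Enroot.

```shell
# Hypothetical helper: print a --mount argument for "enroot start",
# failing early if the host directory does not exist.
# $1 = path on host, $2 = path in container (defaults to the host path).
mount_flag() {
    host="$1"
    target="${2:-$1}"
    if [ ! -d "$host" ]; then
        echo "host directory not found: $host" >&2
        return 1
    fi
    echo "--mount $host:$target"
}

# Usage on a compute node:
#   enroot start --rw $(mount_flag /path/on/host /path/in/container) nvidia_pytorch_22.09
```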

Submit an Enroot job using an sbatch script

Submitting an Enroot job is similar to running other commands or programs from an sbatch script. See the example below.

#!/bin/bash
#SBATCH -A your_project
#SBATCH --nodes=1
#SBATCH --gpus=8
#SBATCH --time=0-0:10:00

enroot start --rw --mount /your/home:/your/home --mount /your/proj:/your/proj your_container_name bash -c "cd /your/working/dir && run some_command/script"

With the Slurm plugin Pyxis, you can run an Enroot job on multiple nodes. See the example below.

#!/bin/bash
#SBATCH -A your_project
#SBATCH --nodes=2
#SBATCH --gres=gpu:8
#SBATCH --ntasks-per-node=8
#SBATCH --time=0-00:10:00

srun --container-image=/path/to/your_container.sqsh --container-name=your_container --container-mounts=/your/home:/your/home,/your/proj:/your/proj --container-writable bash -c "cd /your/working/dir && run some_command/script"
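
The --container-mounts value in the example above is a comma-separated list of host:container pairs. As a small sketch, the list can be built from directories that should appear at the same path inside the container. The helper container_mounts is hypothetical.

```shell
# Hypothetical helper: join directories into a --container-mounts value,
# mounting each one at the same path inside the container.
container_mounts() {
    out=""
    for dir in "$@"; do
        # Add a comma separator before every entry except the first.
        out="${out:+$out,}$dir:$dir"
    done
    echo "$out"
}

container_mounts /your/home /your/proj
# prints /your/home:/your/home,/your/proj:/your/proj
```

In an sbatch script this could be used as --container-mounts="$(container_mounts /your/home /your/proj)".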

Cheat sheet

Task                                           Command
Import a new container                         enroot import 'docker://'
Create an instance of a container              enroot create --name pytorch nvidia+pytorch+22.09-py3.sqsh
Destroy an instance                            enroot remove pytorch
Run an enroot image                            enroot start --rw --mount /path/on/host:/path/in/container pytorch
Run an enroot image as root                    enroot start --root --rw --mount /path/on/host:/path/in/container pytorch
Run an enroot image and execute your command   enroot start --rw --mount /path/on/host:/path/in/container pytorch sh -c 'python'
