Using Conda/Mamba on Berzelius

1. Introduction

Conda is an open-source package management and environment management system primarily used for Python. It allows you to create isolated environments with specific sets of packages and dependencies to avoid conflicts between different projects.

Conda, Miniconda, Anaconda, Mamba, Mambaforge, and Miniforge are related tools for package and environment management in Python. They differ as follows:

  • Conda is the core package and environment management tool.
  • Miniconda is a minimal Conda installer, allowing you to create custom environments.
  • Anaconda is a comprehensive distribution of Conda, focused on data science and scientific computing.
  • Mamba is an alternative package manager that can be used as a replacement for the default Conda package manager, providing improved performance.
  • Mambaforge is a distribution of Mamba that provides a pre-packaged and ready-to-use environment for Conda users.
  • Miniforge is a minimal installer for Conda (like Miniconda) that uses community-driven channels such as conda-forge by default.

Conda installed from Anaconda or Miniconda uses Anaconda's defaults channel unless configured otherwise, whereas Mamba installed through Miniforge uses the conda-forge channel by default.

We recommend using Mamba for its improved performance.

2. Mamba on Berzelius

When using Conda, the “solving environment” stage is where it determines the best way to install your requested packages, considering all existing dependencies in the environment. This step can be very slow, especially with complex dependency chains or outdated metadata.

To address this, we strongly recommend using Mamba, a fast, drop-in replacement for Conda written in C++. Mamba offers:

  • Significantly faster dependency resolution
  • Parallel downloading of package files and metadata

2.1 Speed Comparison

The table below compares the time needed to run the same PyTorch installation with each tool:

# Using mamba
mamba install pytorch==2.3.0 torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
# Using conda
conda install pytorch==2.3.0 torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia

Mamba completed the installation in 1m17.021s, compared to 6m26.264s for Conda, which is roughly five times faster.

Tool     Time
Conda    6m26.264s
Mamba    1m17.021s

2.2 Searching for a Package

To search for a specific version of a package (e.g. pytorch 2.5.1 built for CUDA 11.x), use:

mamba search pytorch==2.5.1 | grep cuda11

Each result follows the format:

pytorch                        2.5.1 cuda118_py310h8b36b8a_300  conda-forge         

Where:

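  • cuda118_py310h8b36b8a_300 is the build string, which differentiates between compiled variants of the same version.
  • Its cuda118_py310 prefix indicates a build for CUDA 11.8 and Python 3.10.
  • The remainder (h8b36b8a_300) is a hash of the build configuration followed by the build number.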
  • conda-forge is the source channel.
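
As a rough illustration of reading these fields in a script, the sketch below splits the sample result line shown above and extracts the CUDA and Python tags. It assumes the build string begins with the cuda<digits>_py<digits> pattern, which not every package follows; the variable names are illustrative only.

```shell
# Sample line copied from the search output above.
line='pytorch                        2.5.1 cuda118_py310h8b36b8a_300  conda-forge'

# Columns are whitespace-separated: name, version, build string, channel.
version=$(printf '%s\n' "$line" | awk '{print $2}')
build=$(printf '%s\n' "$line" | awk '{print $3}')
channel=$(printf '%s\n' "$line" | awk '{print $4}')

# Assumption: the build string starts with cuda<digits>_py<digits>.
cuda_tag=$(printf '%s\n' "$build" | sed -n 's/^cuda\([0-9]*\)_py[0-9]*.*/\1/p')
py_tag=$(printf '%s\n' "$build" | sed -n 's/^cuda[0-9]*_py\([0-9]*\).*/\1/p')

echo "version=$version channel=$channel cuda=$cuda_tag python=$py_tag"
# prints: version=2.5.1 channel=conda-forge cuda=118 python=310
```

For build strings that do not match the assumed pattern, both tags are left empty rather than guessed.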

2.3 Creating an Environment

Berzelius provides Mamba via the Miniforge3 module. To create and use a custom environment for PyTorch 2.5.1 with Python 3.10 and CUDA 11.8 support:

module load Miniforge3/24.7.1-2-hpc1-bdist
mamba create --name pytorch-2.5.1-python-3.10 python=3.10
mamba activate pytorch-2.5.1-python-3.10
# Quote the specs so that the shell does not expand the * wildcards
CONDA_OVERRIDE_CUDA=11.8 mamba install "pytorch==2.5.1=cuda*" "torchvision=*=cuda*" "torchaudio=*=cuda*"

Understanding the Syntax: pytorch==2.5.1=cuda*

  • pytorch==2.5.1 specifies the exact version of PyTorch you want.

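  • =cuda* is a build string matcher that tells Conda/Mamba to select a build with CUDA support, such as cuda118_py310_* or cuda121_py310_*. Such builds are typically compiled against cudatoolkit. Without the matcher, you might accidentally install a CPU-only build (e.g., cpu_generic or cpu_mkl), which will not use the GPU.

  • CONDA_OVERRIDE_CUDA=11.8 tells the solver to assume that CUDA 11.8 is available, so that CUDA builds can be selected even on a node without a GPU, such as a login node.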
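
To check after installation which kind of build was selected, a small helper along the following lines may be useful. This is a minimal sketch: the function name build_kind is hypothetical, and it assumes that GPU builds start with cuda and CPU-only builds start with cpu, as in the examples above.

```shell
# Hypothetical helper: classify a build string by its prefix.
# Assumed naming convention: cuda* = GPU build, cpu* = CPU-only build.
build_kind() {
    case "$1" in
        cuda*) echo "gpu" ;;
        cpu*)  echo "cpu" ;;
        *)     echo "unknown" ;;
    esac
}

build_kind "cuda118_py310h8b36b8a_300"   # prints: gpu
build_kind "cpu_generic"                 # prints: cpu
```

In an activated environment, mamba list pytorch shows the build string of the installed package in its third column, which can be passed to the helper.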

To remove the environment when done:

mamba env remove -n pytorch-2.5.1-python-3.10

2.4 Installing from a Specific Channel

While Mamba uses conda-forge by default, you can install packages from other channels by specifying them explicitly with -c. Order matters: channels listed earlier take precedence during dependency resolution.

To install a bioinformatics tool from bioconda:

mamba install samtools -c bioconda
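
Because channel order determines priority, it can help to inspect the exact command before running it. The sketch below only prints a mamba install command with one -c flag per channel, in the order given; the helper name print_install_cmd is hypothetical. Listing conda-forge ahead of bioconda follows the ordering usually recommended by the Bioconda project, which is an assumption here and not a Berzelius requirement.

```shell
# Hypothetical helper: print (not run) a mamba install command,
# adding one -c flag per channel, highest priority first.
print_install_cmd() {
    pkg=$1
    shift
    cmd="mamba install"
    for ch in "$@"; do
        cmd="$cmd -c $ch"
    done
    echo "$cmd $pkg"
}

print_install_cmd samtools conda-forge bioconda
# prints: mamba install -c conda-forge -c bioconda samtools
```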

3. Conda on Berzelius

The license terms of the Anaconda Python distribution were updated on March 31, 2024, in a way that we interpret as also restricting general non-commercial use of the Anaconda "Free edition" in organizations with more than 200 employees. To help users avoid possible licensing issues, we have deprecated the Anaconda modules and now direct users to the compatible community-provided software in our Mambaforge modules.

The deprecated Anaconda installation remains available on Berzelius via the module system, for example as Anaconda/2023.09-0-hpc1-bdist.

A basic example of creating a Conda environment called myenv with Python 3.8, including the pandas and seaborn packages:

module load Anaconda/2023.09-0-hpc1-bdist
conda create -n myenv python=3.8 pandas seaborn
conda activate myenv

To find valid Conda package names, use the package search of the Anaconda repository.

To remove the environment when done:

conda env remove -n myenv

4. Conda/Mamba Best Practices

Issues can arise when conda and pip are used together to create an environment.

  • Running conda after pip has the potential to overwrite and break packages installed via pip.
  • Running pip after conda may upgrade or remove a conda-installed package.

As a general best practice:

  • Create a conda environment for isolation.
  • Use only conda packages where possible.
  • If pip is needed, install as many requirements as possible with conda first, then use pip for the remainder.

For more details, please read Using Pip in a Conda Environment.
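
The recommended order can be sketched as a short script. The version below is a dry run that only prints each step; the run wrapper, the DRY_RUN variable, and the package name some-pip-only-package are illustrative placeholders and not part of the Berzelius setup.

```shell
# Dry-run wrapper: print each step instead of executing it.
# Set DRY_RUN=0 to actually run the commands (requires the Miniforge3 module).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# 1. Create an isolated environment.
run mamba create -y -n myenv python=3.10
# 2. Install everything that is available as a conda package first.
run mamba install -y -n myenv pandas seaborn
# 3. Use pip last, only for what conda cannot provide.
#    "some-pip-only-package" is a placeholder, not a real package.
run mamba run -n myenv pip install some-pip-only-package
```

Running pip through mamba run -n myenv ensures that the pip belonging to the environment is used, even if the environment is not activated in the current shell.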

5. Common Issues

5.1 Relocating Conda/Mamba Environments

The default location for Conda/Mamba environments is ~/.conda in your home directory. This can be problematic because environments can become very large. We recommend moving this directory to your project directory and leaving a symbolic link in its place:

mv ~/.conda /proj/<your project>/users/<your username>
ln -s /proj/<your project>/users/<your username>/.conda ~/.conda
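
The same two steps can be wrapped in a function with a few safety checks. This is a sketch under the assumption that the target parent directory already exists; the function name relocate_dir is hypothetical, and the demonstration uses throwaway directories instead of the real ~/.conda.

```shell
# Hypothetical helper: move a directory under a new parent and leave a
# symlink at the old path. Refuses to act if the source is already a
# symlink, if either directory is missing, or if the target exists.
relocate_dir() {
    src=$1          # e.g. "$HOME/.conda"
    dest_parent=$2  # e.g. your user directory under /proj
    dest="$dest_parent/$(basename "$src")"

    if [ -L "$src" ]; then
        echo "already a symlink: $src" >&2
        return 1
    fi
    if [ ! -d "$src" ] || [ ! -d "$dest_parent" ] || [ -e "$dest" ]; then
        echo "cannot relocate $src to $dest" >&2
        return 1
    fi

    mv "$src" "$dest" && ln -s "$dest" "$src"
}

# Demonstration on temporary directories rather than the real ~/.conda.
demo=$(mktemp -d)
mkdir -p "$demo/home/.conda/envs" "$demo/proj"
relocate_dir "$demo/home/.conda" "$demo/proj"
ls -ld "$demo/home/.conda"
```

Calling the function a second time on the same path returns an error instead of nesting links, since the source is then already a symbolic link.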

5.2 Avoid Modifying ~/.bashrc or ~/.bash_profile

Avoid modifying your ~/.bashrc or ~/.bash_profile with conda init or similar commands. These modifications can interfere with the module system on Berzelius and break environment activation.
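
To check whether conda init has already modified your startup files, you can search them for the marker comment that conda init writes. The sketch below assumes the standard marker text; the function name has_conda_init is hypothetical, and other tools may use different markers.

```shell
# Hypothetical helper: report whether a shell startup file contains
# the block inserted by `conda init`.
has_conda_init() {
    [ -f "$1" ] && grep -q '>>> conda initialize >>>' "$1"
}

for f in "$HOME/.bashrc" "$HOME/.bash_profile"; do
    if has_conda_init "$f"; then
        echo "conda init block found in $f"
    fi
done
```

If a block is reported, it is the section between the ">>> conda initialize >>>" and "<<< conda initialize <<<" comment lines in that file.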

6. Cheat Sheet

Operation                                Command
create a new environment                 conda/mamba create -n ENVNAME
create environment with Python version   conda/mamba create -n ENVNAME python=3.10
activate environment                     conda/mamba activate ENVNAME
install a package from specific channel  conda/mamba install -c CHANNELNAME PKGNAME
install specific version of package      conda/mamba install PKGNAME==3.1.4
uninstall package                        conda/mamba uninstall PKGNAME
list installed packages                  conda/mamba list
delete environment by name               conda/mamba env remove -n ENVNAME
export environment                       conda/mamba env export -n ENVNAME > ENV.yml
import environment                       conda/mamba env create -n ENVNAME --file ENV.yml
