Python at NSC

Overview and quickstart

The NSC HPC clusters have a standard Python provided with the operating system (OS). When you are logged into a cluster with no modules loaded, invoking the command python will use this Python. For example, on Tetralith/Sigma with Rocky Linux 9, the version is:

$ python --version
Python 3.9.18

You can use the command which to see that it resides in a directory provided by the OS:

$ which python
/usr/bin/python

On most NSC systems, this will be some version of Python 3, but there are still some older systems where python refers to Python 2. Python 2 has been sunset, which means that it has received no updates, security or otherwise, since January 1, 2020. Hence, when you intend to use Python 3, it is generally recommended to request it explicitly via the versioned interpreter name, i.e., python3:

$ python3 --version
Python 3.9.18
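
Since python may still refer to Python 2 on some older systems, a script can also guard against being started by the wrong interpreter. The following is a minimal sketch of such a check. The function name require_python and the minimum version (3, 6) are illustrative assumptions, not NSC requirements.

```python
import sys


def require_python(minimum=(3, 6), version_info=None):
    """Return True if the interpreter is at least `minimum`, else False.

    `version_info` defaults to the running interpreter and can be
    overridden for testing. The (3, 6) default is only an example.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if not require_python():
        # Written without f-strings so that the message is also shown
        # when the script is started by an older interpreter.
        sys.exit("This script needs Python 3.6 or newer, found %d.%d"
                 % tuple(sys.version_info[:2]))
    print("Python version OK")
```

Placing such a check at the top of a script gives a clear error message instead of a confusing syntax error further down.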

The Python version provided by the OS will generally not be upgraded during the lifetime of the system unless the whole system undergoes a major OS upgrade. The Python from the OS works well for small generic scripts which are not particular about which minor version of Python 3 is used and which do not have any difficult dependencies. Beyond small generic scripts, it is recommended to use a version of Python provided via the module system. A range of modules is provided for different types of usage:

Python with a selection of common packages: the modules named Python/<version>-env-* provide a specific Python version along with a fairly extensive set of common Python packages for numerical and scientific computing. These modules are useful for Python scripts that need these extra packages but are not so particular with their precise versions. These modules can also be used with “venv” to create customized isolated environments that add further dependencies on top of the provided packages (more info below).
Bare Python modules: the modules named Python/<version>-bare-* provide a specific Python version and its standard library, but no other Python packages. These modules are primarily intended to be used with “venv” to create fully customized environments of selected Python packages that can be precisely chosen to match the needs of a Python program.

Both the Python/<version>-env-* and the Python/<version>-bare-* are named to show which compiler has been used to build Python, its standard library, and all included packages. These modules are the best choice to be used with your own Python packages that require compilation since you can then match the compiler version.
Anaconda/Miniforge modules: the modules named Anaconda/<version> and Miniforge/<version> provide the conda command for building completely customized “conda environments” where one can choose versions of the Python interpreter, Python packages, and other pre-built software. The Anaconda/<version> modules are presently deprecated due to a license change in March 2024 that can make it difficult for users to be sure whether the software is used fully in accordance with the Anaconda terms of service. The Miniforge/<version> modules are very similar to the Anaconda ones but are configured to use the conda-forge community-driven library of packages instead of the channels offered by the Anaconda company.

When selecting which module to load, if there are modules for the same Python version that only differ in the -hpc or -nsc tag, choose the one with the highest number (e.g. pick hpc2 rather than hpc1).
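
The selection rule above can be sketched in a few lines of Python. This is only an illustration: the helper name pick_highest_build is hypothetical, and it assumes the candidate names differ only in the -hpcN or -nscN tag.

```python
import re

# Matches the build tag in names such as
# "Python/3.10.4-env-hpc2-gcc-2022a-eb" (tag "hpc", number 2).
_BUILD_TAG = re.compile(r"-(hpc|nsc)(\d+)")


def pick_highest_build(module_names):
    """Return the module name with the highest -hpcN / -nscN number.

    Names without such a tag are ranked lowest.
    """
    def build_number(name):
        match = _BUILD_TAG.search(name)
        return int(match.group(2)) if match else -1

    return max(module_names, key=build_number)


if __name__ == "__main__":
    candidates = [
        "Python/3.10.4-env-hpc1-gcc-2022a-eb",
        "Python/3.10.4-env-hpc2-gcc-2022a-eb",
    ]
    print(pick_highest_build(candidates))
```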

The three use cases are explained in more detail in the subsections below.

Python with the most common modules for numerical and scientific computing: `Python/<version>-env-*`

The modules provided under the names Python/<version>-env-* provide a selection of Python versions along with NumPy, SciPy, Matplotlib, Pandas etc. For example, on Tetralith:

$ module avail python
...
Python/3.10.4-env-hpc1-gcc-2022a-eb
...

After loading a suitable module, you will have a new Python installation in your PATH, where things like NumPy will work:

$ module load Python/3.10.4-env-hpc2-gcc-2022a-eb
$ python
Python 3.10.4 (main, Oct  6 2023, 16:37:55) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy.linspace(0, 2, 9)
array([0.  , 0.25, 0.5 , 0.75, 1.  , 1.25, 1.5 , 1.75, 2.  ])

The included packages are shown in the module info, i.e.:

$ module show Python/3.10.4-env-hpc2-gcc-2022a-eb

Alternatively, use the command python -m pip list. If you are looking for a specific package, you can filter the output with grep:

$ module load Python/3.10.4-env-hpc1-gcc-2022a-eb
$ python -m pip list | grep -i scipy
SciPy (1.8.1)

For more information on listing packages with pip, see the pip documentation for pip list.
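
The same lookup can be done from within Python using the standard library module importlib.metadata (available in Python 3.8 and later). The sketch below is an illustration only; the helper name installed_version is an assumption.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Which versions are reported depends on the loaded module.
    for name in ("numpy", "scipy"):
        print(name, installed_version(name))
```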

Furthermore, the pip command can also be used to add missing Python packages. However, we strongly recommend not running pip outside a Python virtual environment (venv). If no venv is used, packages are installed in your home directory under $HOME/.local, which can affect all your Python usage and lead to confusing behaviour. Instead, you should set up separate venvs for the different projects or software you wish to run. It is recommended to use this feature only to add missing Python packages, not to install newer versions of those already provided. For that, see the next section about using the <version>-bare-* modules instead.

For example, a quite straightforward way to get access to the atomic simulation environment (ase) is:

$ module load Python/3.10.4-env-hpc2-gcc-2022a-eb
$ python -m venv ase.venv
$ source ase.venv/bin/activate
$ python -m pip install ase
Collecting ase
  Using cached ase-3.23.0-py3-none-any.whl (2.9 MB)
...
$ python
Python 3.10.4 (main, Oct  6 2023, 16:37:55) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ase
>>> ase.__version__
'3.23.0'

Notes:

Instead of issuing pip, we invoke python -m pip. Some combinations of loaded packages may create environments where a pip shortcut is not created or points to the wrong thing. Using python -m pip ensures that packages are installed in the right directory for the right Python version.
When executing python -m pip, you may get a WARNING asking you to upgrade the version of pip itself. Such upgrades do not work well in the setup at NSC, so it is recommended that you ignore this warning.
The above steps show how to install Python packages with pip that do not require source code compilation. For more information on packages that require compilation via pip, see the separate section on compilation with pip below.
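
To check whether packages in $HOME/.local could be influencing a Python session, the standard library module site can report where per-user packages are looked up and whether that location is in use. This is a small sketch; the function name user_site_status is an assumption. Inside a venv created as above, the per-user directory is normally disabled, so the value of enabled is expected to be False there.

```python
import site
import sys


def user_site_status():
    """Report where per-user packages live and whether they are active."""
    user_site = site.getusersitepackages()
    return {
        "user_site": user_site,
        # ENABLE_USER_SITE is True only when the per-user directory is used.
        "enabled": bool(site.ENABLE_USER_SITE),
        "on_sys_path": user_site in sys.path,
    }


if __name__ == "__main__":
    for key, value in user_site_status().items():
        print(key, "=", value)
```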

For more information on using venvs, see the separate section below.

Bare Python without preinstalled modules: `Python/<version>-bare-*`

The modules provided under the names Python/<version>-bare-* provide various Python versions without any preinstalled Python packages beyond the standard library. For example, on Tetralith:

$ module avail python
...
Python/2.7.18-bare-hpc2-gcc-2022a-eb
Python/3.10.4-bare-hpc1-gcc-2022a-eb
...

After loading the module, packages outside the standard library are generally not available, e.g.:

$ module load Python/3.10.4-bare-hpc1-gcc-2022a-eb
$ python

Python 3.10.4 (main, Sep 20 2023, 07:20:18) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'numpy'

This module is an excellent basis for use with Python venvs for software installed using the command pip, which allows requesting precise versions of desired software. For example, to set up a virtual environment for Python 3.10.4 with numpy version 1.26.4 and scipy 1.11.4:

$ module load Python/3.10.4-bare-hpc1-gcc-2022a-eb
$ python -m venv numpyscipy.venv
$ source numpyscipy.venv/bin/activate
$ python -m pip install numpy==1.26.4 scipy==1.11.4
Collecting numpy==1.26.4
Using cached numpy-1.26.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.2 MB)
Collecting scipy==1.11.4
...
Installing collected packages: numpy, scipy
Successfully installed numpy-1.26.4 scipy-1.11.4
$ python

Python 3.10.4 (main, Sep 20 2023, 07:20:18) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy.linspace(0, 2, 9)
array([0.  , 0.25, 0.5 , 0.75, 1.  , 1.25, 1.5 , 1.75, 2.  ])    

Notes:

When executing python -m pip, you may get a WARNING asking you to upgrade the version of pip itself. Such upgrades do not work well in the setup at NSC, so it is recommended that you ignore this warning.
The above steps show how to install Python packages with pip that do not require source code compilation. For more information on packages that require compilation via pip, see below.
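
Before starting a longer run, it can be useful to confirm that the packages a script needs are importable in the currently activated environment. The sketch below uses importlib.util.find_spec from the standard library, which looks a module up without importing it. The helper name missing_modules is hypothetical, and only top-level module names are handled.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    needed = ["numpy", "scipy"]
    missing = missing_modules(needed)
    if missing:
        print("Not importable in this environment:", ", ".join(missing))
    else:
        print("All required modules were found")
```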

For more information on using venvs, see the separate section below.

Modules for building conda environments: `Miniforge/<version>`

Many Python packages and other software are available as precompiled binaries via the conda package manager, which originates from the popular Anaconda Distribution. The currently recommended way to get access to this tool is via the modules named Miniforge/<version>. To see the available Miniforge modules on Tetralith:

$ module avail Miniforge
Miniforge/24.7.1-2-hpc1

Upon loading this module, a conda environment with a specific version of Python and various packages can easily be created, e.g.:

$ module load Miniforge/24.7.1-2-hpc1
$ conda create -n numpyscipy python=3.10 numpy==1.26.4 scipy=1.11.4
$ conda activate numpyscipy
$ python

Python 3.10.15 | packaged by conda-forge | (main, Oct 16 2024, 01:24:24) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.                      
>>> import numpy
>>> numpy.linspace(0, 2, 9)
array([0.  , 0.25, 0.5 , 0.75, 1.  , 1.25, 1.5 , 1.75, 2.  ])        

Notes:

Using the command mamba instead of conda may be beneficial. The former is a drop-in replacement for conda that usually is faster and better at resolving requested versions.
The default place for the files providing your conda environments is in your home directory inside $HOME/.conda. This directory can become very large. Hence, it is recommended to move this directory to your project space and provide access to it via a symlink:
```
 $ mkdir -p ~/.conda
 $ mv ~/.conda /proj/ourprojname/users/x_abcde/
 $ ln -s /proj/ourprojname/users/x_abcde/.conda ~/.conda
```
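
To see how much space a directory such as ~/.conda occupies before deciding to move it, a rough estimate can be computed with the standard library. This is only a sketch: the function name directory_size_bytes is an assumption, and the result sums apparent file sizes, which can differ from what the filesystem quota reports.

```python
import os


def directory_size_bytes(path):
    """Sum the sizes of all files below `path` (symlinks are not followed)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                # The file may have disappeared or be unreadable; skip it.
                pass
    return total


if __name__ == "__main__":
    conda_dir = os.path.expanduser("~/.conda")
    size_gib = directory_size_bytes(conda_dir) / 1024**3
    print("%s uses about %.2f GiB" % (conda_dir, size_gib))
```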

For more details on conda environments, see the separate section below.

More on Python virtual environments (venvs)

We first reiterate the example above for setting up a venv, but with more details on each step. First, load a bare Python module:

$ module load Python/3.10.4-bare-hpc1-gcc-2022a-eb

Now we set up a virtual environment in the subdirectory named numpyscipy.venv. Virtual environments can easily become very large, so this should preferably be placed in a project directory rather than your home directory.

$ python -m venv numpyscipy.venv

Now we “activate” this virtual environment. From this point forward, we use the versions of packages and software installed in this virtual environment:

$ source numpyscipy.venv/bin/activate

Now we use “pip” to install some desired Python packages into this virtual environment:

$ python -m pip install numpy==1.26.4 scipy==1.11.4
Collecting numpy==1.26.4
Using cached numpy-1.26.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.2 MB)
Collecting scipy==1.11.4
...
Installing collected packages: numpy, scipy
Successfully installed numpy-1.26.4 scipy-1.11.4

At this point, we can invoke Python to use these packages:

$ python

Python 3.10.4 (main, Sep 20 2023, 07:20:18) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy.linspace(0, 2, 9)
array([0.  , 0.25, 0.5 , 0.75, 1.  , 1.25, 1.5 , 1.75, 2.  ])    

If you want to leave the virtual environment, i.e., restore the environment to its state before activating the venv, do:

$ deactivate

When you log in next time, you do not have to recreate the environment. Just activate it again using:

$ source numpyscipy.venv/bin/activate
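
A script can verify that it is actually running inside a venv by comparing sys.prefix with sys.base_prefix, which differ when a venv is active. The following is a minimal sketch; the function name in_virtual_environment is an assumption.

```python
import sys


def in_virtual_environment():
    """True when running inside a venv (prefix differs from base prefix)."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtual_environment():
        print("Running in venv:", sys.prefix)
    else:
        print("No venv active, using:", sys.prefix)
```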

Pip packages that require compilation

Invoking pip sometimes leads to an attempt to compile software, e.g., for Python packages providing an interface to a library written in, say, C. In many cases, this may work without issues (often via the standard C compiler provided by the OS).

However, it may be desirable to take more detailed control of this compilation if, e.g., (i) the compilation breaks due to lack of support for certain languages or compiler features; (ii) the compiled software needs to interact with MPI; or (iii) the compiled software needs to interact with compiled libraries in other packages, where using different compilers may introduce incompatibilities.

A good first step is to load the desired build environment (buildenv) module. It is highly recommended to use the same build environment as indicated by the name of the loaded Python module, e.g.:

$ module load Python/3.10.4-bare-hpc1-gcc-2022a-eb
$ module load buildenv-gcc/2022a-eb

Then create and activate a venv:

$ python3 -m venv venv
$ source venv/bin/activate

At this point, it is recommended to set environment variables to get pip to use the MPI-based compilers provided by the build environment:

$ export CC=mpicc CXX=mpic++ FC=mpifort MPICH_CC="" MPICH_CXX="" MPICH_FC=""

Subsequently, when python -m pip is used to install software with compiled components, these compilers will be used, e.g.:

$ python3 -m pip install ase
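
If a build picks up an unexpected compiler, it can help to check what the compiler variables currently point to. The sketch below reads CC, CXX and FC from the environment and resolves each command on the PATH. The helper name resolve_compilers is hypothetical, and the check only reports the variables; it does not show what a particular package's build system finally chooses.

```python
import os
import shlex
import shutil


def resolve_compilers(env=None, names=("CC", "CXX", "FC")):
    """Map each variable to (value, resolved path or None)."""
    if env is None:
        env = os.environ
    result = {}
    for name in names:
        value = env.get(name)
        if not value or not value.strip():
            result[name] = (None, None)
            continue
        # The variable may contain extra flags; the command is the first word.
        command = shlex.split(value)[0]
        # shutil.which searches the PATH of the current process.
        result[name] = (value, shutil.which(command))
    return result


if __name__ == "__main__":
    for name, (value, path) in resolve_compilers().items():
        print("%s=%s -> %s" % (name, value, path))
```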

Building software manually to integrate with pip packages in a venv

You may need to build Python packages (or even non-Python software) that must interface with the packages installed in your venv. To illustrate the steps needed in a typical case, we will go through the steps to get the pip package “kimpy” from the OpenKIM project to work. This pip package has a dependency, “kim-api”, which is not installable with pip but rather needs to be built out of a GitHub repository.

Start from the above example, where we already have an activated venv in the directory venv. Create a build and installation directory:

$ mkdir builds kim-api
$ mkdir kim-api/lib64
$ ln -s lib64 kim-api/lib

Set up the environment variables needed to build this software in a way where it will find libraries both as part of kim-api and other libraries installed in the venv, e.g.:

$ export CPPFLAGS="-Wl,-rpath=$(pwd -P)/venv/lib:$(pwd -P)/kim-api/lib"
$ export CFLAGS="-Wl,-rpath=$(pwd -P)/venv/lib:$(pwd -P)/kim-api/lib"
$ export LIBRARY_PATH="$(pwd -P)/venv/lib:$(pwd -P)/kim-api/lib"
$ export CPLUS_INCLUDE_PATH="$(pwd -P)/kim-api/include"
$ export PKG_CONFIG_PATH="$(pwd -P)/kim-api/lib/pkgconfig"

Clone the repository, check out the desired version, and build according to instructions supplied by kim-api:

$ cd builds
$ git clone https://github.com/openkim/kim-api.git
$ cd kim-api
$ git checkout v2.3.0
$ mkdir build
$ cd build
$ cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX:PATH="$(pwd -P)/../../../kim-api"
$ make
$ make install

At this point, it is possible to install the pip package “kimpy”, which has “kim-api” as a dependency:

$ python -m pip install kimpy

This creates an importable kimpy module:

$ python
Python 3.10.4 (main, Sep 20 2023, 07:20:18) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import kimpy
>>> kimpy.__version__
'2.1.1'

More on conda environments

Here follows an example of using the conda command to create an environment with the pandas and seaborn packages:

$ module load Miniforge/24.7.1-2-hpc1
$ conda create -n myownenv python=3.8 pandas seaborn
$ conda activate myownenv
$ which python
~/.conda/envs/myownenv/bin/python
$ python
Python 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:18)
[GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
>>> pandas.__version__
'1.4.2'

When you log in next time, you do not have to create the environment again, but you can activate it using:

$ module load Miniforge/24.7.1-2-hpc1
$ conda activate myownenv
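
To confirm from within Python which conda environment is active, one can inspect the CONDA_PREFIX and CONDA_DEFAULT_ENV variables that conda sets on activation, and compare with the prefix of the running interpreter. This is a sketch only; the function name conda_environment is an assumption.

```python
import os
import sys


def conda_environment(env=None):
    """Describe the active conda environment, if any."""
    if env is None:
        env = os.environ
    prefix = env.get("CONDA_PREFIX")
    return {
        "name": env.get("CONDA_DEFAULT_ENV"),
        "prefix": prefix,
        # True when the running interpreter belongs to that environment.
        "python_in_env": prefix is not None
        and os.path.realpath(sys.prefix) == os.path.realpath(prefix),
    }


if __name__ == "__main__":
    print(conda_environment())
```

If python_in_env is False while an environment is activated, the python on the PATH is probably not the one installed in that environment.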

You can install additional packages using “conda install” inside your environment, for example:

$ conda install cython

If you cannot install a particular package using conda install, it is still possible to use pip install in your environment. However, note that installations using pip might give rise to version conflicts when mixed with conda installations.

$ pip install python-hostlist
$ python
...
>>> import hostlist
>>> hostlist.__file__
'/home/x_abcde/.conda/envs/myownenv/lib/python3.8/site-packages/hostlist.py'

Note:

You may find instructions elsewhere directing you to use the --user flag (e.g., pip install --user). When working with conda environments, as instructed here, you should not use that flag.

To list the already installed packages, just run conda list. If you are looking for a specific package, then pipe the output from conda list to grep:

$ conda list | grep -i scipy
scipy                     1.8.0            py38h56a6a73_1    conda-forge

More info: conda, conda list

Generally, one should not have too high expectations of conda being able to create well-working environments based on a series of ‘add’ and ‘remove’ commands. It is often better to destroy the environment and restart it from scratch with the precise set of desired versions.

Specify a location for a conda environment

You might want to create a conda environment outside of your home directory (the default when using “-n NAME” is to create it as $HOME/.conda/envs/NAME).

One reason for this could be to save space on the /home filesystem where the quota is more restrictive (as described above, a straightforward solution might be to create a link to your project space). Another reason could be that you want to curate a Python environment used by a group of fellow users and want to make it less dependent on your home directory.

To accomplish this, use “-p PREFIX” instead of “-n NAME” when creating your environment and use the full prefix when activating. So, the start of the example above would become

$ module load Miniforge/24.7.1-2-hpc1
$ conda create -p /proj/ourprojname/pythonenvs/test1 python=3 pandas seaborn
$ conda activate /proj/ourprojname/pythonenvs/test1

assuming you want your environment in /proj/ourprojname/pythonenvs/test1 (and also assuming for this example that /proj/ourprojname/pythonenvs has been created and that you have write access).

More information about conda, including compiling software into / alongside conda environments

For more info on using conda both for Python and other software, see the NSC Anaconda page.

How to specify Python versions in scripts

A straightforward way to run a Python program is to name the interpreter explicitly on the command line, e.g.,

$ /usr/bin/python3 my_python_script.py

However, Python script files can be made executable, and will then invoke a Python interpreter based on the first line of the script. For example, if the first line specifically points to the system Python:

#!/usr/bin/python

you can change it to

#!/usr/bin/env python

to make the script pick up the Python from the environment in which you execute the script (i.e., the currently loaded module or activated environment).

One may also consider “locking” a script to a specific Python version, as described in the following. Note, however, that this may cause issues with the other packages that the Python modules provide. Figure out the full path to the desired Python binary (use “which python”) and use that in the first line instead. For example:

$ module load Python/3.10.4-env-hpc1-gcc-2022a-eb
$ which python
/software/sse2/tetralith_el9/easybuild/pure/software/Python/3.10.4-GCCcore-11.3.0/bin/python

Thus, to always use this Python for the script, make the first line read:

#!/software/sse2/tetralith_el9/easybuild/pure/software/Python/3.10.4-GCCcore-11.3.0/bin/python
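
When experimenting with different first lines, a small script that reports which interpreter is actually running it can be helpful. The sketch below uses an env-based first line with python3. The function name interpreter_summary is an assumption.

```python
#!/usr/bin/env python3
"""Print which interpreter is actually running this script."""
import sys


def interpreter_summary():
    """Return the interpreter path and version as one line of text."""
    return "%s (Python %d.%d.%d)" % (
        (sys.executable,) + tuple(sys.version_info[:3])
    )


if __name__ == "__main__":
    print(interpreter_summary())
```

After making the file executable with chmod +x, running it with and without a module loaded shows how the first line affects which Python is picked up.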

Why doesn’t my Python program write to slurm.out?

To see the output from a Python script in a running job in real time, you have to instruct Python not to buffer its output. Otherwise, the output from your script will not be written to the slurm.out file until the job has finished (or the buffer is full). To get the expected behaviour, add the -u command line flag when you start the script to request unbuffered mode.

python -u myscript.py

For an executable script, you need to specify the path to the Python you want to use and add the flag to the first line, e.g. for the system Python:

#!/usr/bin/python -u
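
As an alternative to the -u flag, individual messages can be flushed from within the script by passing flush=True to print. The sketch below wraps this in a small helper; the name report is an assumption.

```python
import sys


def report(message, stream=None):
    """Print a message and flush it immediately so it shows up at once."""
    if stream is None:
        stream = sys.stdout
    print(message, file=stream, flush=True)


if __name__ == "__main__":
    for step in range(3):
        report("finished step %d" % step)
```

This only affects the messages written through the helper, so other output from the script may still be buffered.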

How to use mpi4py

mpi4py is a common Python package for MPI parallel calculations. While it is included in the NSC Python modules, you might need to install it yourself if you use a virtual environment or a Python installed using conda.

Using mpi4py might be as easy as the following job script, which loads the corresponding module (and/or activates an environment, if applicable):

#!/bin/bash
#SBATCH -A naiss2024-x-yyy
#SBATCH -n 4
#SBATCH -t 01:00:00
#SBATCH -J jobname

module load Python/3.10.4-env-hpc1-gcc-2022a-eb    
mpprun python ./pythonscript.py

Note here the use of the NSC launcher for MPI, mpprun.

To test that mpi4py is working correctly, one can run a small test script which prints the MPI ranks:

$ cat mpi_test.py     
from mpi4py import MPI
print(f"Rank, size: {MPI.COMM_WORLD.rank}, {MPI.COMM_WORLD.size}")

Running on 4 cores, the output should look something like:

Rank, size: 0, 4
Rank, size: 2, 4
Rank, size: 3, 4
Rank, size: 1, 4
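
Once the ranks are reported correctly, a common next step is to let each rank work on its own share of the data. The following sketch splits a number of items into contiguous blocks, one per rank. The helper name block_range and the item count of 10 are illustrative assumptions, and the fallback to a single rank when mpi4py cannot be imported is only there so the helper can be tried without MPI.

```python
try:
    from mpi4py import MPI
except ImportError:
    # Allows trying the helper in an environment without mpi4py.
    MPI = None


def block_range(n_items, rank, size):
    """Return the contiguous range of item indices handled by `rank`.

    The first (n_items % size) ranks get one extra item each.
    """
    base, extra = divmod(n_items, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


if __name__ == "__main__":
    rank = MPI.COMM_WORLD.rank if MPI else 0
    size = MPI.COMM_WORLD.size if MPI else 1
    mine = block_range(10, rank, size)
    print("Rank %d of %d handles items %s" % (rank, size, list(mine)))
```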

If the setup for mpi4py doesn’t work, don’t hesitate to contact support.
