Python installations at NSC

NSC's clusters have the standard CentOS Python 2 installed. This is the Python version you get by default when you log in to one of our clusters:

$ which python
/usr/bin/python

On CentOS 7 it is Python 2.7.5, while on CentOS 6 it is Python 2.6.6. We also provide Python 3 from the EPEL repositories as /usr/bin/python3 (at the moment 3.6.8 on CentOS 7 and 3.4.10 on CentOS 6).
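To see which interpreter and version you actually get (for example before and after loading a module), you can ask Python itself. A minimal sketch using only the standard library; the function name describe_interpreter is our own:

```python
import platform
import sys


def describe_interpreter():
    """Return the path and version of the Python interpreter running this code."""
    return {
        "executable": sys.executable,          # full path to the python binary
        "version": platform.python_version(),  # e.g. "2.7.5" for the CentOS 7 system Python
    }


if __name__ == "__main__":
    info = describe_interpreter()
    # %-formatting keeps this working on the system Python 2 as well as Python 3
    print("Interpreter: %s" % info["executable"])
    print("Version:     %s" % info["version"])
```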

We do not attempt to install a complete set of scientific computing packages for the CentOS/EPEL system Python installations.

To get a recent version of Python together with recent versions of the usual scientific libraries, such as NumPy, SciPy, Matplotlib and pandas, we recommend that you load one of our Python modules. For example, on Tetralith:

$ module avail Python
...
Python/2.7.14-anaconda-5.0.1-nsc1
Python/2.7.14-nsc1-gcc-2018a-eb
Python/2.7.14-nsc1-intel-2018a-eb
Python/2.7.15-anaconda-5.3.0-extras-nsc1
Python/2.7.15-env-nsc1-gcc-2018a-eb
Python/3.6.3-anaconda-5.0.1-nsc1
Python/3.6.4-nsc1-intel-2018a-eb
Python/3.6.4-nsc2-intel-2018a-eb
Python/3.6.7-env-nsc1-gcc-2018a-eb
Python/3.7.0-anaconda-5.3.0-extras-nsc1
...

To quickly check which packages (and versions) are included in a gcc or intel Python module, type:

$ module show Python/3.6.7-env-nsc1-gcc-2018a-eb

After loading a suitable module, you will have a new Python installation in your PATH, where things like NumPy will work:

$ module load Python/3.6.7-env-nsc1-gcc-2018a-eb
$ python
Python 3.6.7 (default, Nov 26 2018, 16:42:15)
[GCC 6.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import numpy
>>> numpy.linspace(0, 2, 9)
array([0.  , 0.25, 0.5 , 0.75, 1.  , 1.25, 1.5 , 1.75, 2.  ])

The modules that contain "-eb" in their name are built with EasyBuild, the build system used for many other software packages on NSC's CentOS 7 clusters. The ones that contain "anaconda" are based on the popular Anaconda Python distribution. NSC now recommends a different way of using Anaconda Python (see "Installing Python and packages with Anaconda / Conda" below); the existing Anaconda-based Python modules are kept, but no new ones will be added.

If there are modules that only differ in the -nsc build/installation tag, then choose the one with the highest integer (e.g. nsc2 rather than nsc1).
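The rule for choosing between builds that differ only in the -nsc tag can be written as a small function. A sketch, assuming module names contain a single -nscN tag as in the listing above; latest_nsc_builds is a hypothetical helper. Comparing N as an integer rather than as text keeps, for example, nsc10 ahead of nsc2:

```python
import re

# Matches the build/installation tag, e.g. "-nsc1" or "-nsc2"
NSC_TAG = re.compile(r"-nsc(\d+)")


def latest_nsc_builds(module_names):
    """Among modules whose names differ only in the -nscN tag, keep the one
    with the highest N. Returns the selected names in sorted order."""
    best = {}  # name with the tag number removed -> (tag number, full name)
    for name in module_names:
        match = NSC_TAG.search(name)
        tag = int(match.group(1)) if match else -1  # untagged names compete with -1
        key = NSC_TAG.sub("-nsc", name, count=1)
        if key not in best or tag > best[key][0]:
            best[key] = (tag, name)
    return sorted(name for _, name in best.values())


if __name__ == "__main__":
    available = [
        "Python/3.6.4-nsc1-intel-2018a-eb",
        "Python/3.6.4-nsc2-intel-2018a-eb",
        "Python/3.6.7-env-nsc1-gcc-2018a-eb",
    ]
    for name in latest_nsc_builds(available):
        print(name)
```

With the three names above, the nsc1 intel build is dropped in favour of its nsc2 counterpart, while the gcc "env" module is kept since it has no newer tag.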

General recommendations

The NSC modules typically provide a set of Python packages that might be difficult for users to install themselves, such as optimized versions of NumPy and SciPy, but for technical reasons we cannot install all the packages that everyone needs in the same installation. Instead, we recommend that you install additional packages in your own project storage space. NSC recommends two main approaches:

  • Start from one of the gcc or intel Python modules and create a virtual environment in which you build and add packages using pip. This is suitable, for instance, when building software from source.

  • Build your own Python installation with packages from the convenient conda binary repositories, based on the popular Anaconda Python distribution. Many software projects also provide a recipe for installation using conda.

Which approach is best depends on your particular use case, so it is worth investigating both. In either case, it is good practice to keep different projects and builds separate and not to install into the default $HOME/.local when using pip.

Starting from the gcc or intel Python module builds

To use Python together with the standard library and the most common packages, you may choose one of the "env" modules, for example on Tetralith:

Python/3.6.7-env-nsc1-gcc-2018a-eb
Python/2.7.15-env-nsc1-gcc-2018a-eb

To list the packages installed in an NSC-built Python module, you can quickly check with:

$ module show Python/3.6.7-env-nsc1-gcc-2018a-eb

or, just load the module and run pip list. If you are looking for a specific package, then pipe the output from pip list to grep:

$ module load Python/3.6.7-env-nsc1-gcc-2018a-eb
$ pip list --format=legacy | grep -i scipy
scipy (1.0.0)

More info: pip, pip list
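The same check can be done from inside Python, which is handy in a job script that should stop early if a package is missing. A minimal sketch, assuming Python 3.4 or newer (for importlib.util.find_spec); package_status is a hypothetical helper. It takes import names, which can differ from the names pip shows (python-hostlist, for example, is imported as hostlist):

```python
import importlib
import importlib.util


def package_status(import_names):
    """Map each top-level import name to the package's version string, or to
    None if the current interpreter cannot find it."""
    status = {}
    for name in import_names:
        if importlib.util.find_spec(name) is None:
            status[name] = None  # not installed for this interpreter
            continue
        module = importlib.import_module(name)
        # Not every package defines __version__
        status[name] = getattr(module, "__version__", "unknown")
    return status


if __name__ == "__main__":
    for name, version in sorted(package_status(["numpy", "scipy", "hostlist"]).items()):
        print("%-10s %s" % (name, version if version is not None else "not found"))
```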

To add or upgrade Python packages, use pip from within a virtual environment. This avoids installing packages into the default $HOME/.local, which affects all your Python usage. Good practice is to keep different projects and builds in separate virtual environments.

If you need to compile packages from source code, choose one of the available gcc or intel builds, for example on Tetralith:

Python/3.6.7-env-nsc1-gcc-2018a-eb
Python/2.7.15-env-nsc1-gcc-2018a-eb
Python/3.6.4-nsc2-intel-2018a-eb
Python/2.7.14-nsc1-intel-2018a-eb

Before building a package, remember to also load the build environment that matches the toolchain in the Python module name (gcc-2018a in this example):

$ module load Python/3.6.7-env-nsc1-gcc-2018a-eb
$ module load buildenv-gcc/2018a-eb
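To check which compiler a given Python was itself built with, and hence which build environment matches it, you can ask the interpreter. A small sketch using the standard platform and sysconfig modules; on Linux, the CC configuration variable holds the compiler command recorded when Python was built (it may be None on other platforms). The helper name build_info is our own:

```python
import platform
import sysconfig


def build_info():
    """Describe the compiler used to build the running Python interpreter."""
    return {
        "compiler": platform.python_compiler(),  # e.g. "GCC 6.4.0", as in the startup banner
        "cc": sysconfig.get_config_var("CC"),    # compiler command used at build time
    }


if __name__ == "__main__":
    info = build_info()
    print("Built with: %s" % info["compiler"])
    print("CC:         %s" % info["cc"])
```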

Virtual environments

You can use the Python standard "virtualenv" mechanism to customize your Python and install additional packages. This works for all Python modules as well as the system Python. In this example, we base our virtual environment on Python/3.6.7-env-nsc1-gcc-2018a-eb and use "--system-site-packages" to have access to all the packages installed there instead of starting from scratch:

$ module load Python/3.6.7-env-nsc1-gcc-2018a-eb
$ module load buildenv-gcc/2018a-eb
$ virtualenv --system-site-packages myownvirtualenv
$ source myownvirtualenv/bin/activate
$ pip install python-hostlist
$ python
Python 3.6.7 (default, Nov 26 2018, 16:42:15)
[GCC 6.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import hostlist
>>> hostlist.__file__
'/proj/ourprojname/users/x_abcde/myownvirtualenv/lib/python3.6/site-packages/hostlist.py'

To leave the environment, type:

$ deactivate

When you log in next time, you do not have to create the environment again; just activate it:

$ source myownvirtualenv/bin/activate
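A script can check whether it is running inside a virtual environment, for example before installing anything. A minimal sketch: older virtualenv releases (before 20) set sys.real_prefix, while venv and newer virtualenv releases make sys.prefix differ from sys.base_prefix. The helper name in_virtualenv is our own:

```python
import sys


def in_virtualenv():
    """Return True if this interpreter runs inside a virtualenv/venv environment."""
    # virtualenv releases before 20 record the original prefix here
    if hasattr(sys, "real_prefix"):
        return True
    # venv and virtualenv >= 20: sys.prefix points at the environment,
    # sys.base_prefix at the Python it was created from (Python 3.3+)
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    if in_virtualenv():
        print("Inside a virtual environment: %s" % sys.prefix)
    else:
        print("Not in a virtual environment (prefix %s)" % sys.prefix)
```

Note that a conda environment is not a virtual environment in this sense, so the check returns False there.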

Installing Python and packages with Anaconda / Conda

Many Python packages and other software are available as precompiled binaries through conda, the package manager of the popular Anaconda Python distribution. This makes installation easy, which is helpful, for example, when testing the compatibility of different package versions.

The recommended way to use Anaconda Python at NSC is to start from one of the Anaconda modules, typically the latest version. For example, on Tetralith:

$ module avail Anaconda
...
Anaconda/2020.07-nsc1
Anaconda/2021.05-nsc1
...

Note that by default, conda installs into $HOME/.conda. This can be problematic, since conda environments often become very large. You can therefore move the directory to your project space and leave a symbolic link in its place. For example, if the .conda directory already exists (run the commands from your home directory):

$ mv .conda /proj/ourprojname/users/x_abcde/
$ ln -s /proj/ourprojname/users/x_abcde/.conda .conda
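To judge whether moving .conda is worthwhile, you may want to know how much space it takes. A rough sketch that sums file sizes with os.walk; it skips symbolic links inside the tree, and since conda hard-links files between its package cache and environments, the total can overstate the actual disk usage. The helper name directory_size is hypothetical:

```python
import os


def directory_size(path):
    """Approximate total size in bytes of the regular files under path."""
    total = 0
    # os.walk does not descend into symlinked subdirectories by default
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue  # skip symlinks so their targets are not counted
            try:
                total += os.path.getsize(full)
            except OSError:
                pass  # file vanished or is unreadable; ignore it
    return total


if __name__ == "__main__":
    conda_dir = os.path.expanduser("~/.conda")
    print("%s: %.1f GiB" % (conda_dir, directory_size(conda_dir) / 1024.0 ** 3))
```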

After loading an Anaconda module, you can use "conda create" to create a customized Python environment with exactly the packages (and versions) you need. A basic example, assuming you want Python 3.8 together with the pandas and seaborn packages:

$ module load Anaconda/2021.05-nsc1
$ conda create -n myownenv python=3.8 pandas seaborn
$ conda activate myownenv
$ which python
~/.conda/envs/myownenv/bin/python
$ python
Python 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:18)
[GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
>>> pandas.__version__
'1.4.2'

When you log in next time, you do not have to create the environment again; just activate it:

$ module load Anaconda/2021.05-nsc1
$ conda activate myownenv
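From inside Python you can check whether you are running a conda-managed interpreter and which environment is active. A sketch relying on two conda conventions: every conda environment contains a conda-meta directory, and conda activate sets the CONDA_PREFIX environment variable. The helper conda_env_info is hypothetical:

```python
import os
import sys


def conda_env_info():
    """Return (is_conda_python, active_env_prefix_or_None)."""
    # Every conda environment (including the base install) has a conda-meta directory
    is_conda_python = os.path.isdir(os.path.join(sys.prefix, "conda-meta"))
    # Set by "conda activate"; unset when no environment is active
    active_prefix = os.environ.get("CONDA_PREFIX")
    return is_conda_python, active_prefix


if __name__ == "__main__":
    is_conda, prefix = conda_env_info()
    print("Conda Python: %s" % is_conda)
    print("Active environment: %s" % prefix)
    # realpath makes the comparison work when ~/.conda is a symlink
    if is_conda and prefix and os.path.realpath(prefix) != os.path.realpath(sys.prefix):
        print("Warning: this python is not the one from the active environment")
```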

You can install additional packages using "conda install" inside your environment, for example:

$ conda install cython

If a particular package cannot be installed with "conda install", you can still use "pip install" in your environment. Note, however, that mixing pip and conda installations can give rise to version conflicts.

Since you already have an environment you can write to, do not use "--user" here. An example:

$ pip install python-hostlist
$ python
...
>>> import hostlist
>>> hostlist.__file__
'/home/x_abcde/.conda/envs/myownenv/lib/python3.8/site-packages/hostlist.py'

To list the already installed packages, simply run conda list. If you are looking for a specific package, then pipe the output from conda list to grep:

$ conda list | grep -i scipy
scipy                     1.8.0            py38h56a6a73_1    conda-forge

More info: conda, conda list

In practice, once you know which package versions work well together, it is often easier to create a fresh environment from scratch than to keep adding and removing packages one at a time.
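Before starting over, it helps to record the exact versions that currently work. conda itself can do this (for example with conda list --export). As a Python-side sketch, assuming Python 3.8 or newer for importlib.metadata, the following lists the installed distributions that carry Python package metadata; packages installed by conda without such metadata will not appear. The helper name pinned_versions is our own:

```python
from importlib.metadata import distributions  # Python 3.8+


def pinned_versions():
    """Return sorted "name==version" strings for the installed distributions
    visible to this interpreter."""
    pins = set()
    for dist in distributions():
        metadata = dist.metadata
        name = metadata.get("Name") if metadata is not None else None
        if name:  # skip broken installs without a Name field
            pins.add("%s==%s" % (name, dist.version))
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    print("\n".join(pinned_versions()))
```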

When not to use conda environments

If you need to install a Python package that must be compiled, you should not use a conda environment!

In this case, try a virtual environment based on one of the NSC-built Python modules instead, and if that fails, contact support for help.

Conda environments outside of the home directory

You might want to create a conda environment outside of your home directory (the default when using "-n NAME" is to create it as $HOME/.conda/envs/NAME).

One reason for this could be to save space on the /home filesystem where the quota is more restrictive (as described above, a straightforward solution might be to create a link to your project space). Another reason could be that you want to curate a Python environment used by a group of fellow users, and want to make it less dependent on your home directory.

To accomplish this, use "-p PREFIX" instead of "-n NAME" when creating your environment, and use the full prefix when activating it. The start of the example above then becomes:

$ module load Anaconda/2021.05-nsc1
$ conda create -p /proj/ourprojname/pythonenvs/test1 python=3 pandas seaborn
$ conda activate /proj/ourprojname/pythonenvs/test1

assuming you want your environment in /proj/ourprojname/pythonenvs/test1 (and that /proj/ourprojname/pythonenvs already exists and you have write access to it).
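Since the example assumes the parent directory exists and is writable, a quick pre-check can save a failed attempt. A sketch, assuming the environment directory itself should not exist yet; can_create_env_at is a hypothetical helper:

```python
import os


def can_create_env_at(prefix):
    """True if prefix does not exist yet and its parent is an existing,
    writable directory."""
    parent = os.path.dirname(os.path.abspath(prefix))
    return (
        not os.path.exists(prefix)
        and os.path.isdir(parent)
        and os.access(parent, os.W_OK)
    )


if __name__ == "__main__":
    target = "/proj/ourprojname/pythonenvs/test1"  # path from the example above
    ok = can_create_env_at(target)
    print("%s: %s" % (target, "OK" if ok else "parent missing/not writable, or already exists"))
```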

The "old" Python Anaconda modules

As described above, NSC recommends starting from an Anaconda module for binary installations using conda, rather than using the "old" Python Anaconda modules, which will no longer be updated. On Tetralith, for example, these are:

$ module avail Python
...
Python/2.7.14-anaconda-5.0.1-nsc1
Python/2.7.15-anaconda-5.3.0-extras-nsc1
Python/3.6.3-anaconda-5.0.1-nsc1
Python/3.7.0-anaconda-5.3.0-extras-nsc1
...

These modules contain many packages and can still be used, but in most cases we recommend starting from an Anaconda module and installing Python from scratch. The old modules can be used with conda and virtualenv as described above; note, however, that activating a conda environment works slightly differently, for example:

$ module load Python/3.7.0-anaconda-5.3.0-extras-nsc1
$ source activate myownenv

How do I control which version of Python my scripts use?

If a script demands the system Python using a first line like this

#!/usr/bin/python

you can change it to

#!/usr/bin/env python

to make the script pick up the Python from the currently loaded module (or your activated environment, if you have one).

If you want to "lock" a script to a specific Python version, figure out the full path to the desired python binary (use "which python") and use that instead. For example:

$ module load Python/3.6.7-env-nsc1-gcc-2018a-eb
$ which python
/software/sse/easybuild/prefix/software/Python/3.6.7-foss-2018a-nsc1/bin/python

Thus, to always use this Python for the script, make the first line read

#!/software/sse/easybuild/prefix/software/Python/3.6.7-foss-2018a-nsc1/bin/python
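Whichever first line you choose, a script can also verify at startup that it runs under a new enough Python, and report which interpreter it got. A minimal sketch; the minimum version (3, 6) is only an example, and python_is_at_least is our own helper:

```python
import sys

MINIMUM = (3, 6)  # example requirement; adjust for your script


def python_is_at_least(required):
    """True if the running interpreter's (major, minor) is >= required."""
    return tuple(sys.version_info[:2]) >= tuple(required)


if __name__ == "__main__":
    if not python_is_at_least(MINIMUM):
        sys.exit("This script needs Python %d.%d or newer, but %s is version %s"
                 % (MINIMUM[0], MINIMUM[1], sys.executable, sys.version.split()[0]))
    print("Running under %s" % sys.executable)
```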

Why doesn't my Python program write to slurm.out?

To see the output from a Python script in a running job in real time, you have to instruct Python not to buffer its output. Otherwise, the output from your script is only written to the slurm.out file when the job has finished (or when the buffer is full). To get the expected behavior, add the -u command-line flag when you start the script to request unbuffered mode:

$ python -u myscript.py

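For an executable script, you can add the flag to the first line, as long as that line gives the full path to the Python binary:

#!/software/sse/easybuild/prefix/software/Python/3.6.7-foss-2018a-nsc1/bin/python -u

Note that "#!/usr/bin/env python -u" does not work on Linux: everything after the interpreter path is passed as a single argument, so env looks for a program called "python -u". Alternatively, set the PYTHONUNBUFFERED environment variable in your job script before starting Python:

export PYTHONUNBUFFERED=1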
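If you cannot change how the script is started, you can instead flush explicitly from within the script. A sketch, assuming Python 3.3 or newer for the flush argument of print; the log helper is hypothetical:

```python
import time


def log(message):
    """Print a message and flush stdout right away, so that it appears in
    slurm.out while the job is still running."""
    print(message, flush=True)


if __name__ == "__main__":
    for step in range(3):
        log("step %d done" % step)
        time.sleep(0.5)  # stand-in for real work
```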

How to use mpi4py

mpi4py is a common Python package for MPI-parallel calculations. It is included in the NSC Python modules, but you might need to install it yourself if you use a virtual environment or build your own Python with conda.

Using mpi4py can be as easy as loading the corresponding module (and/or activating your environment, if applicable) in a job script:

#!/bin/bash
#SBATCH -A snic2022-x-yyy
#SBATCH -n 4
#SBATCH -t 01:00:00
#SBATCH -J jobname

module load mpprun/4.3.0
module load Python/3.6.7-env-nsc1-gcc-2018a-eb    
mpprun python ./pythonscript.py

Note the use of mpprun, the NSC launcher for MPI programs. At the moment, a slightly newer mpprun version than the default is needed. An update will also soon be available on the Bi system.

To test that mpi4py works correctly, you can run a small test script that prints the MPI rank and size of each process:

$ cat mpi_test.py     
from mpi4py import MPI
print(f"Rank, size: {MPI.COMM_WORLD.rank}, {MPI.COMM_WORLD.size}")

Running on 4 cores, the output should look something like:

Rank, size: 0, 4
Rank, size: 2, 4
Rank, size: 3, 4
Rank, size: 1, 4
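Beyond printing ranks, a typical mpi4py pattern is to give each rank its own share of the work and combine the partial results. A sketch under these assumptions: block_range is a hypothetical helper that splits n items as evenly as possible over the ranks, and comm.allreduce (mpi4py's lowercase method for generic Python objects, which sums by default) combines the partial sums. Without mpi4py, it falls back to a serial run:

```python
def block_range(n_items, rank, size):
    """Return the [start, stop) range of item indices owned by `rank` when
    n_items are split as evenly as possible over `size` ranks."""
    base, extra = divmod(n_items, size)
    # The first `extra` ranks get one additional item each
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


if __name__ == "__main__":
    try:
        from mpi4py import MPI
        comm = MPI.COMM_WORLD
        rank, size = comm.rank, comm.size
    except ImportError:
        comm, rank, size = None, 0, 1  # serial fallback without mpi4py
    start, stop = block_range(100, rank, size)
    local_sum = sum(range(start, stop))  # this rank's share of the work
    total = comm.allreduce(local_sum) if comm is not None else local_sum
    if rank == 0:
        print("Sum of 0..99: %d" % total)
```

Launched with mpprun in a job script like the one above, each rank sums its own block, and rank 0 prints the total, 4950, regardless of the number of ranks.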

If the setup for mpi4py doesn't work, don't hesitate to contact support.

