Stratus and Cirrus operating system upgrade 2024

This is not the final version of this page. If you cannot find the information you are looking for, please check back later or ask NSC Support.

What will happen?

During March-April 2024 testing of the new OS can be done by connecting to the stratus2 and cirrus2 login nodes. Those login nodes run the new OS and jobs started from them will run in a small reservation of the cluster that has upgraded compute nodes.
During May staggered upgrades of the entire system will be done. Final dates for each system will be a discussion between the representatives and NSC, but software you want to run should be tested and ready by April 30.
During the first week of May Cirrus will have a few hours downtime where all compute nodes are switched and where the “cirrus” login node alias is redirected to point to a login node with the new OS by default.
During the second week of May Stratus will also change over in the same manner as Cirrus.
Some time in the August-September timeframe there will be a whole day of downtime for each system to finalize the switchover. Until then we will retain the capacity to wholly or partially switch back to the old OS fairly quickly in an emergency situation.

What has happened?

Some groups have been testing the upgrade for some time now on a limited number of nodes.

At the production follow-up with the Metcoop representatives on 2024-02-29, we decided on the final plan, which calls for a complete switch of OS for both Cirrus and Stratus before May 1 2024.

What do I need to do?

To continue using Stratus and Cirrus without interruption, you should make sure your jobs run on the upgraded parts of Stratus and Cirrus immediately and sort out any problems before April 14.

You yourself choose when to do this, but we recommend that you do it sooner rather than later to leave as much time as possible for you (and possibly NSC) to fix any problems you may encounter.

Typically, it will look something like this:

Make some basic tests to see what you need to change and verify that at least one job seems to start correctly in the upgraded part of Stratus and Cirrus (i.e., when submitted from stratus2.nsc.liu.se (hostname “stratus2”)).
Submit any new jobs from stratus2.nsc.liu.se (hostname “stratus2”) with possibly modified settings/job scripts/modules/…
Monitor the first few new jobs and once they have completed, check that the results are as expected.
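
When checking test results, it can help to record which OS a job actually ran on. The following is a minimal sketch, assuming the standard /etc/os-release file is present on both the old and the upgraded nodes (it ships with both CentOS 7 and Rocky Linux 9). The function name os_label is our own illustration, not something provided by NSC.

```shell
#!/bin/sh
# os_label: print "<ID>-<major version>" (for example "rocky-9") for an
# os-release style file. The function name is hypothetical.
# Usage: os_label [path-to-os-release]   (defaults to /etc/os-release)
os_label() {
    file="${1:-/etc/os-release}"
    # Source the file in a subshell so its variables do not leak into
    # the calling script.
    (
        . "$file" || exit 1
        # Keep only the major version: "9.3" becomes "9".
        printf '%s-%s\n' "$ID" "${VERSION_ID%%.*}"
    )
}
```

Calling os_label with no argument near the top of a job script and echoing the result into the job output leaves a record of whether that run used the old or the new OS.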

Note that the number of compute nodes currently available to you with the new OS is very limited, to avoid impacting running forecasts. If you can’t test your software stack with the number of nodes available, please let Rafael Grote and Lars Berggren know so they can coordinate such tests without disrupting production.

Storage (/nobackup, /home) is not affected (i.e., no need to move your data).

In short: Test your software today, and don’t hesitate to contact NSC Support if you need help.

Software migration guide

This section covers how to get software you have been running on CentOS 7 to run on Rocky Linux 9.

Applications provided by NSC

Unlike most clusters managed by NSC, there is not a lot of software provided directly by NSC since users prefer and are expected to build their own software stacks.

There is a basic set of compilers and some niceties like Anaconda and Mambaforge.

Additional applications and versions can be added, but only if requested via NSC Support.

If you don’t find your application/version in the output from module avail (on upgraded nodes) and it’s not mentioned on this page already, please contact NSC Support and ask about it.
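
If you want to script the check against module avail, one possible approach is to filter a terse listing. The sketch below assumes that the module command on the upgraded nodes accepts the -t (terse) switch and prints its listing on standard error, as is common for both Environment Modules and Lmod. We have not verified this for Stratus and Cirrus, and the helper name in_listing is hypothetical.

```shell
#!/bin/sh
# in_listing: succeed if module NAME (or NAME/VERSION) appears in a terse
# module listing read from standard input. Hypothetical helper name.
in_listing() {
    awk -v want="$1" '
        # Drop trailing tags such as "(default)" from the entry.
        { name = $1; sub(/\(.*\)$/, "", name) }
        # Accept an exact match, or an entry that begins with "NAME/".
        name == want || index(name, want "/") == 1 { found = 1 }
        END { exit found ? 0 : 1 }
    '
}
```

A typical call would be module -t avail 2>&1 | in_listing Mambaforge && echo "available". Plain string comparison is used instead of a regular expression so that dots and plus signs in version strings are matched literally.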

Compiled applications that you or your project have installed yourself on CentOS 7 to run on compute nodes

If you or someone in your project has built or installed your own application you will need to choose a suitable way to run the application in the new environment.

There are several ways to do this (recompile, run as-is). Which one to use depends on the application. NSC Support can assist you with this. Some documentation is provided below, more will follow later.

Inside a supercomputer job, please try the following steps (i.e., inside a submit script, or after issuing interactive). Note that it is a good idea to verify not just that the application starts, but also to try an example job to check that it performs as expected.

Run your application as usual, i.e., as mpprun <application>. Some applications just work without any additional steps. However, you may see error messages about “missing symbols” or “missing libraries”. In that case, go to the next step.
The next step is to try to rebuild the application with one of the build environments provided in the Rocky Linux 9 environment. Load an appropriate buildenv-<something> module and follow the usual instructions for how to install software. Note: make sure to completely rebuild the application, i.e., do a make distclean, make clean, or equivalent as the first step to make sure all components are rebuilt from scratch.
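
To decide quickly whether a failed test job falls into the “missing symbols” or “missing libraries” category, the job output can be scanned for the usual dynamic loader messages. This is only a sketch: the patterns are the standard glibc loader messages, the helper name classify_failure is hypothetical, and your application may report such problems differently.

```shell
#!/bin/sh
# classify_failure: read program output on standard input and print one of
# "missing-library", "missing-symbol" or "other". Hypothetical helper; the
# patterns are the usual glibc dynamic loader messages.
classify_failure() {
    log=$(cat)
    if printf '%s\n' "$log" | grep -q 'error while loading shared libraries'; then
        echo missing-library
    elif printf '%s\n' "$log" | grep -q -e 'undefined symbol' -e 'symbol lookup error'; then
        echo missing-symbol
    else
        echo other
    fi
}
```

Inside a submit script this could be used as mpprun <application> > run.log 2>&1 followed by classify_failure < run.log. A result other than “other” suggests going straight to the rebuild step.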

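Compiled applications that you or your project have installed yourself on CentOS 7 to run on the login nodes

If you or your project have installed a software application that you would like to use on the login node for data analysis etc., please try the following steps: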

Try to run your application as usual, e.g., as ./<application>. Some applications just work without any additional steps. However, you may see error messages about “missing symbols” or “missing libraries”. In that case, go to the next step.
The next step is to try to rebuild the application with one of the build environments provided in the Rocky Linux 9 environment. Load an appropriate buildenv-<something> module and follow the usual instructions for how to install software. Note: make sure to completely rebuild the application, i.e., do a make distclean, make clean, or equivalent as the first step to make sure all components are rebuilt from scratch.
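
On the login node you can also inspect a binary before running it. The standard ldd tool lists the shared libraries a binary needs and marks unresolved ones as “not found”. The sketch below filters that output; the helper name missing_libs is hypothetical. Only run ldd on binaries you trust.

```shell
#!/bin/sh
# missing_libs: read `ldd` output on standard input and print the name of
# each shared library reported as "not found". Hypothetical helper name.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}
```

For example, ldd ./<application> | missing_libs prints nothing when every library resolves on the new OS. Any names it does print are the libraries to look for among the available modules, or a sign that a rebuild is needed.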

Updated mpprun

Note that the mpprun application differs substantially on Rocky Linux 9 compared to CentOS 7. Use mpprun -h to see the help, and the option mpprun -i <binary> can be helpful for power users to understand more precisely how mpprun will launch a binary through the various MPI-specific launchers.

Old build environments

The old build environments of CentOS 7, buildenv-<something>/<old-version>, will not be carried over to EL9, but newer replacement modules corresponding to them will be available. If you cannot make use of these refreshed modules, please contact NSC Support for assistance with your migration to these environments.

Application performance

If you see a significant performance loss in the upgraded part of Stratus and Cirrus, please let NSC Support know as soon as possible.

If you cannot get your application working on Rocky 9

Ask NSC Support for help. Do this early, do not wait until April 14th!

If the application cannot be made to work even with assistance from NSC, we have the option to leave some non-upgraded compute nodes running for a while (but no longer than June 30th, 2024). However, only users that have asked for help and that NSC has not been able to help will be allowed to use such nodes. The number of such nodes will likely be very limited.

Some more technical details

Users can choose which part to use by logging in to the corresponding login node (using SSH or Thinlinc):

Upgraded/Rocky 9: stratus2.nsc.liu.se (a.k.a stratus2, available from November 13th)
Non-upgraded/CentOS 7: stratus.nsc.liu.se (a.k.a stratus1, available until January 8th)

Why are we doing this upgrade?

The current operating system CentOS 7 (which is based on Red Hat Enterprise Linux 7) will not receive any security updates after 2024-06-30.

As Stratus and Cirrus are planned to continue operating longer than this, we need to upgrade the operating system.
