Working effectively with HPC systems

The seminar will present useful tools and best practices for working effectively on HPC systems. Among other things, it will cover methods and skills that help you use allocated resources effectively. It is expected to be of interest to general HPC system users, whether at a beginner or an intermediate level.

Place: Online via Zoom
Time: 10:00 - 15:00, Tue 20 Apr, 2021 (moved from Spring 2020)

Registration

Introduction

Working efficiently with HPC starts with the tools you use to interact with the HPC system. It is also helpful to understand the general anatomy of HPC systems and storage. Following on from these fundamentals, we will give some recommendations for data organization on the system and examples of various types of file systems (e.g. parallel vs. local) and their individual strengths and weaknesses. We will then discuss the concepts of parallelism, scalability and scheduling, and what types of OS and software you can expect on HPC systems. We will go through some important things to consider when building and installing software. Finally, we will look at different ways of running software on HPC systems and ways to monitor your software as it is running, with the aim of ensuring that your jobs are not poorly configured or wasting resources.

While the content and the practices are useful for HPC systems in general, we will show examples and tools specific to the NSC clusters, e.g. Tetralith and Sigma.

Schedule

The schedule for the day is divided into two main parts, before and after the lunch break. Each part consists of several blocks of ca. 30 minutes with optional breaks in between. There will be opportunities for questions. Depending on time, the day might end with a longer question session. The times are approximate (except for the lunch break).

10:00 - 10:10 Introduction
10:10 - 10:45 Tools at your End
10:45 - 11:20 HPC System Anatomy & Storage
11:20 - 12:00 Concept of Parallelism
12:00 - 13:00 Lunch
13:00 - 13:30 Software at HPC systems
13:30 - 14:00 Ideas and Strategies for Organizing your Workflow
14:00 - 14:30 Interacting with the Slurm Queueing System
14:30 - ca. 15:00 Practical Examples and Discussion

Link to Q&A document from the sessions.

Topics/blocks

Welcome, introductions and practicalities
Tools at your end (e.g. terminal, ssh config., file transfer tools, VNC)
HPC system anatomy (login and compute nodes, interconnect, storage)
Properties and features of storage areas (e.g. quotas, performance, locality, backups, snapshots, scratch)
Concept of parallelism (Amdahl’s law), scalability, scheduling and practical advice for good performance
Software on an HPC system (OS, modules, Python environments, concept of build environments, containers with Singularity)
Ideas and strategies for organizing your workflow (data and file management, traceability and reproducibility)
Interacting with the Slurm queueing system (requesting resources interactively or in batch)
Practical examples (preparing, submitting, monitoring and evaluating job efficiency)

Materials

The presentations and the collected Q&A from the sessions are available as PDFs; see the links in the section “Schedule” above. The event is not recorded (though the presentations may be recorded at a different time).

Organizers

Weine Olovsson and Hamish Struthers will present. Peter Kjellström, Torben Rasmussen and Wei Zhang will also help out during the sessions.

Venue

Online seminar via Zoom. The access link will be sent out to registered participants before the event.

General contact

Phone: +46 (0)73 461 8948 (Weine Olovsson)
E-mail: weiol@nsc.liu.se
