Working effectively with HPC systems

The seminar will present useful tools and best practices for working effectively on HPC systems. Among other things, it will cover methods and skills that help you use your allocated resources effectively. It should be of interest to general HPC system users at both the beginner and the intermediate level.

Place: Online via Zoom
Time: 10:00-15:00, Spring 2021 (date TBD).

Registration

Register at the corresponding PRACE training web page.

Introduction

Working efficiently with HPC systems starts with the tools you use to interact with them. It also helps to understand the general anatomy of HPC systems and their storage. Building on these fundamentals, we will give some recommendations for organizing data on the system and present examples of different types of file systems (e.g. parallel vs. local), along with their individual strengths and weaknesses. We will then discuss the concepts of parallelism, scalability and scheduling, and what kinds of operating systems and software you can expect to find on HPC systems. We will go through some important things to consider when building and installing software. Finally, we will look at different ways of running software on HPC systems and of monitoring it while it runs, so that you can make sure your jobs are well configured and do not waste resources.

While the content and practices apply to HPC systems in general, the examples and tools we show will be specific to the NSC clusters, e.g. Tetralith and Sigma.

Schedule

The day is divided into two main parts, before and after the lunch break. Each part consists of several blocks of 20-40 minutes with breaks in between, and each block includes time for questions.

10:00-12:00 Part I
12:00-13:00 Lunch
13:00-15:00 Part II

Topics/blocks (preliminary)

  • Welcome, introductions and practicalities
  • Tools on your side (e.g. terminal, SSH configuration, file transfer tools, VNC)
  • HPC system anatomy (login and compute nodes, interconnect, storage)
  • Properties and features of storage areas (e.g. quotas, performance, locality, backups, snapshots, scratch)
  • Concepts of parallelism (Amdahl’s law), scalability, scheduling and practical advice for good performance
  • Software on an HPC system (OS, modules, Python environments, the concept of build environments, containers with Singularity)
  • Ideas and strategies for organizing your workflow (data and file management, traceability and reproducibility)
  • Interacting with the Slurm queueing system (requesting resources interactively or in batch)
  • Practical examples (preparing, submitting, monitoring and evaluating job efficiency)

Presenters

Peter Kjellström, Weine Olovsson, Torben Rasmussen, Hamish Struthers (NSC).

Venue

Online seminar via Zoom. The access link will be sent to registered participants before the event.

General contact

Phone: +46 (0)73 461 8948 (Weine Olovsson)
E-mail: weiol@nsc.liu.se