Workshop abstracts



MPI in a Multicore environment

Håkon Bugge, Scali Inc.

This talk will discuss how longer-term trends such as multicore processors, memory bandwidth, and transistor budgets will influence computing in the coming years. What role will, or should, the MPI layer play in this? Further, the responsibility of the MPI layer versus other parts of the operating environment will be discussed. Special focus will be on the assignment of processes to cores, both for a pure MPI model and for a hybrid OpenMP/MPI environment. Techniques for optimizing the execution of a single job versus the throughput of several jobs will also be discussed.
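
As a minimal illustration of the placement problem the talk addresses (the node layout and packing policy below are assumptions chosen for the example, not Scali's actual mechanism), the sketch computes which cores each MPI rank would receive on a dual-socket quad-core node, for a pure MPI run and for a hybrid OpenMP/MPI run:

    # Sketch: map MPI ranks (and their OpenMP threads) to cores on one node.
    # Assumptions: 2 sockets x 4 cores, cores 0-3 on socket 0, cores 4-7 on socket 1.
    SOCKETS = 2
    CORES_PER_SOCKET = 4

    def core_map(ranks_per_node, threads_per_rank=1):
        """Pack each rank's threads on contiguous cores so that a hybrid rank
        stays within one socket and shares its cache and memory controller."""
        total_cores = SOCKETS * CORES_PER_SOCKET
        assert ranks_per_node * threads_per_rank <= total_cores, "node oversubscribed"
        return {rank: list(range(rank * threads_per_rank,
                                 rank * threads_per_rank + threads_per_rank))
                for rank in range(ranks_per_node)}

    # Pure MPI model: one single-threaded rank per core.
    print(core_map(ranks_per_node=8))
    # Hybrid OpenMP/MPI: one rank per socket, four OpenMP threads per rank.
    print(core_map(ranks_per_node=2, threads_per_rank=4))

In a real run the resulting core lists would be handed to the launcher's or the operating system's affinity interface; the point of the sketch is only that pure MPI and hybrid runs call for different process-to-core assignments.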


High Performance Computing in Norway

Roy Dragseth, University of Tromsø

The talk will first give a quick overview of the HPC ecosystem in Norway and an outline of current developments in our country: two new HPC systems are being installed in Bergen and Tromsø, a nationwide storage system with a dedicated lambda network is under implementation, and a national grid infrastructure is being established.

The second part of the talk will give an outline of the design goals and targeted applications of the new 60 TFlops cluster currently being installed at the University of Tromsø.


Modelling of materials: where we were, where we are, and where are we going?

Olle Eriksson, Uppsala University

In this talk I will give a quick review of where the field of materials modelling has evolved from, with examples of methodology and scientific questions that were of interest some 20-30 years ago. The dramatic change in the capabilities of high performance computing, in terms of hardware as well as software, has transformed this field in a way few scientific disciplines can compare with. A few examples of research activities at Uppsala University will be given, reflecting at least to some degree the direction in which this field is moving. The examples involve dynamical properties of materials, biological applications, and a fully automatic software system for modelling materials properties.


Biomedical modelling and simulation: patient specific models for diagnosis

Matts Karlsson, Linköping University

One important application of computational mechanics in biomedicine is the creation of subject-specific models of the human cardiovascular system for enhanced diagnostics and intervention planning. Magnetic resonance imaging enables non-invasive, time-resolved three-dimensional data acquisition, capturing individual anatomy as well as the velocities of the blood, the myocardium and the vessel walls. With the use of supercomputers we are able to perform predictive simulations of the blood flow to map the wall shear stress distribution in arteries, in order to understand the mechanisms behind atherosclerosis and other diseases.
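
For reference, the wall shear stress mapped in such simulations is, for a Newtonian description of blood, the dynamic viscosity times the wall-normal gradient of the tangential velocity evaluated at the vessel wall (a standard definition, not specific to this talk):

    \tau_w = \mu \left. \frac{\partial u_t}{\partial n} \right|_{\mathrm{wall}}

where \mu is the dynamic viscosity, u_t the velocity component tangential to the wall, and n the wall-normal coordinate; regions of low or oscillating \tau_w are the ones commonly associated with atherosclerosis.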


Silent Corruptions

Peter Kelemen, CERN

The capacity of the CERN Computer Center is scaling up as we prepare for the LHC. With more than 4 petabytes of online magnetic storage already in place (and another 1.5 petabytes expected before the end of the year), data integrity issues become more and more visible. Since January 2007, CERN has been systematically collecting and analysing observations of silent data corruptions. We present the motivations, tools and current findings of the ongoing investigation.
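
To make the idea concrete, here is a minimal sketch (in Python, not CERN's actual tooling) of how silent corruption can be detected: record a checksum and modification time when a file is first seen, then on later scans flag files whose content changed although their modification time did not.

    # Minimal silent-corruption scanner sketch: a file whose checksum changes
    # while its mtime stays the same was never legitimately rewritten, so the
    # change most likely happened somewhere in the storage stack.
    import hashlib, json, os, sys

    def checksum(path, blocksize=1 << 20):
        h = hashlib.sha1()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(blocksize), b""):
                h.update(block)
        return h.hexdigest()

    def scan(root, db_path="checksums.json"):
        db = json.load(open(db_path)) if os.path.exists(db_path) else {}
        for dirpath, _, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                mtime = os.stat(path).st_mtime
                digest = checksum(path)
                previous = db.get(path)
                if previous and previous["sha1"] != digest and previous["mtime"] == mtime:
                    print("possible silent corruption:", path)
                db[path] = {"sha1": digest, "mtime": mtime}
        with open(db_path, "w") as f:
            json.dump(db, f)

    if __name__ == "__main__":
        scan(sys.argv[1] if len(sys.argv) > 1 else ".")

Production-scale tooling additionally has to cope with legitimately rewritten files, with multiple replicas, and with checksumming petabytes without disturbing normal I/O.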


NDGF, a Nordic Tier-1 for WLCG

Josva Kleist, Nordic DataGrid Facility

The Tier-1 facility operated by the Nordic DataGrid Facility (NDGF) differs significantly from other Tier-1s in several respects: it is not located at one or a few sites but is instead distributed throughout the Nordic countries, and it is not under the governance of a single organization but is a "virtual" Tier-1 built out of resources under the control of a number of different national organizations.

We present the technical implications of these aspects as well as the high-level design of this distributed Tier-1. The focus will be on the challenges involved in creating a storage system based on dCache. dCache is well known and respected as a powerful distributed storage resource manager, and was chosen for implementing the storage aspects of the Nordic Tier-1. In contrast to classic dCache deployments, we deploy dCache over a WAN with limited bandwidth, high latency, frequent network failures, and spanning many administrative domains. These properties provide unique challenges, covering topics such as security, administration, maintenance, upgradability, reliability, and performance. Our initial focus has been on implementing the GridFTP 2 OGF recommendation in dCache and the Globus Toolkit. Compared to GridFTP 1, GridFTP 2 allows for more intelligent data flow between clients and storage pools, thus enabling more efficient use of our limited bandwidth.


Building the European High-Performance Ecosystem

Kimmo Koski, Finnish IT Center for Science (CSC)

During 2006-2007 there has been intense work in Europe to increase European competitiveness in high-end computing. Various new activities have been started in addition to the existing European grid projects such as DEISA and EGEE. The European Strategy Forum on Research Infrastructures (ESFRI) has published a roadmap with plans for 35 new major European infrastructures, most of them requiring high-end computing, data management and software development. In addition, the High-Performance Computing in Europe Taskforce (HET) was established in June 2006 with the goal of drafting a strategy for a European HPC ecosystem focused on petaflop computing. As a result of this successful strategy work, a collaboration among 14 countries was established and a project proposal for establishing European petaflop/s centers was submitted to the EU. The EU project, Partnership for Advanced Computing in Europe (PACE), will start at the beginning of 2008.

A basic tool for modeling the European HPC ecosystem is the performance pyramid. One of the key arguments from the HET work, later implemented in the PACE proposal, is to develop the different levels of the pyramid in a balanced way: to enable sufficient top-class resources, but at the same time to invest considerably in boosting collaboration, scaling the software, building competencies and developing the national and regional infrastructures, in order to create a competitive and sustainable European HPC service.

The talk will review the current European collaboration in HPC and future plans in that context. Nordic status and possible opportunities are discussed. Relations between infrastructure development and scientific communities requiring HPC capacity are covered with a few practical examples. Viewpoints for efficient Nordic impact in the European HPC Ecosystem are given to boost the discussion.


Scaling and Other Bad Ideas in High Performance Computing

Erik Lindahl, Stockholm University

Superficially, molecular dynamics is a very straightforward algorithm, but with complex memory access patterns and moving particles, performance has historically been limited to a few percent of the theoretical hardware peak capacity. I will present our long-term efforts to resolve this problem in our GROMACS molecular simulation toolkit, in particular our strong focus on absolute simulation performance rather than step counts or relative scaling, and why parallelization can in some cases even be an extremely bad idea. Much of our optimization work is generally applicable to x86 code, and I will also present a brand new domain decomposition implementation based on "neutral territory" interaction partitioning. I will also show how reviving MPMD (multiple program, multiple data) parallelization ideas has enabled us to get strong scaling of particle simulations all the way down to 200 atoms per CPU, and how to get the most out of programs like these on large dual quad-core clusters like Neolith.
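
A toy calculation (hypothetical numbers, not GROMACS benchmarks) shows why absolute performance matters more than relative scaling: a code with a poor single-core baseline can report impressive speedups while still producing far less simulated time per day than a well-optimized code that "scales" worse.

    # Hypothetical numbers only: compare delivered ns/day, not speedup curves.
    def ns_per_day(single_core_ns_per_day, cores, parallel_efficiency):
        return single_core_ns_per_day * cores * parallel_efficiency

    slow_but_scalable = ns_per_day(1.0, cores=64, parallel_efficiency=0.90)  #  57.6 ns/day
    fast_but_modest   = ns_per_day(5.0, cores=64, parallel_efficiency=0.50)  # 160.0 ns/day

    print(f"90% parallel efficiency, slow kernel: {slow_but_scalable:.1f} ns/day")
    print(f"50% parallel efficiency, fast kernel: {fast_but_modest:.1f} ns/day")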


News from SNAC and SweGrid

Mats Nylén, HPC2N

The new policy of the Swedish National Allocation Committee (SNAC) is a major development of the allocation process for users of HPC resources in Sweden. I will discuss the reasons for the change of policy and the new policy itself. The consequences for users will be highlighted in particular.

SweGrid is currently going through a major upgrade, mostly due to Sweden's participation in the LHC experiments at CERN. SweGrid also provides Sweden's contribution to NDGF. I will discuss the current status of the upgrade and outline the future plans for SweGrid.


Hierarchic Data Structures for Sparse Matrix Representation in Large-scale DFT/HF Calculations

Pawel Salek, KTH

A hierarchic data structure for efficient representation of sparse matrices is presented. The data structure is designed to allow optimal hardware use in typical matrix operations, with sparsity patterns specific to large-scale Hartree-Fock or density functional calculations. Matrices appearing in these problems are often only semi-sparse, and sparsity needs to be enforced. An algorithm for systematic truncation of small elements is presented. Efficient and accurate handling of sparsity makes it possible to achieve linear or N log N scaling in both storage and computational time, and to effectively replace the diagonalization algorithms used to determine the density from a fixed Kohn-Sham/Fock matrix. The data structure is hierarchic: at the lowest level, small dense submatrices are stored, enabling the use of BLAS libraries for their multiplication. Higher levels form a tree, providing flexibility and logarithmic cost of random element access. The flexibility of the data structure simplifies the implementation of various matrix manipulation algorithms. Example implementations of matrix-matrix multiplication and inverse Cholesky factorization are presented. OpenMP parallelization of the developed matrix library is discussed as well.
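
A minimal sketch of the hierarchic idea (an illustration under simplifying assumptions, not the library presented in the talk): a quadtree whose absent branches represent zero blocks and whose leaves are small dense submatrices multiplied through BLAS (here via NumPy).

    import numpy as np

    LEAF = 16  # dense submatrix size at the lowest level (illustrative choice)

    class HMatrix:
        def __init__(self, n, block=None, children=None):
            self.n = n                # dimension; assumed to be LEAF * 2**k
            self.block = block        # dense LEAF x LEAF array at a leaf, else None
            self.children = children  # 2x2 nested list of subtrees/None, else None

        @classmethod
        def from_dense(cls, a, tau=1e-8):
            """Build the tree, truncating sub-blocks whose norm is below tau."""
            if np.linalg.norm(a) < tau:
                return None                      # enforce sparsity
            n = a.shape[0]
            if n == LEAF:
                return cls(n, block=a.copy())
            h = n // 2
            kids = [[cls.from_dense(a[i*h:(i+1)*h, j*h:(j+1)*h], tau)
                     for j in range(2)] for i in range(2)]
            return cls(n, children=kids)

    def hadd(x, y):
        """Add two trees built from matrices of the same size."""
        if x is None or y is None:
            return x if y is None else y
        if x.block is not None:
            return HMatrix(x.n, block=x.block + y.block)
        kids = [[hadd(x.children[i][j], y.children[i][j]) for j in range(2)]
                for i in range(2)]
        return HMatrix(x.n, children=kids)

    def hmul(x, y):
        """Multiply two trees; zero branches are skipped, leaves go to BLAS."""
        if x is None or y is None:
            return None
        if x.block is not None:
            return HMatrix(x.n, block=np.dot(x.block, y.block))
        kids = [[None, None], [None, None]]
        for i in range(2):
            for j in range(2):
                acc = None
                for k in range(2):
                    p = hmul(x.children[i][k], y.children[k][j])
                    acc = p if acc is None else hadd(acc, p)
                kids[i][j] = acc
        return HMatrix(x.n, children=kids)

    # Example: a 64x64 matrix with a single nonzero 16x16 block.
    a = np.zeros((64, 64)); a[:16, :16] = np.random.rand(16, 16)
    A = HMatrix.from_dense(a)
    C = hmul(A, A)   # only the surviving blocks are ever touched

The structure presented in the talk adds the systematic, error-controlled truncation and the OpenMP parallelization mentioned above; the sketch only shows why absent branches cost neither storage nor arithmetic.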


Open Architectures for High Performance Computing

Robert Starmer, Cisco Systems

High Performance Computing (HPC) has a long history of exploiting the newest technologies in order to achieve the best possible performance. Using these leading-edge technologies can have great benefits, such as rapidly accelerating scientific and industrial applications. But these advances come at a cost: the bugs and instabilities typical for early adopters of immature, complex systems. Applying open source software methodologies to new advanced technologies, particularly within the HPC community, has proven to be an effective way to accelerate the path to stability and performance. Additionally, HPC researchers have been invaluable in providing new insights and techniques for advanced hardware because of their access to the source code. In several notable cases, open source has therefore shortened the HPC "new product hardening" cycle and been successful in promoting widespread adoption.

This talk will discuss two HPC open source software projects that have advanced through the "new and immature" phases to become stable, production-quality systems through the help of the open source community. Cisco engineers actively participate in both of these projects:

  1. The OpenFabrics Alliance is a set of industry partners who jointly develop and maintain open source drivers, tools, and middleware for low-latency / high-bandwidth InfiniBand networks.

  2. The Open MPI Project is an open source MPI-2 implementation that is developed and maintained by a consortium of academic, research, and industry partners. Open MPI embodies a fascinating blend of leading-edge research and industry-hardened product testing methodologies.


Virtualization: wLCG use cases and review of current applicable research

Owen Synge, German Electron Synchrotron, DESY, Hamburg

This talk reports on the virtualization users workshop held earlier this year at DESY. wLCG use cases and applications of virtualization on the worker node form the focus of the talk. The current status of the many HPC and research projects related to this area will also be summarized.


IBM Blue Gene/P - An Overview of a Petaflop Capable System

Carl Tengwall, IBM

On June 26, 2007, IBM announced Blue Gene/P as the leading-edge offering in its massively parallel Blue Gene supercomputer line, succeeding Blue Gene/L. When fully configured, Blue Gene/P is designed to scale to at least 262,144 (256 K) quad-processor nodes, with a peak performance of 3.56 PetaFLOP/s. This presentation describes our vision of this petascale system [i.e. a system capable of delivering over a quadrillion (10^15) floating point operations per second] from the hardware point of view: it provides an overview of the system architecture, chip design, system packaging and software infrastructure. The Blue Gene/P system software design will be covered with a focus on the areas that have changed from the Blue Gene/L design, together with our ongoing directions of research for Blue Gene software.
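
The quoted peak is consistent with the commonly cited per-core figures for Blue Gene/P (an 850 MHz clock and 4 floating-point operations per cycle from the dual-pipeline FPU; these per-core numbers are taken from public Blue Gene/P material, not from this abstract):

    262\,144\ \text{nodes} \times 4\ \text{cores} \times 4\ \frac{\text{flop}}{\text{cycle}} \times 850\ \text{MHz} \approx 3.56\ \text{PFLOP/s}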


High Performance Computing at HP

Martin Walker, Hewlett-Packard

This talk will make some observations on the size, shape, and dynamics of the high-performance computing market, then describe the view from HP on meeting the challenges that arise from this analysis. The talk will conclude with some remarks on facing the future of HPC.

