In June, I attended the International Supercomputing Conference in Leipzig, Germany. ISC is the second-largest conference in the high-performance computing field. Attending the big supercomputing conferences is always a good time to meditate on the state of the field and the future.
Instead of starting from the hardware point of view, I would like to begin from the other end: the scientific needs and how we can deliver more high-performance computing. At NSC, we believe in the usefulness of high-performance computing (HPC). So do our users, judging from the amount of computer time being applied for by Swedish scientists. When we compile the statistics, about three times more resources are asked for than we can provide. Clearly, the only way we can meet the need is to deliver more HPC capability in the future. The question, then, is how. There is always the possibility of increased funding for buying computers. Within our current facilities, we could accommodate many more systems of Triolith’s size, but realistically, I do not see funding for HPC systems increasing manyfold over the coming years, even though the potential benefits are great (see, for example, the recent report on e-infrastructure from the Swedish Science Council).
The traditional way has rather been to rely on new technology to bring us more compute power for the same amount of money. The goal is better price/performance, or, relatedly, more compute for the same amount of energy. Fortunately, that approach has historically been very successful. Over the years, we have seen a steady stream of higher-clocked CPU cores, multi-core servers, better memory bandwidth, and lower-latency networks being introduced. Each time we installed a new machine, our users could count on noticeable performance improvements using the same simulation software as before, sometimes without even changing the underlying source code at all.
Thus, for a long time, performance improvements have come essentially for free for our HPC users. I suspect, though, that this is a luxury that will come to an end. Why? Because currently, the way forward to more cost-effective computing, as envisioned by the HPC community, is:
- Many-core architectures, such as IBM’s Blue Gene and Intel’s Xeon Phi processors.
- Vectorization, such as computing on GPUs or with SIMD processors.
- Special-purpose hardware, such as custom SoCs, FPGAs, and ASICs.
Usually, such technologies are mentioned in the context of exascale computing, but it is important to realize that we would have to use the same technology if we wanted to build a smaller supercomputer for a fraction of the current cost. More concretely, what could happen in the coming years is that we get a new cluster with maybe ten times the floating-point capability of today, but in the form of compute nodes with e.g. 256 cores and as many as 1,000 threads. The key point, though, is that the speed of an individual core will most likely be lower than on our current clusters. Thus, to actually get better performance out of it, you will need excellent parallel scalability just to fully use a single compute node. The problem is that (1) there are few mass-market codes today with this kind of scalability, and (2) many current scientific models are simply not meaningful to run at the scale required to fully utilize such a machine.
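To put some rough numbers on this, here is a minimal sketch of idealized Amdahl’s law applied to such a node. The core counts, the assumed per-core slowdown, and the serial fractions are illustrative assumptions, not measurements of any real system; the point is only that the serial fraction has to be very small before many slower cores beat a handful of fast ones.

```python
# A minimal sketch of idealized Amdahl's law on a hypothetical many-core node.
# All numbers are illustrative assumptions (256 cores at half of today's
# per-core speed, compared with an assumed 16-core node of today); real codes
# are also limited by memory bandwidth and communication, which this ignores.

def amdahl_speedup(serial_fraction, cores):
    """Ideal speedup for a code with the given serial fraction on `cores` cores."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

if __name__ == "__main__":
    new_cores, new_core_speed = 256, 0.5   # assumed future node
    old_cores, old_core_speed = 16, 1.0    # assumed current node
    for serial_fraction in (0.10, 0.05, 0.01, 0.001):
        new_node = amdahl_speedup(serial_fraction, new_cores) * new_core_speed
        old_node = amdahl_speedup(serial_fraction, old_cores) * old_core_speed
        print(f"serial fraction {serial_fraction:6.1%}: "
              f"new node / current node = {new_node / old_node:4.1f}x")
```

Under these assumptions, a code with 5–10% serial work runs no faster on the new node than on the old one; only when the serial fraction falls well below 1% does the node-to-node improvement approach the nominal tenfold increase.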
In such a scenario, we could arrive at a “peak-VASP” situation, where traditional HPC algorithms, such as dense linear algebra operations and fast Fourier transforms, simply will not run any faster on the new hardware. That would essentially halt what has so far been seen as a natural speed development, and it could happen before any end of Moore’s law comes into play. It makes me think that there might be trouble ahead for traditional electronic structure calculations based on DFT unless there is a concerted effort to move to new hardware architectures. (This is also one of the conclusions in the Science Council report mentioned earlier.)
So what could such an effort look like?
- Putting resources into code optimization and parallelization is one obvious approach. SeRC and the application experts in the Swedish e-science community have been very active in this domain. There is clearly potential here, but my understanding is that it has always been difficult to get it done in practice, due to the funding structure in science (you get money for doing science, not “IT”) and a shortage of qualified personnel even when the funding is available. There is also a limit to how much you can parallelize, with diminishing returns as you put more effort into it. So I think it can only be part of the solution.
- Changing to scientific models that are better suited to large-scale computation should also be considered, as this would amplify the results of the work done under the previous point. This is something that has to be initiated from within the science community itself. There might be other methods of attacking electronic structure problems that would be unthinkable at a small scale but competitive at a large scale. I think the recent resurgence of interest in Monte Carlo and configuration-interaction methods is a sign of this development.
- In the cases where the models cannot be changed, the hardware itself has to change. That means greater awareness among funding agencies of the importance of high-throughput computing. By now, computational materials science is a quite mature field, and perhaps the challenge today is not only to simulate bigger systems, but also to apply the simulations at a large scale, which could mean running millions of jobs instead of one job with a million cores (a minimal sketch of that mode of working follows below). The insight that this can produce just as valuable science as big MPI-parallel calculations could open the way for investments in new kinds of computational resources.
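To make the contrast concrete, here is a minimal, hypothetical sketch of task farming: many small, independent jobs swept over a parameter space, rather than one tightly coupled MPI calculation. The parameter names and the stand-in “simulation” are placeholders, not any particular code.

```python
# A minimal, hypothetical sketch of high-throughput (task-farming) computing:
# many small, independent jobs over a parameter sweep, instead of one large,
# tightly coupled MPI job. The parameters and the stand-in "simulation" are
# placeholders; in reality each job would launch an external materials code.

import itertools
from concurrent.futures import ProcessPoolExecutor

lattice_constants = [3.9, 4.0, 4.1, 4.2]   # placeholder values (angstrom)
elements = ["Fe", "Co", "Ni"]              # placeholder chemical species

def run_case(element, lattice_constant):
    """One independent job; a cheap stand-in for a real calculation."""
    fake_energy = -len(element) * lattice_constant
    return element, lattice_constant, fake_energy

if __name__ == "__main__":
    # Each job is independent, so throughput scales trivially with the
    # number of workers -- the defining property of task farming.
    with ProcessPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(run_case, element, a)
                   for element, a in itertools.product(elements, lattice_constants)]
        for future in futures:
            element, a, energy = future.result()
            print(f"{element}  a={a:.2f}  E={energy:.2f}")
```

The same pattern scales from a laptop to a cluster scheduler: because the jobs never communicate, adding more nodes adds throughput directly, with none of the scalability constraints discussed above.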