Previously, I showed how ScaLAPACK is a limiting factor in the parallel scaling of VASP. VASP 5.3.2 introduced support for the ELPA library, which can now be used in the subspace rotation phase of the program. You enable it by compiling with the “-DELPA” preprocessor flag. In the VASP makefiles, there is a variable called CPP where this flag can be added:
CPP = $(CPP_) -DHOST=\"NSC-ELPATEST-B01\" -DMPI -DELPA \
...
In addition, you need to get access to ELPA (by registering on their site) and add its source files to the makefile. I did it like this:
ELPA = elpa1.o elpa2.o elpa2_kernels.o
vasp: $(ELPA) $(SOURCE) $(FFT3D) $(INC) main.o
	rm -f vasp
	$(FCL) -o vasp main.o $(ELPA) $(SOURCE) $(FFT3D) $(LIB) $(LINK)
The ELPA developers recommend that you compile with “-O3” and full SSE support, so I put these special rules at the end of the makefile:
# ELPA rules
elpa1.o : elpa1.f90
	$(FC) $(FFLAGS) -O3 -xavx -c $*$(SUFFIX)
elpa2.o : elpa2.f90
	$(FC) $(FFLAGS) -O3 -xavx -c $*$(SUFFIX)
elpa2_kernels.o : elpa2_kernels.f90
	$(FC) $(FFLAGS) -O3 -xavx -c $*$(SUFFIX)
(Here, -xavx optimizes for Triolith, which has Sandy Bridge CPUs.)
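The three ELPA rules are identical apart from the file name, so they could be folded into one. The following is only a sketch, assuming GNU make (static pattern rules are a GNU make feature) and that the ELPA sources sit in the build directory with a .f90 extension, as in the rules above. It is not something I have tested against the VASP makefile.

```makefile
# Sketch: one static pattern rule in place of the three explicit ELPA rules.
# Assumes GNU make; ELPA, FC and FFLAGS are the variables already defined
# earlier in the makefile.
ELPA = elpa1.o elpa2.o elpa2_kernels.o

# For each object listed in $(ELPA), build foo.o from foo.f90.
# $< expands to the first prerequisite, i.e. the matching .f90 file.
$(ELPA): %.o: %.f90
	$(FC) $(FFLAGS) -O3 -xavx -c $<
```

If you try this, remove the three explicit rules first. Otherwise make will warn about overriding recipes for the same targets.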
With this procedure, I was able to compile VASP with ELPA support. As far as I can see, nothing in the OUTCAR file or on stdout confirms that ELPA is actually being used. The output looks like that of regular VASP, apart from small fluctuations in the decimals. I also saw some crashes when running on just a few nodes (< 4). Perhaps ELPA is less robust in this case, since it is not the intended usage scenario.
(Benchmarks of ELPA on Lindgren and Triolith will follow in the next post.)