A new version of VASP, vasp.5.3.5 31Mar14, was released at the beginning of April. Swedish HPC users can find 5.3.5 installed on NSC’s Triolith and Matter clusters, and on PDC’s Lindgren. So what is new? The release notes on the VASP community page mention a few new functionals (the MSx family of meta-GGAs, BEEF, and Grimme’s D3 dispersion corrections) together with many minor changes and bug fixes.
The first installation of VASP 5.3.5 binaries at NSC is available in the expected place, so you can do the following in your job scripts:
mpprun /software/apps/vasp/5.3.5-31Mar14/default/vasp
You can also load it as a module, if you prefer:
module load vasp/5.3.5-31Mar14
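For reference, a complete job script could look something like the sketch below, assuming the SLURM batch system we use on Triolith. The job name, node count, wall time and project account are placeholders that you need to adapt to your own calculation.

#!/bin/bash
# Sketch of a Triolith job script for VASP 5.3.5.
# Job name, node count, wall time and account are placeholders.
#SBATCH -J vasp-5.3.5-test
#SBATCH -N 4
#SBATCH -t 12:00:00
#SBATCH -A snic-xxxx-x-xx

mpprun /software/apps/vasp/5.3.5-31Mar14/default/vasp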
Installation and compilation were straightforward with Intel’s compilers and MKL but, as usual, I did not have much success with gcc/gfortran (4.7.2). Even after applying my previous patches for gfortran, the compiled binary crashed due to numerical errors.
It is also worth mentioning that some recent MPI libraries now assume compliance with version 2.2 of the MPI standard by default. This is the case with e.g. Intel MPI 4.1, which we use on Triolith. Unfortunately, VASP is not fully compliant with the MPI standard: there are places in the code where send and receive buffers overlap (alias), which results in undefined behavior. You can see errors like this when running VASP:
Fatal error in PMPI_Allgatherv: Internal MPI error!, error stack:
...
MPIR_Localcopy(381).......: memcpy arguments alias each other, dst=0xa57e9c0 src=0xa57e9c0 len=49152
Some of the problems can be alleviated by instructing the MPI runtime to assume a different MPI standard. For Intel MPI, one can set
export I_MPI_COMPATIBILITY=4
to force the same behavior as with Intel MPI 4.0. This seems to help with VASP. If we get many reports of problems like this, I will install a separate build of VASP 5.3.5 with the old Intel MPI as a stopgap solution.
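In practice, this just means adding the export before the mpprun line in your job script:

# Work around the MPI 2.2 buffer aliasing errors by falling back
# to Intel MPI 4.0 behavior.
export I_MPI_COMPATIBILITY=4
mpprun /software/apps/vasp/5.3.5-31Mar14/default/vasp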
The Intel-compiled version of 5.3.5 ran through my test suite without problems, which suggests that, as expected, 5.3.5 gives the same results as 5.3.3 for basic properties. Overall performance appears unchanged for regular DFT calculations, but hybrid calculations run slightly faster now. There is also preliminary support for NPAR for Hartree-Fock-type calculations. I played around with it using a 64-atom cell on 64 cores, but setting NPAR actually made the calculation run slower on Triolith, so I suppose k-point parallelization is still much more efficient for hybrid calculations.
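If you want to experiment yourself, the relevant INCAR tags look roughly like the hypothetical fragment below; the values are only meant to illustrate the 64-atom/64-core test above, not as recommendations.

LHFCALC = .TRUE.   # Hartree-Fock / hybrid functional calculation
NPAR = 8           # band parallelization, preliminary for HF-type runs in 5.3.5
# or, instead of NPAR, split the cores over k-points:
KPAR = 4           # k-point parallelization, still the faster option in my test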