VASP 6.1.1 with Intel MKL on AMD Rome: problem with testsuite

bert_tijskens
Newbie
Posts: 7
Joined: Tue Jul 22, 2014 9:40 am
License Nr.: 5-568

#1 Post by bert_tijskens » Tue Sep 15, 2020 1:10 pm

We successfully built VASP 6.1.1 on Broadwell and AMD Rome.

On Broadwell, we used the Intel 2018 and 2019 compilers. All tests from the testsuite pass.
On AMD Rome, it’s a different story. We used the Intel 2019 and Intel 2020 compilers.
The only difference between the makefile.include files for Broadwell and Rome is that we replaced -xHOST with -march=core-avx2.
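
The flag change described above can be sketched as a small text substitution. This is only an illustration: the sample `FFLAGS` line and the helper name `swap_flag` are assumptions, since the actual makefile.include is not shown in this thread.

```python
import re


def swap_flag(makefile_text: str, old: str, new: str) -> str:
    """Replace the compiler flag `old` with `new` wherever it appears as a whole token."""
    # (?<!\S) and (?!\S) ensure the flag is delimited by whitespace or line ends,
    # so a longer flag that merely starts with `old` is left untouched.
    pattern = re.compile(r"(?<!\S)" + re.escape(old) + r"(?!\S)")
    return pattern.sub(lambda match: new, makefile_text)


# Hypothetical line for illustration; not copied from the poster's makefile.include.
sample = "FFLAGS = -assume byterecl -w -xHOST\n"
rome = swap_flag(sample, "-xHOST", "-march=core-avx2")
print(rome, end="")
```

Passing an empty string as `new` would drop the flag instead of replacing it, at the cost of leaving a stray space behind.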

Right at the beginning, the output of the first test already shows:

WARNING: Sub-Space-Matrix is not hermitian in DAV 1

finally resulting in

BRMIX: very serious problems
the old and the new charge density differ
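
For anyone who wants to check their own runs for the same two symptoms, here is a minimal sketch that counts them in a list of output lines. The search strings are taken from the messages quoted above; the function name and the sample input are made up for illustration, and the real testsuite.log may contain other relevant messages.

```python
# Substrings of the two messages quoted in this post.
SYMPTOMS = (
    "Sub-Space-Matrix is not hermitian",
    "BRMIX: very serious problems",
)


def count_symptoms(lines):
    """Return a dict mapping each symptom string to the number of lines containing it."""
    counts = {symptom: 0 for symptom in SYMPTOMS}
    for line in lines:
        for symptom in SYMPTOMS:
            if symptom in line:
                counts[symptom] += 1
    return counts


# Sample input built from the messages quoted in this post.
sample_output = [
    "WARNING: Sub-Space-Matrix is not hermitian in DAV 1",
    "BRMIX: very serious problems",
    "the old and the new charge density differ",
]
print(count_symptoms(sample_output))
```

To scan a real log, the lines of an opened file object can be passed in place of `sample_output`.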

We also set the undocumented environment variable MKL_DEBUG_CPU_TYPE=5.

Do you have any idea how to fix this problem? Or how to run the testsuite successfully on AMD Rome?

merzuk.kaltak
Administrator
Posts: 295
Joined: Mon Sep 24, 2018 9:39 am

#2 Post by merzuk.kaltak » Fri Sep 18, 2020 8:25 am

Hello,
Can you upload the testsuite.log? Are the errors large?
We don't have access to an AMD chip right now (this might change in the future).
Have you tried linking to FFTW + OpenBLAS?
That combination might be almost as fast as MKL.

bert_tijskens
Newbie
Posts: 7
Joined: Tue Jul 22, 2014 9:40 am
License Nr.: 5-568

#3 Post by bert_tijskens » Tue Sep 22, 2020 8:02 am

Hi,
Thanks for considering this.
I uploaded the testsuite log to Dropbox because it is quite large. Here is the link: https://www.dropbox.com/s/4jl0u2058zi9z ... og.gz?dl=0

merzuk.kaltak
Administrator
Posts: 295
Joined: Mon Sep 24, 2018 9:39 am

#4 Post by merzuk.kaltak » Wed Sep 23, 2020 7:54 am

It seems your VASP binary is not properly compiled.
I suspect the reason is that you use Intel compilers on AMD hardware.
Have you considered using gfortran (for instance gfortran-7.5.0) in combination with MKL?
There are also other toolchains for vasp-6.1.0 that we have successfully tested (so far only on Intel chips, unfortunately).

bert_tijskens
Newbie
Posts: 7
Joined: Tue Jul 22, 2014 9:40 am
License Nr.: 5-568

#5 Post by bert_tijskens » Wed Sep 23, 2020 8:43 am

Could you be a bit more specific? What makes you conclude that VASP isn't properly compiled?

merzuk.kaltak
Administrator
Posts: 295
Joined: Mon Sep 24, 2018 9:39 am

#6 Post by merzuk.kaltak » Wed Sep 30, 2020 11:36 am

Please keep in mind that using Intel MKL on AMD chips is "experimental" and most probably not officially supported by Intel.
That being said, there is no guarantee that the instruction set on AMD is fully compatible with the one chosen by MKL when switching on DEBUG mode.
This is what I meant by "not properly compiled"; most probably, Intel MKL officially supports only Intel chips.
Alternatively, you could investigate whether reducing optimization (for instance, removing the AVX2 support in your makefile.include) gives a working binary.

Unfortunately, we don't have AMD hardware available and thus do not have the ability to investigate your problem in detail.
If you find a solution to your problem, please post it on this thread. The community and the VASP team would appreciate that.

tobias_kloeffel
Newbie
Posts: 2
Joined: Mon Sep 07, 2020 9:01 am

#7 Post by tobias_kloeffel » Wed Oct 14, 2020 11:12 am

Hi,

You might want to try unsetting MKL_DEBUG_CPU_TYPE. If that solves the problem, try setting it again, but also set MKL_CBWR=AUTO. I know of at least one bug that is triggered when MKL_CBWR is not set.

Best,
Tobias
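
The two configurations suggested above could be prepared along these lines. This is a rough sketch, not a tested recipe: the helper name `mkl_env_variants` and the sample environment are made up, and the value 5 is simply the one quoted in the first post.

```python
import os


def mkl_env_variants(base_env=None):
    """Build the two environments to compare, as (label, env dict) pairs."""
    base = dict(os.environ if base_env is None else base_env)

    # Variant 1: MKL_DEBUG_CPU_TYPE removed, everything else left untouched.
    unset = dict(base)
    unset.pop("MKL_DEBUG_CPU_TYPE", None)

    # Variant 2: MKL_DEBUG_CPU_TYPE set again, now together with MKL_CBWR=AUTO.
    with_cbwr = dict(base, MKL_DEBUG_CPU_TYPE="5", MKL_CBWR="AUTO")

    return [
        ("MKL_DEBUG_CPU_TYPE unset", unset),
        ("MKL_DEBUG_CPU_TYPE=5 with MKL_CBWR=AUTO", with_cbwr),
    ]


# Hypothetical starting environment, for illustration only.
sample_env = {"PATH": "/usr/bin", "MKL_DEBUG_CPU_TYPE": "5"}
for label, env in mkl_env_variants(sample_env):
    mkl_settings = {key: value for key, value in env.items() if key.startswith("MKL_")}
    print(label, mkl_settings)
```

Each `env` dict can then be passed as the `env` argument of `subprocess.run` when launching whatever command runs the testsuite on your system; that command is site-specific and not shown here.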
