VASP 6.3.0 compiles but fails some validation tests

chunsheng_wang
Newbie
Posts: 3
Joined: Thu Feb 10, 2022 12:33 am

#1 Post by chunsheng_wang » Fri Feb 11, 2022 12:01 am

I am trying to compile VASP 6.3.0_0 on an HPC cluster, and although I can get the compilation to succeed, I am encountering issues (segfaults) in the validation tests.

The cluster nodes have dual Intel Ivy Bridge E5-2680v2 chips and 128 GB of RAM. I compiled with Intel Parallel Studio XE 2020 Update 1 Cluster Edition, using the included MKL for BLAS, LAPACK, FFTW, and ScaLAPACK, and the included Intel MPI libraries.

The validation tests
NiOsLDAU=2_x
NiOsLDAU=2_x_RPR
NiOsLDAU=2_y
NiOsLDAU=2_y_RPR
NiOsLDAU=2_z
NiOsLDAU=2_z_RPR
SiC8_GW0R
Tl_x
Tl_x_RPR
Tl_y
Tl_y_RPR
Tl_z
Tl_z_RPR
are failing, I believe with segfaults.

For running the tests, I am using make test with
nthrds=4
nranks=2
mpi_flags="-np $nranks -ppn $nranks"
omp_flags="-genv OMP_NUM_THREADS=$nthrds -genv OMP_STACKSIZE=512m"

export VASP_TESTSUITE_EXE_STD="mpirun ${mpi_flags} ${omp_flags} ${GLUEVASP_STD}"
export VASP_TESTSUITE_EXE_GAM="mpirun ${mpi_flags} ${omp_flags} ${GLUEVASP_GAM}"
export VASP_TESTSUITE_EXE_NCL="mpirun ${mpi_flags} ${omp_flags} ${GLUEVASP_NCL}"

as suggested by the impi+omp.conf


where GLUEVASP_STD/GAM/NCL point to the vasp_std/gam/ncl executables in the build directory

Attached are tarballs with makefile.include, testsuite.log and the test/* directories for the failed tests (except for SiC8_GW0R which was too large to attach)

Any assistance you can offer with this would be appreciated

ferenc_karsai
Global Moderator
Posts: 473
Joined: Mon Nov 04, 2019 12:44 pm

#2 Post by ferenc_karsai » Fri Feb 11, 2022 1:50 pm

I've checked these calculations with all of our compilers, and we also run the test suite continuously. I see no problems in our calculations, so most likely your toolchain has a problem.

scaLAPACK and shared memory for MPI are very often sources of problems. I didn't see shared memory enabled in your build, so we can rule that out. But you did use scaLAPACK, so please try compiling without it and see whether the problem persists. To do that, remove "-DscaLAPACK" from the "CPP_OPTIONS" in your makefile.include.

Please also compile with "-traceback -debug -g". This may give useful information, since it prints the line where the code crashes.

hszhao.cn@gmail.com
Full Member
Posts: 200
Joined: Tue Oct 13, 2020 11:32 pm

#3 Post by hszhao.cn@gmail.com » Tue Feb 15, 2022 2:54 pm

ferenc_karsai wrote: Please also compile with "-traceback -debug -g". It maybe gives useful information, since it prints out the line where the code crashes.
What exactly do you mean by the following?

Code: Select all

compile with "-traceback -debug -g"
I checked the GNU Make options and could only find the following options that resemble the ones you mentioned:

Code: Select all

       --debug[=FLAGS]
            Print  debugging  information  in addition to normal processing.  If the FLAGS are omitted, then the behavior is the
            same as if -d was specified.  FLAGS may be a for all debugging output (same as using -d), b for basic  debugging,  v
            for  more  verbose basic debugging, i for showing implicit rules, j for details on invocation of commands, and m for
            debugging while remaking makefiles.  Use n to disable all previous debugging flags.

      --trace
            Information  about  the disposition of each target is printed (why the target is being rebuilt and what commands are
            run to rebuild it).
Regards,
HZ

ferenc_karsai
Global Moderator
Posts: 473
Joined: Mon Nov 04, 2019 12:44 pm

#4 Post by ferenc_karsai » Tue Feb 15, 2022 4:17 pm

These options are for the Intel compiler. I wrote them because I saw that you had compiled with Intel before.

For GNU (gfortran) use the following:
-fbacktrace -g

hszhao.cn@gmail.com
Full Member
Posts: 200
Joined: Tue Oct 13, 2020 11:32 pm

#5 Post by hszhao.cn@gmail.com » Wed Feb 16, 2022 2:19 am

Thank you for your clarification. Here are some further explanations of this issue for the Intel compiler, for others' reference.


For understanding the precise meaning of "-traceback -debug -g", see the following built-in help of ifort:

Code: Select all

$ ifort --help |grep -A3 traceback$
-[no]traceback
          specify whether the compiler generates PC correlation data used to
          display a symbolic traceback rather than a hexadecimal traceback at
          runtime failure


$ ifort --help |grep -A5 -- '-debug \[' 
-debug [keyword]
          Control the emission of debug information.
          Valid [keyword] values:
             none
                 Disables debug generation.


$ ifort --help |grep -A6 -- '-g\[level\]' 
-g[level]
          Produce symbolic debug information.
          Valid [level] values:
             0  - Disable generation of symbolic debug information.
             1  - Emit minimal debug information for performing stack traces.
             2  - Emit complete debug information. (default for -g)
             3  - Emit extra information which may be useful for some tools.
So, "-traceback -debug -g" should mean the following directives:
  • Generate PC correlation data, so that a symbolic traceback rather than a hexadecimal traceback is displayed at runtime failure.
  • Generate debug information. Without a keyword, -debug enables full debug information; the "none" entry above is only the first keyword in the truncated help output.
  • Emit complete symbolic debug information.
So, basically, your suggestion is to add the above options to the DEBUG variable in makefile.include, as shown below:

Code: Select all

DEBUG       = -O0 -traceback -debug -g
Also see some suggestions here: https://www.nas.nasa.gov/hecc/support/k ... ns_92.html

Regards,
HZ

ferenc_karsai
Global Moderator
Posts: 473
Joined: Mon Nov 04, 2019 12:44 pm

#6 Post by ferenc_karsai » Wed Feb 16, 2022 8:31 am

DEBUG is not used automatically; it is better to append the flags to FFLAGS.

hszhao.cn@gmail.com
Full Member
Posts: 200
Joined: Tue Oct 13, 2020 11:32 pm

#7 Post by hszhao.cn@gmail.com » Wed Feb 16, 2022 9:46 am

ferenc_karsai wrote: DEBUG is not automatically used, better append it to FFLAGS.
Thanks for your advice. I have now inserted the following line in makefile.include, after the line where FFLAGS is first set:

Code: Select all

FFLAGS      += -traceback -debug -g
ferenc_karsai wrote: Very often Scalapack and shared memory for MPI are sources of problems. In your compiling I didn't see shared memory so we can rule that out. But you used Scalapack. So please try to compile without Scalapack and see if the problem persists. For that please remove "-DscaLAPACK" from the "CPP_OPTIONS" in your makefile.include.
I'm still a little confused by your description above. More specifically, do you mean the following changes to the makefile configuration?

1. If I use the makefile.include.intel based Makefile, "-DscaLAPACK" should be preserved.
2. If I use the makefile.include.intel_omp or makefile.include.intel_ompi_mkl_omp based Makefiles, "-DscaLAPACK" should be removed.

Am I right? Any more hints will be highly appreciated.

Regards,
HZ

ferenc_karsai
Global Moderator
Posts: 473
Joined: Mon Nov 04, 2019 12:44 pm

#8 Post by ferenc_karsai » Wed Feb 16, 2022 11:21 am

No. What I meant is: to narrow down the error, compile without "-DscaLAPACK". That can be done with any compiler. If the code works without scaLAPACK but not with it, then we know the error is in your scaLAPACK setup.
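
For anyone who wants to script that experiment, here is a minimal sketch. It assumes the flag appears in CPP_OPTIONS in the same layout as the makefile.include files posted in this thread. The helper name strip_scalapack and the sample file are made up for illustration, and the sketch works on a throwaway copy rather than on a real makefile.include. It only removes the preprocessor flag and leaves the library lines alone.

```shell
#!/bin/sh
# Work on a throwaway sample so that no real makefile.include is modified.
sample=$(mktemp)
cat > "$sample" <<'EOF'
CPP_OPTIONS = -DHOST=\"LinuxIFC\" \
              -DMPI -DMPI_BLOCK=8000 -Duse_collective \
              -DscaLAPACK \
              -DCACHE_SIZE=4000
LLIBS      += -lmkl_scalapack_lp64
EOF

# strip_scalapack FILE (hypothetical helper): print FILE without -DscaLAPACK.
# 1st expression: drop a continuation line that holds only the flag.
# 2nd expression: drop the flag when it shares a line with other options.
strip_scalapack() {
    sed -e '/^[[:space:]]*-DscaLAPACK[[:space:]]*\\$/d' \
        -e 's/[[:space:]]*-DscaLAPACK//' "$1"
}

strip_scalapack "$sample" > "$sample.noscalapack"

# Show what changed; diff exits non-zero when the files differ, hence "|| true".
diff "$sample" "$sample.noscalapack" || true
```

After checking the diff, the same sed expressions could be applied to a copy of the real makefile.include, followed by a clean rebuild and a rerun of the test suite.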

hszhao.cn@gmail.com
Full Member
Posts: 200
Joined: Tue Oct 13, 2020 11:32 pm

#9 Post by hszhao.cn@gmail.com » Wed Feb 16, 2022 11:23 am

ferenc_karsai wrote: Fri Feb 11, 2022 1:50 pm I've checked this calculations with all of our compilers. We also continuously test the testsuite. I see no problems in our calculations, so most likely your toolchain has a problem.
Could you please share the full content of your makefile.include?

Regards,
HZ

chunsheng_wang
Newbie
Posts: 3
Joined: Thu Feb 10, 2022 12:33 am

#10 Post by chunsheng_wang » Wed Feb 16, 2022 2:08 pm

Ferenc and VASP people: FYI, there are two people having this issue on this ticket. I am the creator of this ticket, and someone else (not directly working with me) has also posted. Due to the time needed for compile/test cycles and other commitments I only just now am replying to the initial post.

I have rebuilt VASP and rerun with the debugging flags and scaLAPACK disabled. I have also disabled hdf5 and wannier90 just to turn off as much extraneous stuff as possible.

Tests HEG_333_LW, SiC8_GW0R, and SiC_ACFDTR_T complain about the lack of scaLAPACK and are listed as failed, but I am assuming that is normal (as we turned off scaLAPACK).

Tests Tl_x, Tl_x_RPR, Tl_y, Tl_y_RPR, Tl_z, and Tl_z_RPR are segfaulting.

I have attached the makefile.include, testsuite.log, and test/Tl_* directories in the attached tarball
vasptest.tar.gz
(I had a little trouble with the requested debug flags the first time around, so I put them all over the place in the current makefile.include just to make sure they took effect)

At this point, I believe the entire toolchain is within the Intel Parallel Studio Suite compiler + MKL (version 2020.1)

Any assistance you can provide regarding/resolving these issues with the validation tests will be appreciated. Thank you in advance.

hszhao.cn@gmail.com
Full Member
Posts: 200
Joined: Tue Oct 13, 2020 11:32 pm

#11 Post by hszhao.cn@gmail.com » Thu Feb 17, 2022 10:21 am

I think the culprit is probably the following setting in your makefile.include:

Code: Select all

FFLAGS     += -xHOST
All the failing tests you mentioned pass on my machine (Ubuntu 20.04.3 LTS with dual Intel Xeon E5-2699 v4 CPUs). See below for details on the toolchain, makefile.include, and the testsuite.log file.

1. The toolchain consists of recent versions of the Intel oneAPI Base and HPC toolkits:

Code: Select all

$ module purge
$ module load mpi/2021.4.0 mkl compiler
$ module list 

Currently Loaded Modules:
  1) mpi/2021.4.0   3) compiler-rt/2022.0.2   5) oclfpga/2022.0.2
  2) tbb/2021.5.1   4) mkl/2022.0.2           6) compiler/2022.0.2
2. The content of the makefile.include is as follows:

Code: Select all

$ egrep -v '^(#|$)' makefile.include.intel 
CPP_OPTIONS = -DHOST=\"LinuxIFC\" \
              -DMPI -DMPI_BLOCK=8000 -Duse_collective \
              -DscaLAPACK \
              -DCACHE_SIZE=4000 \
              -Davoidalloc \
              -Dvasp6 \
              -Duse_bse_te \
              -Dtbdyn \
              -Dfock_dblbuf
CPP         = fpp -f_com=no -free -w0  $*$(FUFFIX) $*$(SUFFIX) $(CPP_OPTIONS)
FC          = mpiifort
FCL         = mpiifort
FREE        = -free -names lowercase
FFLAGS      = -assume byterecl -w
OFLAG       = -O2
OFLAG_IN    = $(OFLAG)
DEBUG       = -O0
OBJECTS     = fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o
OBJECTS_O1 += fftw3d.o fftmpi.o fftmpiw.o
OBJECTS_O2 += fft3dlib.o
CPP_LIB     = $(CPP)
FC_LIB      = $(FC)
CC_LIB      = icc
CFLAGS_LIB  = -O
FFLAGS_LIB  = -O1
FREE_LIB    = $(FREE)
OBJECTS_LIB = linpack_double.o
CXX_PARS    = icpc
LLIBS       = -lstdc++
FFLAGS      += -march=core-avx2
FFLAGS      += -traceback -debug -g
LLIBS       += -L${MKLROOT}/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -liomp5 -lpthread -lm -ldl
FCL         += -qmkl=parallel
INCS         =-I$(MKLROOT)/include/fftw
Instead of using -xHOST, the following option is used, based on the suggestion here:

Code: Select all

FFLAGS      += -march=core-avx2
Side remark: Based on my testing, the following Intel MPI Library version doesn't work:

mpi/2021.5.0, i.e., mpi/2021.5.1

Regards,
HZ

chunsheng_wang
Newbie
Posts: 3
Joined: Thu Feb 10, 2022 12:33 am

#12 Post by chunsheng_wang » Fri Feb 18, 2022 7:53 pm

@hszhao.cn: Thank you. The -xHOST flag was indeed the issue. After replacing it with the appropriate -march flag (our cluster is a bit too old to support AVX2 :), all the tests pass. I am surprised that this is the culprit: I thought -xHOST just instructed the compiler to produce code optimized for the processor used for compilation, and I compiled on a system with the same processor as the one the tests ran on. Still, the suggested modification worked. Thank you again for all your assistance.
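
As a rough aid for the AVX2 question raised here, the sketch below checks whether the Linux kernel reports the avx2 flag for the CPU, which is a precondition for the -march=core-avx2 setting posted earlier in the thread. It assumes an x86 Linux node with a readable /proc/cpuinfo, and has_cpu_flag is a made-up helper name. It does not say which -march value to use instead on older hardware.

```shell
#!/bin/sh
# has_cpu_flag FLAG "FLAG LIST" (hypothetical helper):
# succeed only if FLAG appears as a whole word, so "avx" does not match "avx2".
has_cpu_flag() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Read the feature flags of the first CPU listed in /proc/cpuinfo (Linux only).
cpu_flags=""
if [ -r /proc/cpuinfo ]; then
    cpu_flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
fi

if has_cpu_flag avx2 "$cpu_flags"; then
    echo "avx2 reported: -march=core-avx2 targets an instruction set this CPU has"
else
    echo "avx2 not reported: pick a -march value for an older instruction set"
fi
```

Run it on a compute node rather than only on the login node, since the two may have different processors.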

ferenc_karsai
Global Moderator
Posts: 473
Joined: Mon Nov 04, 2019 12:44 pm

#13 Post by ferenc_karsai » Fri Feb 18, 2022 10:28 pm

Hszhao, thank you very much for helping us find the problem in this build.

hszhao.cn@gmail.com
Full Member
Posts: 200
Joined: Tue Oct 13, 2020 11:32 pm

#14 Post by hszhao.cn@gmail.com » Sat Feb 19, 2022 2:57 pm

Some tricks for setting the value of -march.

1. Obtain the arch name as follows:

Code: Select all

$ gcc -march=native -Q --help=target|grep -- '^[ ]*-march='
  -march=                     		broadwell
Then, based on the official Intel documentation here, the following should be used:

Code: Select all

FFLAGS      += -march=broadwell
2. If your arch/processor name is not listed in the official Intel documentation here, just use the following trick, as commented here:

Code: Select all

FFLAGS    += -march=native
I’ve confirmed that both of the above settings solve the problem discussed here.
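
Step 1 above can be wrapped in a small script. This is only a sketch: parse_march is a made-up helper name, it assumes gcc prints a "-march=" line in the format shown above, and the name it extracts still has to be checked against the Intel documentation before it goes into makefile.include.

```shell
#!/bin/sh
# parse_march (hypothetical helper): read the output of
# "gcc -march=native -Q --help=target" on stdin and print the arch name.
parse_march() {
    sed -n 's/^[[:space:]]*-march=[[:space:]]*\([^[:space:]]*\).*$/\1/p' | head -n 1
}

if command -v gcc >/dev/null 2>&1; then
    arch=$(gcc -march=native -Q --help=target 2>/dev/null | parse_march)
    if [ -n "$arch" ]; then
        # Candidate line only; confirm that the Intel compiler accepts this name.
        echo "FFLAGS      += -march=$arch"
    else
        echo "could not determine the arch name from the gcc output" >&2
    fi
else
    echo "gcc not found; look up the arch name by hand" >&2
fi
```

The script prints the candidate line instead of editing makefile.include, so nothing changes until the value has been reviewed.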

Regards,
HZ

hszhao.cn@gmail.com
Full Member
Posts: 200
Joined: Tue Oct 13, 2020 11:32 pm

#15 Post by hszhao.cn@gmail.com » Mon Feb 21, 2022 2:35 pm

On a machine with dual Intel Xeon E5-2699 v4 CPUs running Ubuntu 20.04.3 LTS, I recompiled vasp.6.3.0 with the -xHost option and then successfully ran all the tests in the fast category on the same machine. The following components of the Intel oneAPI Base and HPC toolkits were used:

Code: Select all

$ module load compiler mkl mpi/2021.4.0
$ module list 

Currently Loaded Modules:
  1) lmod           3) compiler-rt/2022.0.2   5) compiler/2022.0.2   7) mpi/2021.4.0
  2) tbb/2021.5.1   4) oclfpga/2022.0.2       6) mkl/2022.0.2
Attached are the related makefile.include and testsuite.log files. So, I conclude that if you compile and run VASP on exactly the same CPU architecture, -xHost should work; otherwise, use an appropriate -march compiler option for cross-compilation. You can see related discussions here.

Regards,
HZ