VASP 6.3.0 compiles but fails some validation tests
-
- Newbie
- Posts: 3
- Joined: Thu Feb 10, 2022 12:33 am
VASP 6.3.0 compiles but fails some validation tests
I am trying to compile VASP 6.3.0_0 on an HPC cluster, and although I can get the compilation to succeed, I am encountering issues (segfaults) in the validation tests.
The cluster nodes have dual Intel Ivy Bridge E5-2680v2 chips and 128 GB of RAM. Compiled with Intel Parallel Studio XE 2020 Update 1 Cluster Edition, using the included MKL for BLAS, LAPACK, FFTW, and ScaLAPACK, and the included Intel MPI libraries.
The validation tests
NiOsLDAU=2_x
NiOsLDAU=2_x_RPR
NiOsLDAU=2_y
NiOsLDAU=2_y_RPR
NiOsLDAU=2_z
NiOsLDAU=2_z_RPR
SiC8_GW0R
Tl_x
Tl_x_RPR
Tl_y
Tl_y_RPR
Tl_z
Tl_z_RPR
are failing, I believe with segfaults.
For running the tests, I am using make test with
nthrds=4
nranks=2
mpi_flags="-np $nranks -ppn $nranks"
omp_flags="-genv OMP_NUM_THREADS=$nthrds -genv OMP_STACKSIZE=512m"
export VASP_TESTSUITE_EXE_STD="mpirun ${mpi_flags} ${omp_flags} ${GLUEVASP_STD}"
export VASP_TESTSUITE_EXE_GAM="mpirun ${mpi_flags} ${omp_flags} ${GLUEVASP_GAM}"
export VASP_TESTSUITE_EXE_NCL="mpirun ${mpi_flags} ${omp_flags} ${GLUEVASP_NCL}"
as suggested by the impi+omp.conf
where GLUEVASP_STD/GAM/NCL point to the vasp_std/gam/ncl executables in the build directory
Attached are tarballs with makefile.include, testsuite.log and the test/* directories for the failed tests (except for SiC8_GW0R which was too large to attach)
Any assistance you can offer with this would be appreciated
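The test-suite setup above can be wrapped in a small helper so that the same launch line is reused for all three executables. This is only a sketch: the function names build_launch and fits_on_node are hypothetical, the executable path in the usage comment is whatever GLUEVASP_STD points to, and the core count passed to fits_on_node has to be supplied by the user (20 would be an assumption for two 10-core E5-2680v2 sockets).

```shell
#!/bin/sh
# Compose the mpirun launch line for one VASP executable.
# Usage: build_launch EXE [NRANKS] [NTHRDS]   (defaults: 2 ranks, 4 threads)
build_launch() {
    exe=$1
    nranks=${2:-2}
    nthrds=${3:-4}
    mpi_flags="-np $nranks -ppn $nranks"
    omp_flags="-genv OMP_NUM_THREADS=$nthrds -genv OMP_STACKSIZE=512m"
    printf 'mpirun %s %s %s\n' "$mpi_flags" "$omp_flags" "$exe"
}

# Succeed if ranks * threads does not exceed the given core count.
# Usage: fits_on_node NRANKS NTHRDS CORES
fits_on_node() {
    [ $(( $1 * $2 )) -le "$3" ]
}

# Example wiring (GLUEVASP_STD must already be set by the caller):
# export VASP_TESTSUITE_EXE_STD="$(build_launch "$GLUEVASP_STD")"
```

Keeping the rank and thread counts as arguments makes it easy to rerun a failing test with, for example, a single thread per rank to see whether the crash depends on the OpenMP settings.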
-
- Global Moderator
- Posts: 473
- Joined: Mon Nov 04, 2019 12:44 pm
Re: VASP 6.3.0 compiles but fails some validation tests
I've checked these calculations with all of our compilers. We also continuously test the testsuite. I see no problems in our calculations, so most likely your toolchain has a problem.
Very often scaLAPACK and shared memory for MPI are sources of problems. I didn't see shared memory enabled in your build, so we can rule that out. But you did use scaLAPACK, so please try to compile without it and see if the problem persists. To do that, remove "-DscaLAPACK" from "CPP_OPTIONS" in your makefile.include.
Please also compile with "-traceback -debug -g". It may give useful information, since it prints out the line where the code crashes.
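The two suggestions above (drop -DscaLAPACK, add the Intel debug flags) could be applied to a copy of makefile.include along the following lines. This is a minimal sketch, not an official VASP procedure: the function name make_debug_include is hypothetical, and it assumes that -DscaLAPACK appears either on its own continuation line or among other options on one line.

```shell
#!/bin/sh
# Write a diagnostic copy of makefile.include with scaLAPACK disabled and
# the Intel debug flags appended. The original file is left untouched.
# Usage: make_debug_include SRC DST
make_debug_include() {
    src=$1
    dst=$2
    # First expression: delete a continuation line that holds only
    # -DscaLAPACK. Second expression: strip the flag when it shares a
    # line with other options.
    sed -e '/^[[:space:]]*-DscaLAPACK[[:space:]]*\\$/d' \
        -e 's/[[:space:]]*-DscaLAPACK//' "$src" > "$dst" || return 1
    # A later plain "FFLAGS = ..." would overwrite this line, so it is
    # appended at the very end of the file.
    printf '%s\n' 'FFLAGS += -traceback -debug -g' >> "$dst"
}

# Example: make_debug_include makefile.include makefile.include.debug
```

Working on a copy keeps the original build configuration available, so the build with and without scaLAPACK can be compared afterwards.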
-
- Full Member
- Posts: 203
- Joined: Tue Oct 13, 2020 11:32 pm
Re: VASP 6.3.0 compiles but fails some validation tests
What do you mean by saying the following?
ferenc_karsai wrote: Please also compile with "-traceback -debug -g". It may give useful information, since it prints out the line where the code crashes.
Code: Select all
compile with "-traceback -debug -g"
Code: Select all
--debug[=FLAGS]
Print debugging information in addition to normal processing. If the FLAGS are omitted, then the behavior is the
same as if -d was specified. FLAGS may be a for all debugging output (same as using -d), b for basic debugging, v
for more verbose basic debugging, i for showing implicit rules, j for details on invocation of commands, and m for
debugging while remaking makefiles. Use n to disable all previous debugging flags.
--trace
Information about the disposition of each target is printed (why the target is being rebuilt and what commands are
run to rebuild it).
HZ
-
- Global Moderator
- Posts: 473
- Joined: Mon Nov 04, 2019 12:44 pm
Re: VASP 6.3.0 compiles but fails some validation tests
These options are for the Intel compiler. I wrote them because I saw that you had compiled with Intel before.
For GNU use the following:
-fbacktrace -g -debug
-
- Full Member
- Posts: 203
- Joined: Tue Oct 13, 2020 11:32 pm
Re: VASP 6.3.0 compiles but fails some validation tests
Thank you for your clarification. Here, I will provide some further explanation of these Intel compiler options for others' reference.
To understand the precise meaning of "-traceback -debug -g", see the following built-in help of ifort:
Code: Select all
$ ifort --help |grep -A3 traceback$
-[no]traceback
          specify whether the compiler generates PC correlation data used to
          display a symbolic traceback rather than a hexadecimal traceback at
          runtime failure
$ ifort --help |grep -A5 -- '-debug \['
-debug [keyword]
          Control the emission of debug information.
          Valid [keyword] values:
            none
                Disables debug generation.
$ ifort --help |grep -A6 -- '-g\[level\]'
-g[level]
          Produce symbolic debug information.
          Valid [level] values:
            0 - Disable generation of symbolic debug information.
            1 - Emit minimal debug information for performing stack traces.
            2 - Emit complete debug information. (default for -g)
            3 - Emit extra information which may be useful for some tools.
So, "-traceback -debug -g" should mean the following directives:
- -traceback: Specify that the compiler generates PC correlation data used to display a symbolic traceback rather than a hexadecimal traceback at runtime failure.
- -debug: Control the emission of debug information. (The "none" keyword shown above, which disables debug generation, is only the first entry of the truncated keyword list; it is not what a bare -debug selects.)
- -g: Emit complete debug information.
Code: Select all
DEBUG = -O0 -traceback -debug -g
Also see some suggestions here: https://www.nas.nasa.gov/hecc/support/k ... ns_92.html
Regards,
HZ
-
- Global Moderator
- Posts: 473
- Joined: Mon Nov 04, 2019 12:44 pm
Re: VASP 6.3.0 compiles but fails some validation tests
DEBUG is not used automatically; it is better to append the flags to FFLAGS.
-
- Full Member
- Posts: 203
- Joined: Tue Oct 13, 2020 11:32 pm
Re: VASP 6.3.0 compiles but fails some validation tests
ferenc_karsai wrote: DEBUG is not used automatically; it is better to append the flags to FFLAGS.
Thanks for your advice. I have now inserted the following line in makefile.include, after the initial assignment of FFLAGS:
Code: Select all
FFLAGS += -traceback -debug -g
ferenc_karsai wrote: Very often scaLAPACK and shared memory for MPI are sources of problems. I didn't see shared memory enabled in your build, so we can rule that out. But you did use scaLAPACK, so please try to compile without it and see if the problem persists. To do that, remove "-DscaLAPACK" from "CPP_OPTIONS" in your makefile.include.
I'm still a little confused about your description above. More specifically, do you mean the following Makefile configuration modification?
1. If I use the makefile.include.intel based Makefile, "-DscaLAPACK" should be preserved.
2. If I use the makefile.include.intel_omp or makefile.include.intel_ompi_mkl_omp based Makefiles, "-DscaLAPACK" should be removed.
Am I right? Any more hints will be highly appreciated.
Regards,
HZ
-
- Global Moderator
- Posts: 473
- Joined: Mon Nov 04, 2019 12:44 pm
Re: VASP 6.3.0 compiles but fails some validation tests
No. What I meant is: to narrow down the error, compile without "-DscaLAPACK". That can be done with any compiler. If the code works without scaLAPACK but not with it, then we know the error is in your scaLAPACK setup.
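The narrowing-down step described here amounts to comparing the failing tests of the two builds. A rough sketch, assuming the names of the failing tests have already been written to two plain-text files, one name per line (how to extract them from testsuite.log is not shown in this thread, so that step is left out; the function names are hypothetical):

```shell
#!/bin/sh
# Run comm on sorted, de-duplicated copies of two test-name lists.
# Usage: sorted_comm COMM_FLAG LIST_A LIST_B
sorted_comm() {
    flag=$1
    a=$(mktemp) || return 1
    b=$(mktemp) || { rm -f "$a"; return 1; }
    sort -u "$2" > "$a"
    sort -u "$3" > "$b"
    comm "$flag" "$a" "$b"
    rm -f "$a" "$b"
}

# Tests failing in both builds: scaLAPACK is not the (only) cause.
# Usage: fail_in_both WITH_SCALAPACK_LIST WITHOUT_SCALAPACK_LIST
fail_in_both() { sorted_comm -12 "$1" "$2"; }

# Tests failing only in the build with scaLAPACK: points at the scaLAPACK setup.
# Usage: fail_only_with_scalapack WITH_SCALAPACK_LIST WITHOUT_SCALAPACK_LIST
fail_only_with_scalapack() { sorted_comm -23 "$1" "$2"; }
```

Tests that require scaLAPACK will show up as failures in the build without it, so they are expected to appear only in the second list and should be ignored when reading the result.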
-
- Full Member
- Posts: 203
- Joined: Tue Oct 13, 2020 11:32 pm
Re: VASP 6.3.0 compiles but fails some validation tests
ferenc_karsai wrote: ↑Fri Feb 11, 2022 1:50 pm I've checked these calculations with all of our compilers. We also continuously test the testsuite. I see no problems in our calculations, so most likely your toolchain has a problem.
Could you please share the full content of your makefile.include?
Regards,
HZ
-
- Newbie
- Posts: 3
- Joined: Thu Feb 10, 2022 12:33 am
Re: VASP 6.3.0 compiles but fails some validation tests
Ferenc and VASP people: FYI, there are two people having this issue on this ticket. I am the creator of this ticket, and someone else (not working directly with me) has also posted. Due to the time needed for compile/test cycles and other commitments, I am only now replying to the initial post.
I have rebuilt VASP and rerun with the debugging flags and scaLAPACK disabled. I have also disabled hdf5 and wannier90 just to turn off as much extraneous stuff as possible.
Tests HEG_333_LW, SiC8_GW0R, and SiC_ACFDTR_T complain about the lack of scaLAPACK and are listed as failed, but I am assuming that is normal (as we turned off scaLAPACK).
Tests Tl_x, Tl_x_RPR, Tl_y, Tl_y_RPR, Tl_z, and Tl_z_RPR are segfaulting.
I have attached the makefile.include, testsuite.log, and test/Tl_* directories in the attached tarball. (I had a little trouble with the requested debug flags the first time around, so I put them all over the place in the current makefile.include just to make sure they took effect.)
At this point, I believe the entire toolchain is within the Intel Parallel Studio Suite compiler + MKL (version 2020.1)
Any assistance you can provide regarding/resolving these issues with the validation tests will be appreciated. Thank you in advance.
-
- Full Member
- Posts: 203
- Joined: Tue Oct 13, 2020 11:32 pm
Re: VASP 6.3.0 compiles but fails some validation tests
I think the culprit is presumably related to the following setting in your makefile.include:
Code: Select all
FFLAGS += -xHOST
All the failed tests you mentioned pass on my machine (Ubuntu 20.04.3 LTS with dual Intel Xeon E5-2699 v4). See the following for more details on the toolchain, makefile.include, and the testsuite.log file.
1. The toolchain consists of recent versions of the Intel oneAPI Base and HPC toolkits:
Code: Select all
$ module purge
$ module load mpi/2021.4.0 mkl compiler
$ module list
Currently Loaded Modules:
1) mpi/2021.4.0 3) compiler-rt/2022.0.2 5) oclfpga/2022.0.2
2) tbb/2021.5.1 4) mkl/2022.0.2 6) compiler/2022.0.2
2. The content of the makefile.include is as follows:
Code: Select all
$ egrep -v '^(#|$)' makefile.include.intel
CPP_OPTIONS = -DHOST=\"LinuxIFC\" \
              -DMPI -DMPI_BLOCK=8000 -Duse_collective \
              -DscaLAPACK \
              -DCACHE_SIZE=4000 \
              -Davoidalloc \
              -Dvasp6 \
              -Duse_bse_te \
              -Dtbdyn \
              -Dfock_dblbuf
CPP = fpp -f_com=no -free -w0 $*$(FUFFIX) $*$(SUFFIX) $(CPP_OPTIONS)
FC = mpiifort
FCL = mpiifort
FREE = -free -names lowercase
FFLAGS = -assume byterecl -w
OFLAG = -O2
OFLAG_IN = $(OFLAG)
DEBUG = -O0
OBJECTS = fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o
OBJECTS_O1 += fftw3d.o fftmpi.o fftmpiw.o
OBJECTS_O2 += fft3dlib.o
CPP_LIB = $(CPP)
FC_LIB = $(FC)
CC_LIB = icc
CFLAGS_LIB = -O
FFLAGS_LIB = -O1
FREE_LIB = $(FREE)
OBJECTS_LIB = linpack_double.o
CXX_PARS = icpc
LLIBS = -lstdc++
FFLAGS += -march=core-avx2
FFLAGS += -traceback -debug -g
LLIBS += -L${MKLROOT}/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -liomp5 -lpthread -lm -ldl
FCL += -qmkl=parallel
INCS =-I$(MKLROOT)/include/fftw
Instead of using -xHOST, the following option is used, based on the suggestion here:
Code: Select all
FFLAGS += -march=core-avx2
Side remark: Based on my testing, the following Intel MPI Library doesn't work,
mpi/2021.5.0, i.e., mpi/2021.5.1
Regards,
HZ
-
- Newbie
- Posts: 3
- Joined: Thu Feb 10, 2022 12:33 am
Re: VASP 6.3.0 compiles but fails some validation tests
@hszhao.cn: Thank you. The -xHOST flag was indeed the issue. After replacing it with the appropriate -march flag (our cluster is a bit too old to support AVX2 :), the tests all pass. I am surprised that this is the culprit: I thought -xHOST just instructed the compiler to produce code optimized for the processor used for compilation, and I compiled on a system with the same processor as the one the tests ran on. But the suggested modification worked. Thank you again for all your assistance.
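Whether a node supports AVX2 (and so whether a flag such as -march=core-avx2 is even an option) can be checked from the CPU flag list before choosing the -march value. The following is a sketch for Linux only; the function names has_cpu_flag and cpu_flags are hypothetical, and it says nothing about why -xHOST failed in this particular case.

```shell
#!/bin/sh
# Succeed if FLAG appears as a whole word in a space-separated flag list.
# Whole-word matching matters because "avx" is a prefix of "avx2".
# Usage: has_cpu_flag FLAG "FLAG_LIST"
has_cpu_flag() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the flag list of the first CPU entry in /proc/cpuinfo (Linux only).
cpu_flags() {
    awk -F': ' '/^flags/ { print $2; exit }' /proc/cpuinfo
}

# Example:
# if has_cpu_flag avx2 "$(cpu_flags)"; then
#     echo "AVX2 available"
# else
#     echo "no AVX2: pick an older -march target"
# fi
```

Running the check on a compute node rather than on the login node avoids surprises when the two have different processors.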
-
- Global Moderator
- Posts: 473
- Joined: Mon Nov 04, 2019 12:44 pm
Re: VASP 6.3.0 compiles but fails some validation tests
Hszhao, thank you very much for helping us find the problem in your compilations.
-
- Full Member
- Posts: 203
- Joined: Tue Oct 13, 2020 11:32 pm
Re: VASP 6.3.0 compiles but fails some validation tests
Some tricks for setting the value of -march.
1. Obtain the arch name as follows:
Code: Select all
$ gcc -march=native -Q --help=target|grep -- '^[ ]*-march='
-march= broadwell
Then, based on the Intel official document here, the following should be used:
Code: Select all
FFLAGS += -march=broadwell
2. If your arch/processor name is not listed in the Intel official document here, just use the following trick as commented here:
Code: Select all
FFLAGS += -march=native
I’ve confirmed that both of the above settings can solve the problem discussed here.
Regards,
HZ
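The first trick can be scripted so that the FFLAGS line is generated from the gcc output. A minimal sketch with hypothetical function names; it assumes gcc prints the architecture as the second whitespace-separated field of the "-march=" line, as in the output shown above, and the resulting name should still be checked against the Intel documentation, since the two compilers do not necessarily accept the same set of names.

```shell
#!/bin/sh
# Extract the architecture name from "gcc -Q --help=target" output on stdin.
parse_march() {
    awk '$1 == "-march=" { print $2; exit }'
}

# Turn that output into a makefile.include line; fail if nothing was found.
march_line() {
    arch=$(parse_march)
    [ -n "$arch" ] || return 1
    printf 'FFLAGS += -march=%s\n' "$arch"
}

# Example:
# gcc -march=native -Q --help=target | march_line
```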
-
- Full Member
- Posts: 203
- Joined: Tue Oct 13, 2020 11:32 pm
Re: VASP 6.3.0 compiles but fails some validation tests
Using the following environment: Ubuntu 20.04.3 LTS installed on a machine with dual Intel Xeon E5-2699 v4 CPUs, I recompiled vasp.6.3.0 with the -xHost option and then successfully validated all selected tests in the fast category on the same machine. The following components of the Intel oneAPI Base and HPC toolkits were used:
Code: Select all
$ module load compiler mkl mpi/2021.4.0
$ module list
Currently Loaded Modules:
1) lmod 3) compiler-rt/2022.0.2 5) compiler/2022.0.2 7) mpi/2021.4.0
2) tbb/2021.5.1 4) oclfpga/2022.0.2 6) mkl/2022.0.2
Attached are the related makefile.include and testsuite.log files. So, I conclude that if you compile and run VASP on exactly the same CPU architecture, -xHost should work; otherwise, use an appropriate -march compiler option for cross-compilation. You can see related discussions here.
Regards,
HZ
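One way to check the "same CPU architecture" condition is to compare the CPU flag lists of the build host and the run host. This is a sketch under the assumption that both lists have been collected beforehand (for example from the "flags" line of /proc/cpuinfo on each machine); the function name missing_on_run_host is hypothetical.

```shell
#!/bin/sh
# Print every CPU flag that the build host has but the run host lacks.
# Both arguments are space-separated flag lists.
# Usage: missing_on_run_host "BUILD_FLAGS" "RUN_FLAGS"
missing_on_run_host() {
    for f in $1; do
        case " $2 " in
            *" $f "*) ;;                 # flag is present on the run host
            *) printf '%s\n' "$f" ;;     # flag is missing on the run host
        esac
    done
}

# Example:
# missing_on_run_host "$build_flags" "$run_flags"
```

Empty output means the run host offers at least the instruction-set flags of the build host; any flag that is printed is a candidate explanation for a binary that works on one machine and crashes on the other.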