Dear Sir/Madam,
We have compiled VASP 6.4.0 successfully (gnu_ompi_mkl_omp + HDF5 1.13), but when running a calculation, the error "Program received signal SIGSEGV: Segmentation fault - invalid memory reference" appears. Could you please take a look? Your help is greatly appreciated!
1. Modules used: openmpi/4.1.1-gcc.9.2, gcc/11.2.0, intel/2020 and hdf/1.13
2. makefile.include is shown below:
[baoju@l001 vasp_gnu_ompi_mkl_omp_hdf5]$ vi makefile.include
FC_LIB = $(FC)
CC_LIB = gcc
CFLAGS_LIB = -O
FFLAGS_LIB = -O1
FREE_LIB = $(FREE)
OBJECTS_LIB = linpack_double.o
# For the parser library
CXX_PARS = g++
LLIBS = -lstdc++
##
## Customize as of this point! Of course you may change the preceding
## part of this file as well if you like, but it should rarely be
## necessary ...
##
# When compiling on the target machine itself, change this to the
# relevant target when cross-compiling for another architecture
VASP_TARGET_CPU ?= -march=native
FFLAGS += $(VASP_TARGET_CPU)
# For gcc-10 and higher (comment out for older versions)
FFLAGS += -fallow-argument-mismatch
# Intel MKL for FFTW, BLAS, LAPACK, and scaLAPACK
MKLROOT ?= /usr/public/intel/2020/compilers_and_libraries/linux/mkl
LLIBS_MKL = -L$(MKLROOT)/lib/intel64 -Wl,--no-as-needed -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core -lmkl_scalapack_lp64 -lmkl_blacs_openmpi_lp64 -lgomp -lpthread -lm -ldl
INCS = -I$(MKLROOT)/include/fftw
# Use a separate scaLAPACK installation (optional but recommended in combination with OpenMPI)
# Comment out the two lines below if you want to use scaLAPACK from MKL instead
#SCALAPACK_ROOT ?= /path/to/your/scalapack/installation
#LLIBS_MKL = -L$(SCALAPACK_ROOT)/lib -lscalapack -L$(MKLROOT)/lib/intel64 -Wl,--no-as-needed -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl
LLIBS += $(LLIBS_MKL)
# HDF5-support (optional but strongly recommended)
CPP_OPTIONS+= -DVASP_HDF5
HDF5_ROOT ?= /mmfs1/public/hdf5/1.13.3gnu
LLIBS += -L$(HDF5_ROOT)/lib -lhdf5_fortran
INCS += -I$(HDF5_ROOT)/include
# For the VASP-2-Wannier90 interface (optional)
#CPP_OPTIONS += -DVASP2WANNIER90
#WANNIER90_ROOT ?= /path/to/your/wannier90/installation
#LLIBS += -L$(WANNIER90_ROOT)/lib -lwannier
# For the fftlib library (hardly any benefit in combination with MKL's FFTs)
#CPP_OPTIONS+= -Dsysv
#FCL += fftlib.o
#CXX_FFTLIB = g++ -fopenmp -std=c++11 -DFFTLIB_USE_MKL -DFFTLIB_THREADSAFE
#INCS_FFTLIB = -I./include -I$(MKLROOT)/include/fftw
#LIBS += fftlib
#LLIBS += -ldl
------
3. INCAR of the dataset:
-----
System = Li100_slab SCF
PREC = ACCURATE
ISTART = 0 #initial orbitals:0-from scratch;1-read in previous WAVECAR
ICHARG = 1 #initial charge density guess:2-atomic;1-read in previous CHGCAR;0-compute from WAVECAR
ENCUT = 600
ISMEAR = 1
SIGMA = 0.09
EDIFF = 1E-5
NELM = 200
ALGO = FAST
IBRION = 1
EDIFFG=-0.01
ISIF=2
NSW = 200
POTIM = 0.2
ISPIN = 2
GGA = PE
IVDW=12 #Dispersion: D3-BJ
IDIPOL = 3 #Dipole correction
WAVCAR=.FALSE.
CHGCAR=.FALSE.
NPAR = 3
---
4. error messages:
[baoju@l001 6.4]$slurm-1287912.out
| out finding an antiferromagnetic solution. Thence, we recommend |
| setting the initial magnetic moment manually or verifying carefully |
| that this magnetic setup is desired. |
| |
-----------------------------------------------------------------------------
scaLAPACK will be used
Reading from existing POTCAR
-----------------------------------------------------------------------------
| |
| ----> ADVICE to this user running VASP <---- |
| |
| You have a (more or less) 'large supercell' and for larger cells it |
| might be more efficient to use real-space projection operators. |
| Therefore, try LREAL= Auto in the INCAR file. |
| Mind: For very accurate calculation, you might also keep the |
| reciprocal projection scheme (i.e. LREAL=.FALSE.). |
| |
-----------------------------------------------------------------------------
LDA part: xc-table for Pade appr. of Perdew
POSCAR, INCAR and KPOINTS ok, starting setup
[c126:1524783] 71 more processes have sent help message help-mpi-btl-openib.txt / ib port not selected
[c126:1524783] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[c126:1524783] 71 more processes have sent help message help-mpi-btl-openib.txt / error in device init
FFT: planning ... GRIDC
FFT: planning ... GRID_SOFT
FFT: planning ... GRID
WAVECAR not read
WARNING: chargedensity file is incomplete
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
Backtrace for this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
-------------------
5. The job starts in parallel but dies very soon. The output file is shown below.
[baoju@l001 6.4]$ more OUTCAR
atomic valenz-charges read in
non local Contribution for L= 0 read in
real space projection operators read in
non local Contribution for L= 0 read in
real space projection operators read in
non local Contribution for L= 1 read in
real space projection operators read in
PAW grid and wavefunctions read in
number of l-projection operators is LMAX = 3
number of lm-projection operators is LMMAX = 5
-----------------------------------------------------------------------------
| |
| ----> ADVICE to this user running VASP <---- |
| |
| You have a (more or less) 'large supercell' and for larger cells it |
| might be more efficient to use real-space projection operators. |
| Therefore, try LREAL= Auto in the INCAR file. |
| Mind: For very accurate calculation, you might also keep the |
| reciprocal projection scheme (i.e. LREAL=.FALSE.). |
| |
-----------------------------------------------------------------------------
PAW_PBE Li_sv 10Sep2004 :
energy of atom 1 EATOM= -202.7858
kinetic energy error for atom= 0.0100 (will be added to EATOM!!)
POSCAR: Libulkbcc\(1\0\0)
positions in direct lattice
velocities in cartesian coordinates
exchange correlation table for LEXCH = 8
RHO(1)= 0.500 N(1) = 2000
RHO(2)= 100.500 N(2) = 4000
--------------------------------------------------------------------------------------------------------
ion position nearest neighbor table
1 0.000 0.000 0.074-
2 0.000 0.000 0.223-
3 0.050 0.100 0.112-
4 0.000 0.000 1.000-
5 0.000 0.000 0.149-
6 0.050 0.100 0.038-
7 0.050 0.100 0.185-
8 0.100 0.000 0.074-
9 0.100 0.000 0.22
"received signal SIGSEGV: segmentation fault invalid memory error"
-
- Newbie
- Posts: 8
- Joined: Wed Jan 13, 2021 3:46 pm
-
- Global Moderator
- Posts: 216
- Joined: Fri Jul 01, 2022 2:17 pm
Re: "received signal SIGSEGV: segmentation fault invalid memory error"
Dear junweilucasbao,
I can't tell the origin of a segmentation fault without any input files.
Please submit the input files for your job according to the VASP forum guidelines:
https://www.vasp.at/forum/viewtopic.php?f=4&t=17928
Then I will take a look at what is going wrong with your job.
All the best
Jonathan
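As a rough sketch of how the job files could be bundled into a single attachment: the helper name pack_job_files, the file list and the archive name job_files.tar.gz are assumptions made for illustration and are not taken from the guidelines, so adjust them to what the guidelines actually ask for.

```shell
# Pack the job files that exist in the current directory into one archive.
# The file list and the archive name are assumptions; adjust them as needed.
pack_job_files() {
  files=""
  for f in INCAR KPOINTS POSCAR OUTCAR makefile.include slurm-*.out; do
    # Skip names (or unmatched patterns) that are not regular files.
    if [ -f "$f" ]; then
      files="$files $f"
    fi
  done
  if [ -z "$files" ]; then
    echo "no job files found in $(pwd)" >&2
    return 1
  fi
  # $files is left unquoted on purpose so that it splits into separate names.
  tar -czf job_files.tar.gz $files
}
```

Calling pack_job_files inside the job directory should leave job_files.tar.gz there. Whether the forum accepts this archive format is not checked here.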
-
- Newbie
- Posts: 8
- Joined: Wed Jan 13, 2021 3:46 pm
Re: "received signal SIGSEGV: segmentation fault invalid memory error"
Hello
I have attached a zip file containing the job files. Thanks!
-
- Global Moderator
- Posts: 216
- Joined: Fri Jul 01, 2022 2:17 pm
Re: "received signal SIGSEGV: segmentation fault invalid memory error"
Dear junweilucasbao,
In your slurm output file you are getting errors which are related to InfiniBand on your cluster.
Code:
WARNING: There was an error initializing an OpenFabrics device.
[c133:3903331] [[40908,0],0] ORTE_ERROR_LOG: Data unpack had inadequate space in file ../../orte/util/show_help.c at line 513
In this case it would be best to talk to your system administrators and ask whether they are aware of any problems. You should show them this output:
Code:
By default, for Open MPI 4.0 and later, infiniband ports on a device
are not used by default. The intent is to use UCX for these devices.
You can override this policy by setting the btl_openib_allow_ib MCA parameter
to true.
Local host: c154
Local adapter: mlx5_0
Local port: 1
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.
Local host: c154
Local device: mlx5_0
--------------------------------------------------------------------------
[c133:3903331] [[40908,0],0] ORTE_ERROR_LOG: Data unpack had inadequate space in file ../../orte/util/show_help.c at line 513
What you could still try yourself is running the job on a single core:
Code:
mpirun -np 1 $VASP_EXEC/vasp_std
With this you could verify that your job is set up properly and see whether the problem lies in the connections between nodes.
I am sorry that I cannot be of more help.
All the best Jonathan
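A minimal sketch of how the single-core test might be prepared in a job script, assuming a bash-like shell. The parameter orte_base_help_aggregate is the one named in the slurm output. Excluding the openib BTL with btl=^openib is an added assumption that may only silence the OpenFabrics warnings, and whether UCX is available on this cluster is not known from the thread. The fallback path /path/to/vasp/bin is a placeholder.

```shell
# Placeholder fallback; VASP_EXEC is the variable used in the suggested command.
VASP_EXEC="${VASP_EXEC:-/path/to/vasp/bin}"

# Print every Open MPI help message instead of the aggregated summary
# (parameter name as given in the slurm output).
export OMPI_MCA_orte_base_help_aggregate=0

# Assumption: exclude the openib BTL so that Open MPI does not try to
# initialize it; this is a diagnostic setting, not a confirmed fix.
export OMPI_MCA_btl='^openib'

# Single-core test. The command is only printed here; run it inside a job
# on a compute node.
echo "mpirun -np 1 $VASP_EXEC/vasp_std"
```

If the single-core run still ends with the same segmentation fault, that would suggest the crash does not depend on the connections between nodes, and the administrators can be told so.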