"received signal SIGSEGV: segmentation fault invalid memory error"

junweilucasbao
Newbie
Posts: 8
Joined: Wed Jan 13, 2021 3:46 pm

#1 Post by junweilucasbao » Mon Jul 03, 2023 10:12 pm

Dear Sir/Madam,

We have compiled VASP 6.4.0 successfully (gnu_ompi_mkl_omp + HDF5 1.13), but when we run a calculation, the message "Program received signal SIGSEGV: Segmentation fault - invalid memory reference" appears. Could you please take a look? Your help is greatly appreciated!

1. Modules used: openmpi/4.1.1-gcc.9.2, gcc/11.2.0, intel/2020 and hdf/1.13
2. makefile.include as used for the build:
[baoju@l001 vasp_gnu_ompi_mkl_omp_hdf5]$ vi makefile.include
FC_LIB = $(FC)
CC_LIB = gcc
CFLAGS_LIB = -O
FFLAGS_LIB = -O1
FREE_LIB = $(FREE)

OBJECTS_LIB = linpack_double.o

# For the parser library
CXX_PARS = g++
LLIBS = -lstdc++

##
## Customize as of this point! Of course you may change the preceding
## part of this file as well if you like, but it should rarely be
## necessary ...
##

# When compiling on the target machine itself, change this to the
# relevant target when cross-compiling for another architecture
VASP_TARGET_CPU ?= -march=native
FFLAGS += $(VASP_TARGET_CPU)

# For gcc-10 and higher (comment out for older versions)
FFLAGS += -fallow-argument-mismatch

# Intel MKL for FFTW, BLAS, LAPACK, and scaLAPACK
MKLROOT ?= /usr/public/intel/2020/compilers_and_libraries/linux/mkl
LLIBS_MKL = -L$(MKLROOT)/lib/intel64 -Wl,--no-as-needed -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core -lmkl_scalapack_lp64 -lmkl_blacs_openmpi_lp64 -lgomp -lpthread -lm -ldl
INCS = -I$(MKLROOT)/include/fftw

# Use a separate scaLAPACK installation (optional but recommended in combination with OpenMPI)
# Comment out the two lines below if you want to use scaLAPACK from MKL instead
#SCALAPACK_ROOT ?= /path/to/your/scalapack/installation
#LLIBS_MKL = -L$(SCALAPACK_ROOT)/lib -lscalapack -L$(MKLROOT)/lib/intel64 -Wl,--no-as-needed -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl

LLIBS += $(LLIBS_MKL)

# HDF5-support (optional but strongly recommended)
CPP_OPTIONS+= -DVASP_HDF5
HDF5_ROOT ?= /mmfs1/public/hdf5/1.13.3gnu
LLIBS += -L$(HDF5_ROOT)/lib -lhdf5_fortran
INCS += -I$(HDF5_ROOT)/include

# For the VASP-2-Wannier90 interface (optional)
#CPP_OPTIONS += -DVASP2WANNIER90
#WANNIER90_ROOT ?= /path/to/your/wannier90/installation
#LLIBS += -L$(WANNIER90_ROOT)/lib -lwannier

# For the fftlib library (hardly any benefit in combination with MKL's FFTs)
#CPP_OPTIONS+= -Dsysv
#FCL += fftlib.o
#CXX_FFTLIB = g++ -fopenmp -std=c++11 -DFFTLIB_USE_MKL -DFFTLIB_THREADSAFE
#INCS_FFTLIB = -I./include -I$(MKLROOT)/include/fftw
#LIBS += fftlib
#LLIBS += -ldl
------
3. INCAR of the dataset:
-----
System = Li100_slab SCF
PREC = ACCURATE
ISTART = 0 #initial orbitals:0-from scratch;1-read in previous WAVECAR
ICHARG = 1 #initial charge density guess:2-atomic;1-read in previous CHGCAR;0-compute from WAVECAR
ENCUT = 600
ISMEAR = 1
SIGMA = 0.09

EDIFF = 1E-5
NELM = 200

ALGO = FAST
IBRION = 1
EDIFFG=-0.01
ISIF=2
NSW = 200
POTIM = 0.2

ISPIN = 2

GGA = PE
IVDW=12 #Dispersion: D3-BJ

IDIPOL = 3 #Dipole correction

WAVCAR=.FALSE.
CHGCAR=.FALSE.
NPAR = 3
---

4. Error messages:
[baoju@l001 6.4]$slurm-1287912.out
| out finding an antiferromagnetic solution. Thence, we recommend |
| setting the initial magnetic moment manually or verifying carefully |
| that this magnetic setup is desired. |
| |
-----------------------------------------------------------------------------

scaLAPACK will be used
Reading from existing POTCAR
-----------------------------------------------------------------------------
| |
| ----> ADVICE to this user running VASP <---- |
| |
| You have a (more or less) 'large supercell' and for larger cells it |
| might be more efficient to use real-space projection operators. |
| Therefore, try LREAL= Auto in the INCAR file. |
| Mind: For very accurate calculation, you might also keep the |
| reciprocal projection scheme (i.e. LREAL=.FALSE.). |
| |
-----------------------------------------------------------------------------

LDA part: xc-table for Pade appr. of Perdew
POSCAR, INCAR and KPOINTS ok, starting setup
[c126:1524783] 71 more processes have sent help message help-mpi-btl-openib.txt / ib port not selected
[c126:1524783] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[c126:1524783] 71 more processes have sent help message help-mpi-btl-openib.txt / error in device init
FFT: planning ... GRIDC
FFT: planning ... GRID_SOFT
FFT: planning ... GRID
WAVECAR not read
WARNING: chargedensity file is incomplete

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:

Backtrace for this error:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

-------------------


5. The program starts in parallel but dies very soon. The OUTCAR is shown below.
[baoju@l001 6.4]$ more OUTCAR

atomic valenz-charges read in
non local Contribution for L= 0 read in
real space projection operators read in
non local Contribution for L= 0 read in
real space projection operators read in
non local Contribution for L= 1 read in
real space projection operators read in
PAW grid and wavefunctions read in

number of l-projection operators is LMAX = 3
number of lm-projection operators is LMMAX = 5

-----------------------------------------------------------------------------
| |
| ----> ADVICE to this user running VASP <---- |
| |
| You have a (more or less) 'large supercell' and for larger cells it |
| might be more efficient to use real-space projection operators. |
| Therefore, try LREAL= Auto in the INCAR file. |
| Mind: For very accurate calculation, you might also keep the |
| reciprocal projection scheme (i.e. LREAL=.FALSE.). |
| |
-----------------------------------------------------------------------------

PAW_PBE Li_sv 10Sep2004 :
energy of atom 1 EATOM= -202.7858
kinetic energy error for atom= 0.0100 (will be added to EATOM!!)


POSCAR: Libulkbcc\(1\0\0)
positions in direct lattice
velocities in cartesian coordinates
exchange correlation table for LEXCH = 8
RHO(1)= 0.500 N(1) = 2000
RHO(2)= 100.500 N(2) = 4000



--------------------------------------------------------------------------------------------------------


ion position nearest neighbor table
1 0.000 0.000 0.074-
2 0.000 0.000 0.223-
3 0.050 0.100 0.112-
4 0.000 0.000 1.000-
5 0.000 0.000 0.149-
6 0.050 0.100 0.038-
7 0.050 0.100 0.185-
8 0.100 0.000 0.074-
9 0.100 0.000 0.22

jonathan_lahnsteiner2
Global Moderator
Posts: 216
Joined: Fri Jul 01, 2022 2:17 pm

#2 Post by jonathan_lahnsteiner2 » Tue Jul 04, 2023 1:52 pm

Dear junweilucasbao,

I can't tell the origin of a segmentation fault without the input files.
Please submit the input files for your job according to the VASP forum guidelines:
https://www.vasp.at/forum/viewtopic.php?f=4&t=17928
Then I will take a look at what is going wrong with your job.

All the best

Jonathan

#3 Post by junweilucasbao » Wed Jul 05, 2023 2:54 pm

Hello

I have attached a zip file containing the job files. Thanks!
Archive.zip

#4 Post by jonathan_lahnsteiner2 » Thu Jul 06, 2023 1:18 pm

Dear junweilucasbao,

In your Slurm output file you are getting errors that are related to the InfiniBand interconnect on your cluster.

Code:

WARNING: There was an error initializing an OpenFabrics device.
[c133:3903331] [[40908,0],0] ORTE_ERROR_LOG: Data unpack had inadequate space in file ../../orte/util/show_help.c at line 513
In this case it would be best to talk to your system administrators and ask whether they are aware of any problems. You should show them this output:

Code:

By default, for Open MPI 4.0 and later, infiniband ports on a device
are not used by default.  The intent is to use UCX for these devices.
You can override this policy by setting the btl_openib_allow_ib MCA parameter
to true.

  Local host:              c154
  Local adapter:           mlx5_0
  Local port:              1

--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.

  Local host:   c154
  Local device: mlx5_0
--------------------------------------------------------------------------
[c133:3903331] [[40908,0],0] ORTE_ERROR_LOG: Data unpack had inadequate space in file ../../orte/util/show_help.c at line 513
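For reference when talking to the administrators: Open MPI accepts MCA parameters, such as the btl_openib_allow_ib named in the message, either on the mpirun command line (--mca <name> <value>) or through an environment variable called OMPI_MCA_<name>. The sketch below only illustrates the environment-variable form. The helper name mca_export is hypothetical, and whether the openib path or UCX should be used on this cluster is something the administrators need to confirm.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI): export an MCA parameter
# through the OMPI_MCA_<name> environment variable and echo what was set.
mca_export() {
    name="$1"
    value="$2"
    export "OMPI_MCA_${name}=${value}"
    echo "OMPI_MCA_${name}=${value}"
}

# Parameter named in the Open MPI help text quoted above.
mca_export btl_openib_allow_ib true

# Parameter named in the job output: 0 shows every help/error message
# instead of the aggregated "71 more processes have sent ..." summary.
mca_export orte_base_help_aggregate 0
```

Variables exported this way apply to any mpirun started from the same shell or job script afterwards.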
What you could still try yourself is running the job on a single core:

Code:

mpirun -np 1  $VASP_EXEC/vasp_std
With this you can verify that your job is set up properly and see whether the problem lies in the connections between nodes.
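The single-core test can be extended into a short series: first one process, then a few processes on one node, so that a crash which only appears with several ranks or several nodes becomes visible. This is a minimal sketch, assuming VASP_EXEC points at the directory containing vasp_std as in the command above. The names run_vasp and DRY_RUN and the rank counts are illustrative choices, not part of VASP or Open MPI. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Print the commands by default; set DRY_RUN=0 to really start VASP.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: run (or print) vasp_std with the given number of
# MPI ranks; any further arguments are passed on to mpirun unchanged.
run_vasp() {
    np="$1"
    shift
    if [ "$DRY_RUN" = "1" ]; then
        echo mpirun -np "$np" "$@" "${VASP_EXEC:-.}/vasp_std"
    else
        mpirun -np "$np" "$@" "${VASP_EXEC:-.}/vasp_std"
    fi
}

# Step 1: one process, as suggested above.
run_vasp 1

# Step 2: three processes (matching NPAR = 3 in the posted INCAR); inside a
# single-node allocation this keeps inter-node communication out of the picture.
run_vasp 3
```

If step 1 already ends in a segmentation fault, the interconnect is unlikely to be the only problem, and the input files or the build deserve a closer look. Each step overwrites the output files in the working directory, so copy OUTCAR aside between runs if you want to compare them.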

I am sorry that I cannot be of more help.

All the best Jonathan
