VASP-GPU fails to converge

scanmat_centre
Newbie
Posts: 28
Joined: Mon Feb 01, 2021 1:53 pm

#1 Post by scanmat_centre » Thu Jun 16, 2022 5:25 am

We are using the GPU version of VASP for hybrid-functional calculations, and we are getting the following error:

Device Memory Info:
Total: 16276.2 MB
Free: 1.2 MB
Used: 16275.0 MB
Requested: 1.9 MB

CUDA Error in cuda_mem.cu, line 179: out of memory
Failed to allocate device memory!

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 55048 RUNNING AT scanmatdgx1
= EXIT CODE: 1
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES



But we have enough memory on the device.
How should we proceed?
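Note that the "Device Memory Info" block above describes the memory of the single GPU the process runs on (Total: 16276.2 MB, about 16 GB), not the RAM of the node or of the whole machine. A minimal sketch for telling the two apart from inside a job, assuming `nvidia-smi` is installed on the compute node; `gpu_mem_report` and `host_mem_mb` are hypothetical helpers, not part of VASP or SLURM.

```shell
# Hypothetical helper: pretty-print per-GPU memory from nvidia-smi CSV output.
# Each input line is "index, total, used, free" in MiB, as produced by
#   nvidia-smi --query-gpu=index,memory.total,memory.used,memory.free \
#              --format=csv,noheader,nounits
gpu_mem_report() {
    awk -F', *' '{ printf "GPU %s: total %d MiB, used %d MiB, free %d MiB\n", $1, $2, $3, $4 }'
}

# Hypothetical helper: host RAM requested from SLURM,
# cpus-per-task * mem-per-cpu (both in MB).
# This is memory of the node, NOT of the GPU that ran out of memory.
host_mem_mb() {
    echo $(( $1 * $2 ))
}

# Inside the job, e.g.:
#   nvidia-smi --query-gpu=index,memory.total,memory.used,memory.free \
#              --format=csv,noheader,nounits | gpu_mem_report
#   host_mem_mb 14 8000
```

With the job script's `--cpus-per-task=14` and `--mem-per-cpu=8000`, `host_mem_mb 14 8000` prints 112000, i.e. about 112 GB of host RAM, whereas the error reports a GPU with about 16 GB in total and only 1.2 MB free.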

martin.schlipf
Global Moderator
Posts: 543
Joined: Fri Nov 08, 2019 7:18 am

#2 Post by martin.schlipf » Fri Jun 17, 2022 6:56 am

Can you provide a bit more information about how you run these calculations? Did you try smaller systems successfully and are now running this larger calculation that fails, or do you get the same error for any case you use?
In the former case, how do you know that you have enough memory? What specifically did you compare against what?
In the latter case, could you provide the input files for the calculations you run?
Either way, can you also tell me which version of VASP you are using and whether you use the deprecated CUDA port or the OpenACC version?

Martin Schlipf
VASP developer


scanmat_centre
Newbie
Posts: 28
Joined: Mon Feb 01, 2021 1:53 pm

#3 Post by scanmat_centre » Sat Jun 25, 2022 6:23 am

Yes, it ran successfully for smaller systems; it fails only for supercells.
I am using VASP 5.4.1, and my input files are as follows:


INCAR

System = z

! Start parameters for this run:

ISTART = 1 ! 0: new job; 1: restart with constant energy cutoff; 2: restart with constant basis set
PREC = Accurate
LWAVE = .TRUE.
LREAL = .TRUE.
!! Electronic relaxation:
EDIFF = 1E-6 ! required accuracy
NELMIN = 5 ! minimum number of electronic steps
LORBIT = 11
!!Ionic relaxation:
ENCUT = 400
ISMEAR = 0
SIGMA = 0.01
EDIFFG= -0.01
#GGA = PE
LHFCALC = .TRUE.
HFSCREEN = 0.2
PRECFOCK = Fast
AEXX = 0.25
ALGO = All
LVDW= TRUE
IVDW = 1
NBANDS= 100




script


#!/bin/bash
#SBATCH --job-name=12.5Sbnd
#SBATCH --output=slurm-%j.out
#SBATCH --error=slurm-%j.err
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=14
#SBATCH --distribution=cyclic:cyclic
#SBATCH --time=420:00:00
#SBATCH --mem-per-cpu=8000
##SBATCH --mail-type=END,FAIL
##SBATCH --mail-user=email@ufl.edu
#SBATCH --partition=debug
#SBATCH --gres=gpu:1
date;hostname;pwd


ulimit -s unlimited
ulimit -l unlimited
ulimit -m unlimited

pwd; hostname; date |tee result
# Setting some variables

module load vasp
module load CUDA/9.0

#for i in 15 16; do
# echo "n = $i"

WORK=$SLURM_SUBMIT_DIR

echo $WORK

# making scratch directory
SCRATCH=/home/${USER}/example/${SLURM_JOBID}
echo ${SCRATCH}
mkdir -p $SCRATCH/test
RUN=$SCRATCH/test

# Go to the run directory
cd $RUN

# Copy input files to scratch
cp $WORK/INCAR_bnd $RUN/INCAR
cp $WORK/CONTCAR-opt $RUN/POSCAR
cp $WORK/POTCAR $RUN
cp $WORK/IBZKPT-bnd $RUN/KPOINTS
#cp $WORK/WAVECAR $RUN/WAVECAR
#cp $WORK/CHGCAR $RUN/CHGCAR
ls -ltr

mpirun vasp_gpu
#mpirun vasp_std

cp OUTCAR $WORK/OUTCAR-band
cp CONTCAR $WORK/CONTCAR-band
cp DOSCAR $WORK/DOSCAR
cp PROCAR $WORK/PROCAR
cp EIGENVAL $WORK/EIGENVAL
cp vasprun.xml $WORK/vasprunband.xml

cd $WORK
rm -rf $RUN

#done

martin.schlipf
Global Moderator
Posts: 543
Joined: Fri Nov 08, 2019 7:18 am

#4 Post by martin.schlipf » Mon Jun 27, 2022 9:36 am

I'm still not sure how you judge that you have enough memory. It seems that you want to do a band-structure calculation; in that case, you can reduce the memory demand by splitting the calculation into several parts or by using fewer k-points along each line.
Unfortunately, I cannot give more specific advice for your system, because the old CUDA port is no longer maintained. If you can reproduce this behavior with the OpenACC version, we would need to look into it more carefully.
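One common way to split a hybrid-functional band structure, sketched under the assumption that the KPOINTS file is in explicit format and lists the weighted k-points of the self-consistent run (from IBZKPT) followed by zero-weight k-points along the band path: each sub-run keeps all weighted points, which are needed for the self-consistent hybrid calculation, and takes only a slice of the zero-weight points, so fewer k-points are held in memory at once. The function `split_band_kpoints` and the `PREFIX.N` output naming are hypothetical; only the comment line, the count line, the coordinate-mode line and the k-point lines are handled, and any tetrahedron section is ignored.

```shell
# Hypothetical helper: split zero-weight band-path k-points into chunks.
# $1: explicit-format KPOINTS file (weighted points + zero-weight points)
# $2: number of zero-weight points per output file
# $3: output prefix; files are written as PREFIX.1, PREFIX.2, ...
# Prints the number of files written.
split_band_kpoints() {
    awk -v chunk="$2" -v prefix="$3" '
        NR == 1 { comment = $0; next }          # title line
        NR == 2 { ntot = $1; next }             # total number of k-points
        NR == 3 { mode = $0; next }             # e.g. "Reciprocal lattice"
        NR <= 3 + ntot {
            if ($4 + 0 == 0) zero[++nz] = $0    # zero weight: band-path point
            else weighted[++nw] = $0            # weighted: SCF point, keep in every file
        }
        END {
            nfiles = int((nz + chunk - 1) / chunk)
            for (f = 1; f <= nfiles; f++) {
                out = prefix "." f
                first = (f - 1) * chunk + 1
                last = f * chunk
                if (last > nz) last = nz
                print comment > out
                print (nw + last - first + 1) > out
                print mode > out
                for (i = 1; i <= nw; i++) print weighted[i] > out
                for (i = first; i <= last; i++) print zero[i] > out
                close(out)
            }
            print nfiles
        }' "$1"
}
```

For example, `split_band_kpoints KPOINTS 20 KPOINTS` writes KPOINTS.1, KPOINTS.2 and so on, each with all weighted points plus at most 20 path points, and prints how many files it created; each file can then be run as a separate calculation.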



scanmat_centre
Newbie
Posts: 28
Joined: Mon Feb 01, 2021 1:53 pm

#5 Post by scanmat_centre » Mon Jun 27, 2022 2:07 pm

The memory I am talking about is the memory of the device, by which I mean the supercomputer on which we are running the calculation.
Am I wrong in how I am judging the memory?
Or is there another measure I should consider?

martin.schlipf
Global Moderator
Global Moderator
Posts: 543
Joined: Fri Nov 08, 2019 7:18 am

#6 Post by martin.schlipf » Mon Jun 27, 2022 3:33 pm

Well, there are two sides to the comparison: the memory available on the device and the memory that VASP needs to perform the calculation.
Band-structure calculations in particular can need quite a bit more memory than the self-consistent calculation, because the number of k-points is often larger.

Then again, I don't know how efficient the hybrid-functional implementation in the old CUDA port was. This part was reworked substantially in the OpenACC port to improve performance on one or more GPUs.
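A back-of-the-envelope sketch of why more k-points, more bands, or a higher cutoff push the memory up. It assumes the free-electron estimate N_pw ≈ V·k_cut³/(6π²) for the number of plane waves per k-point, with k_cut = sqrt(ENCUT / (ħ²/2mₑ)) and ħ²/2mₑ ≈ 3.81 eV·Å², and 16 bytes (complex double precision) per coefficient for one spin channel. This is only a lower bound for the wavefunctions; hybrid functionals, FFT work arrays and projectors need additional memory, so it will not reproduce the GPU figures in the error. The function names are hypothetical.

```shell
# Hypothetical helper: rough plane-wave count per k-point,
# N_pw ~ V * k_cut^3 / (6 pi^2), k_cut = sqrt(ENCUT / 3.80998) in 1/Angstrom.
# $1: ENCUT in eV, $2: cell volume in Angstrom^3
npw_estimate() {
    awk -v e="$1" -v v="$2" 'BEGIN {
        pi = atan2(0, -1)
        k = sqrt(e / 3.80998)
        printf "%d\n", v * k^3 / (6 * pi^2)
    }'
}

# Hypothetical helper: lower bound for wavefunction storage in MiB,
# 16 bytes per complex double coefficient, one spin channel.
# $1: N_pw, $2: NBANDS, $3: number of k-points
wavefunction_mib() {
    awk -v n="$1" -v b="$2" -v k="$3" 'BEGIN { printf "%d\n", n * b * k * 16 / 1048576 }'
}
```

For a hypothetical 1000 Å³ cell with ENCUT = 400 and NBANDS = 100 (the values in the INCAR above), `wavefunction_mib "$(npw_estimate 400 1000)" 100 50` prints about 1385 MiB for 50 k-points, and the figure roughly doubles with 100 k-points.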



scanmat_centre
Newbie
Newbie
Posts: 28
Joined: Mon Feb 01, 2021 1:53 pm

#7 Post by scanmat_centre » Tue Jun 28, 2022 4:53 am

Is there any way to reduce the memory VASP needs to perform the calculation?
Is there anything I should change in the script?

The error looks like this:

Device Memory Info:
Total: 16276.2 MB
Free: 1.2 MB
Used: 16275.0 MB
Requested: 1.9 MB

martin.schlipf
Global Moderator
Global Moderator
Posts: 543
Joined: Fri Nov 08, 2019 7:18 am

#8 Post by martin.schlipf » Tue Jun 28, 2022 6:26 am

scanmat_centre wrote (Tue Jun 28, 2022 4:53 am):
> Is there any way I can modify the memory requirement for vasp to perform the calculation?
> Anything I have to do with the script.. ?
Use a smaller energy cutoff (ENCUT), fewer k-points, and PREC = Normal.

Of course you need to test whether this affects your results.
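The effect of the ENCUT and PREC changes can be estimated with simple ratios, as a sketch: at fixed cell volume the number of plane waves, and with it the number of FFT grid points, scales as ENCUT^(3/2), and going from PREC = Accurate to PREC = Normal shrinks the FFT grid from 2·G_cut to 3/2·G_cut per direction, assuming the VASP 5.x convention. The function names and the 300 eV value are hypothetical illustrations, not values from this thread.

```shell
# Hypothetical helper: factor by which the plane-wave count shrinks when
# ENCUT is lowered (N_pw scales as ENCUT^(3/2) at fixed volume).
# $1: old ENCUT, $2: new ENCUT
encut_ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.3f\n", (b / a) ^ 1.5 }'
}

# Hypothetical helper: factor by which the number of FFT grid points shrinks
# from PREC = Accurate to PREC = Normal, assuming 2*G_cut vs 3/2*G_cut
# per grid direction (VASP 5.x convention).
prec_grid_ratio() {
    awk 'BEGIN { printf "%.3f\n", (1.5 / 2.0) ^ 3 }'
}
```

`encut_ratio 400 300` prints 0.650, i.e. about 35% fewer plane waves, and `prec_grid_ratio` prints 0.422 for the FFT grid points.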



scanmat_centre
Newbie
Posts: 28
Joined: Mon Feb 01, 2021 1:53 pm

#9 Post by scanmat_centre » Wed Jun 29, 2022 5:10 am

I will check with them.

scanmat_centre
Newbie
Posts: 28
Joined: Mon Feb 01, 2021 1:53 pm

#10 Post by scanmat_centre » Wed Jul 06, 2022 8:13 am

Thank you, it's working now.
I reduced ENCUT and changed PREC from Accurate to Normal.
