VASP- GPU fails to converge
-
- Newbie
- Posts: 28
- Joined: Mon Feb 01, 2021 1:53 pm
VASP- GPU fails to converge
We are using VASP-GPU for hybrid calculations and we are getting the following error:
Device Memory Info:
Total: 16276.2 MB
Free: 1.2 MB
Used: 16275.0 MB
Requested: 1.9 MB
CUDA Error in cuda_mem.cu, line 179: out of memory
Failed to allocate device memory!
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 55048 RUNNING AT scanmatdgx1
= EXIT CODE: 1
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
But we have enough memory on the device.
How should we proceed?
-
- Global Moderator
- Posts: 543
- Joined: Fri Nov 08, 2019 7:18 am
Re: VASP- GPU fails to converge
Can you provide a bit more information about how you run these calculations? Did you try smaller systems successfully and are now running this larger calculation that fails, or do you get the same error for any case you use?
In the former case, how do you know that you have enough memory? What specifically did you compare against what?
In the latter case, could you provide the input files for the calculations you run?
Either way, can you also tell me which version of VASP you are using and whether you use the deprecated CUDA port or the OpenACC version?
Martin Schlipf
VASP developer
-
- Newbie
- Posts: 28
- Joined: Mon Feb 01, 2021 1:53 pm
Re: VASP- GPU fails to converge
Yes, it ran successfully for smaller systems. It fails only for supercells.
I am using VASP 5.4.1, and my input files are as follows:
INCAR
System = z
!Start parameters for this run:
ISTART = 1 !0 Start job: 1 restart constant energy cut-off 2 restart constant basis set
PREC = Accurate
LWAVE= .TRUE.
LREAL = TRUE !
!!Electronic relaxation :
EDIFF = 1E-6 ! accuracy required 1E-6
NELMIN = 5 !no of ELM steps !
LORBIT = 11
!!Ionic relaxation:
ENCUT = 400
ISMEAR = 0
SIGMA = 0.01
EDIFFG= -0.01
#GGA = PE
LHFCALC = .TRUE.
HFSCREEN = 0.2
PRECREEN = Fast
AEXX = 0.25
ALGO = All
LVDW= TRUE
IVDW = 1
NBANDS= 100
script
#!/bin/bash
#SBATCH --job-name=12.5Sbnd
#SBATCH --output=slurm-%j.out
#SBATCH --error=slurm-%j.err
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=14
#SBATCH --distribution=cyclic:cyclic
#SBATCH --time=420:00:00
#SBATCH --mem-per-cpu=8000
##SBATCH --mail-type=END,FAIL
##SBATCH --mail-user=email@ufl.edu
#SBATCH --partition=debug
#SBATCH --gres=gpu:1
date;hostname;pwd
ulimit -s unlimited
ulimit -l unlimited
ulimit -m unlimited
pwd; hostname; date |tee result
# Setting some variables
module load vasp
module load CUDA/9.0
#for i in 15 16; do
# echo "n = $i"
WORK=$SLURM_SUBMIT_DIR
echo $WORK
# making scratch directory
SCRATCH=/home/${USER}/example/${SLURM_JOBID}
echo ${SCRATCH}
mkdir -p $SCRATCH/test
RUN=$SCRATCH/test
# Goto run dir
cd $RUN
# Copy input files to common scratch
cp $WORK/INCAR_bnd $RUN/INCAR
cp $WORK/CONTCAR-opt $RUN/POSCAR
cp $WORK/POTCAR $RUN
cp $WORK/IBZKPT-bnd $RUN/KPOINTS
#cp $WORK/WAVECAR $RUN/WAVECAR
#cp $WORK/CHGCAR $RUN/CHGCAR
ls -ltr
mpirun vasp_gpu
#mpirun vasp_std
cp OUTCAR $WORK/OUTCAR-band
cp CONTCAR $WORK/CONTCAR-band
cp DOSCAR $WORK/DOSCAR
cp PROCAR $WORK/PROCAR
cp EIGENVAL $WORK/EIGENVAL
cp vasprun.xml $WORK/vasprunband.xml
cd $WORK
rm -rf $RUN
#done
-
- Global Moderator
- Posts: 543
- Joined: Fri Nov 08, 2019 7:18 am
Re: VASP- GPU fails to converge
I'm still not sure how you judge that you have enough memory. It seems that you would like to do band structure calculations. In this case, you can reduce the memory demand by splitting the calculation into multiple subparts or by using fewer points per line.
Unfortunately, I cannot provide more specific advice for your system, because the old CUDA port is no longer maintained. If you can reproduce this behavior with the OpenACC version, we would need to look into it more carefully.
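As a rough sketch of the splitting mentioned above: assuming the KPOINTS file uses the explicit list format (comment line, number of k-points, coordinate mode, then one "kx ky kz weight" row per point) and that the band-path points are the rows with weight 0, a helper along these lines could write several smaller KPOINTS files. Each file keeps all weighted points and takes only a slice of the zero-weight points. The function name split_kpoints, the chunk size, and the output prefix are illustrative and not part of VASP.

```shell
#!/bin/bash
# split_kpoints FILE CHUNK PREFIX
#   FILE   : explicit-list KPOINTS file (assumed layout, see above)
#   CHUNK  : number of zero-weight (band-path) points per output file
#   PREFIX : output files are written as PREFIX.01, PREFIX.02, ...
# Prints the number of files written.
split_kpoints() {
    awk -v chunk="$2" -v prefix="$3" '
        BEGIN { chunk = chunk + 0; if (chunk < 1) chunk = 1 }
        NR == 1 { comment = $0; next }          # free-form comment line
        NR == 2 { total = $1 + 0; next }        # number of k-points in the list
        NR == 3 { mode = $0; next }             # e.g. "Reciprocal lattice"
        NR > 3 + total { next }                 # skip anything after the list
        NF < 4 { next }                         # skip malformed rows
        ($4 + 0) != 0 { weighted[++nw] = $0; next }   # mesh points (weight > 0)
        { path[++np] = $0 }                           # band-path points (weight 0)
        END {
            nfiles = 0
            for (start = 1; start <= np; start += chunk) {
                stop = start + chunk - 1
                if (stop > np) stop = np
                out = sprintf("%s.%02d", prefix, ++nfiles)
                n = nw + stop - start + 1       # point count for this part
                print comment > out
                print n > out
                print mode > out
                for (i = 1; i <= nw; i++) print weighted[i] > out
                for (i = start; i <= stop; i++) print path[i] > out
                close(out)
            }
            print nfiles
        }
    ' "$1"
}
```

A call such as split_kpoints KPOINTS 10 KPOINTS.part would then give one KPOINTS file per part. Each part is run as its own job and the eigenvalues at the zero-weight points are collected afterwards.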
Martin Schlipf
VASP developer
-
- Newbie
- Posts: 28
- Joined: Mon Feb 01, 2021 1:53 pm
Re: VASP- GPU fails to converge
The memory I am talking about is the memory of the device, I mean the supercomputer on which we are running the calculation.
Am I wrong in assuming that this is the relevant memory?
Or is there any other measure I have to consider?
-
- Global Moderator
- Posts: 543
- Joined: Fri Nov 08, 2019 7:18 am
Re: VASP- GPU fails to converge
Well, there are two parts to the comparison: the memory available on the device and the memory that VASP needs to perform the calculation.
In particular for band structure calculations, the memory requirement can be quite a bit larger than for the self-consistency calculation, because the number of k-points is often larger.
Then again, I don't know how efficient the hybrid functional in the old CUDA port was. This part was worked on a lot in the OpenACC port to enhance the performance on one or more GPUs.
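For the device side of that comparison, one option is to watch the GPU memory while the job is running. The sketch below assumes nvidia-smi is available on the compute node. The query fields memory.total, memory.used and memory.free are standard nvidia-smi fields, while the helper name gpu_mem_summary is only illustrative. It reports what the GPU has, not what VASP will need.

```shell
#!/bin/bash
# gpu_mem_summary "TOTAL, USED, FREE"
#   Formats one CSV line (values in MiB) as printed by
#   nvidia-smi --query-gpu=memory.total,memory.used,memory.free --format=csv,noheader,nounits
gpu_mem_summary() {
    echo "$1" | awk -F', *' '($1 + 0) > 0 {
        printf "total=%d MiB used=%d MiB free=%d MiB (%.1f%% used)\n", $1, $2, $3, 100 * $2 / $1
    }'
}

# Query every visible GPU, but only if nvidia-smi exists on this node.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=memory.total,memory.used,memory.free \
               --format=csv,noheader,nounits |
    while IFS= read -r line; do
        gpu_mem_summary "$line"
    done
fi
```

Run from a second shell on the same node during the calculation, this shows how close the job gets to the 16 GB limit before the allocation fails.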
Martin Schlipf
VASP developer
-
- Newbie
- Posts: 28
- Joined: Mon Feb 01, 2021 1:53 pm
Re: VASP- GPU fails to converge
Is there any way I can reduce the memory VASP requires to perform the calculation?
Is there anything I have to change in the script?
The error looks like this:
Device Memory Info:
Total: 16276.2 MB
Free: 1.2 MB
Used: 16275.0 MB
Requested: 1.9 MB
-
- Global Moderator
- Posts: 543
- Joined: Fri Nov 08, 2019 7:18 am
Re: VASP- GPU fails to converge
scanmat_centre wrote (Tue Jun 28, 2022 4:53 am):
"Is there any way I can modify the memory requirement for vasp to perform the calculation? Anything I have to do with the script.. ?"
Smaller energy cutoffs, fewer k-points, PREC = Normal.
Of course, you need to test whether this affects your results.
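To try such settings without editing the original input by hand, a small helper can write a modified copy of the INCAR. This is only a sketch: set_incar_tag is a hypothetical function, not a VASP tool, and the ENCUT value in the commented usage is a placeholder, since no reduced value is given in this thread. Any inline comment on a replaced line is dropped.

```shell
#!/bin/bash
# set_incar_tag FILE TAG VALUE
#   Replaces the line "TAG = ..." in FILE (tag match is case-insensitive and
#   exact, so PREC does not match PRECFOCK), or appends the tag if it is absent.
#   Commented-out tags such as "#GGA = PE" are left untouched.
set_incar_tag() {
    awk -v tag="$2" -v val="$3" '
        BEGIN { found = 0 }
        {
            line = $0
            sub(/^[ \t]+/, "", line)        # ignore leading whitespace
            split(line, parts, "=")
            name = parts[1]
            gsub(/[ \t]/, "", name)         # tag name without blanks
            if (index(line, "=") > 0 && toupper(name) == toupper(tag)) {
                print tag " = " val
                found = 1
            } else {
                print $0
            }
        }
        END { if (!found) print tag " = " val }
    ' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Example usage (placeholder values, adjust and test convergence yourself):
#   cp INCAR INCAR.lowmem
#   set_incar_tag INCAR.lowmem PREC Normal
#   set_incar_tag INCAR.lowmem ENCUT 350
```

Comparing the band gap or total energy between the original and the reduced settings on a smaller system that still fits in memory is one way to check whether the results are affected.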
Martin Schlipf
VASP developer
-
- Newbie
- Posts: 28
- Joined: Mon Feb 01, 2021 1:53 pm
Re: VASP- GPU fails to converge
I will check with them.
-
- Newbie
- Posts: 28
- Joined: Mon Feb 01, 2021 1:53 pm
Re: VASP- GPU fails to converge
Thank you, it's working now.
I reduced ENCUT and changed PREC from Accurate to Normal.