vasp6 gpu version ab initio MD crashes
Moderators: Global Moderator, Moderator
-
- Newbie
- Posts: 4
- Joined: Fri Nov 15, 2019 7:10 pm
vasp6 gpu version ab initio MD crashes
Dear developers,
I'm running Langevin MD for 128 tungsten + 1 Re atoms on NERSC Perlmutter.
With the same input (INCAR, POSCAR, KPOINTS, POTCAR), vasp.6.4.1 fails to reach self-consistency after 200 SCF iterations and fails at the second time step, while the VASP 5 CPU version runs without any problem.
Below I copied the e-mail I received from a NERSC software engineer. I would appreciate any help. Please let me know what information you need.
"2023-12-15 13:14:48 PST - Phillip Thomas - Additional comments
Hi German,
Thank you for your patience! I tested your job with several versions of VASP:
5.4.4-cpu
6.3.2-cpu
6.4.1-cpu
6.2.1-gpu
6.3.2-gpu
6.4.1-gpu
6.4.2-gpu (not yet public on Perlmutter, new build)
I can reproduce the error that you experienced in 6.2.1-gpu, but I found that this error appears in *all* VASP-6 builds at NERSC; it is not specific to the GPU builds. Looking at the output files I noticed that the SCF iterations begin to differ between VASP 5.4.4 and the VASP 6.x.y runs very early in the calculation, with the SCF energies diverging within the first few SCF cycles (sometimes even in the very first step). In 5.4.4 the free energy always converges to a value around -1645 eV for all SCF cycles in the job, but all of the VASP 6.x.y builds show SCF divergence, so I believe the values from VASP 5.4.4 to be correct.
I notice that the number of "eigenvalue-minimisations" in VASP 6.X begins to differ from VASP 5.4.4 at the point of divergence, so I suspect the issue lies in the eigensolver routine.
At this point I recommend that you file a bug report with the VASP developers. Some issues that the VASP developers might check include:
1) Were there any changes in the eigensolver routine between VASP 5 and VASP 6 which may have introduced a bug?
2) Were any default parameters changed between VASP 5 and VASP 6 which might affect SCF convergence for certain types of systems? If so, then you may be able to restore convergence by setting some parameter in your INCAR in the VASP 6.x.y runs.
3) Is there a possibility of a bug either in the compiler or in the linked libraries which may affect the VASP 6.x.y versions but not VASP 5.4.4? All versions of VASP at NERSC were built using NVIDIA SDK 22.7 and use Cray-MPICH, if that helps.
If you decide to file a bug report with VASP, we would be grateful if you reference the thread in this ticket so that we can track it and patch our VASP builds if the developers suggest a patch!
Best,
Phillip
"
The thread in this ticket is Ref:MSG3501497.
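The comparison the NERSC engineer describes, finding where the SCF energies of two runs first part ways, can be scripted. The sketch below assumes the usual OSZICAR layout, in which each electronic step is a line starting with an algorithm label such as `DAV:` or `RMM:`, followed by the iteration number and the total energy. The function names and the 0.01 eV default tolerance are illustrative choices, not part of VASP.

```python
import re
from typing import List, Optional, Tuple

# Assumed OSZICAR electronic-step line, e.g.
#   "RMM:   3    -0.164512345678E+04   -0.12E-02 ..."
# an algorithm label ending in ':' (DAV:, RMM:, CG :), the SCF iteration
# number, then the total energy in Fortran E notation.
SCF_LINE = re.compile(r"^\s*([A-Za-z]+)\s*:\s+(\d+)\s+(\S+)")


def scf_energies(oszicar_text: str) -> List[List[float]]:
    """Group SCF energies (eV) by ionic step; a step starts at iteration 1."""
    steps: List[List[float]] = []
    for line in oszicar_text.splitlines():
        match = SCF_LINE.match(line)
        if match is None:
            continue  # header, ionic-step summary ("1 T= ... F= ..."), etc.
        iteration = int(match.group(2))
        energy = float(match.group(3))
        if iteration == 1 or not steps:
            steps.append([])
        steps[-1].append(energy)
    return steps


def first_divergence(
    ref: List[List[float]], test: List[List[float]], tol: float = 1e-2
) -> Optional[Tuple[int, int]]:
    """Return (ionic step, SCF iteration), both 1-based, of the first point
    where the two runs differ by more than `tol` eV, or where one run needs
    more SCF iterations than the other. Only ionic steps present in both
    runs are compared; returns None if no difference is found."""
    for step, (a, b) in enumerate(zip(ref, test), start=1):
        for it, (ea, eb) in enumerate(zip(a, b), start=1):
            if abs(ea - eb) > tol:
                return step, it
        if len(a) != len(b):
            return step, min(len(a), len(b)) + 1
    return None
```

Usage would look like `first_divergence(scf_energies(open("vasp5/OSZICAR").read()), scf_energies(open("vasp6/OSZICAR").read()))`, where both paths are hypothetical. Runs that use different eigensolvers can legitimately differ in their first few SCF energies, so a reported divergence is a hint to inspect, not proof of a bug.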
-
- Global Moderator
- Posts: 161
- Joined: Thu Nov 03, 2022 1:03 pm
Re: vasp6 gpu version ab initio MD crashes
Dear german_d.samolyuk1,
We will need some more information about the jobs in question to check the performance of VASP 5.4.4 and the later 6.x.y versions. Could you provide us with the input files that you or the NERSC engineer are using?
Kind regards,
Pedro Melo
-
- Newbie
- Posts: 4
- Joined: Fri Nov 15, 2019 7:10 pm
Re: vasp6 gpu version ab initio MD crashes
Dear Pedro Melo,
Thank you for your quick reply.
I attached the archive wre.tar. It contains INCAR, POSCAR, POTCAR, KPOINTS, gpu.pbatch (the one I used to run VASP 6), and cpu.pbatch (VASP 5).
Sincerely,
German
-
- Global Moderator
- Posts: 161
- Joined: Thu Nov 03, 2022 1:03 pm
Re: vasp6 gpu version ab initio MD crashes
Dear German,
You seem to have forgotten the .tar file.
Best,
Pedro
-
- Newbie
- Posts: 4
- Joined: Fri Nov 15, 2019 7:10 pm
Re: vasp6 gpu version ab initio MD crashes
Dear Pedro,
Did it work this time?
Thanks,
German
-
- Global Moderator
- Posts: 161
- Joined: Thu Nov 03, 2022 1:03 pm
Re: vasp6 gpu version ab initio MD crashes
Dear German,
In your INCAR, there are at least three references to the algorithm that you want VASP to use:
ALGO = Fast
IALGO = 48
ALGO = VeryFast
If I am not wrong, VASP will only consider the first time ALGO is assigned. Could you try changing the INCAR and using a more robust option for ALGO, such as Normal?
Kind regards,
Pedro
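Conflicting algorithm tags like these are easy to miss in a long INCAR. The minimal checker sketched below lists every tag assigned more than once and warns when ALGO and IALGO are both present. It assumes that `#` or `!` starts a comment and that `;` separates statements on one line; the function names are hypothetical. It deliberately does not try to predict which value VASP will actually use, since, as this thread shows, that is easy to get wrong.

```python
import re
from collections import defaultdict
from typing import Dict, List


def incar_assignments(incar_text: str) -> Dict[str, List[str]]:
    """Collect every value assigned to each INCAR tag, in file order."""
    found: Dict[str, List[str]] = defaultdict(list)
    for raw in incar_text.splitlines():
        # Assumption: '#' or '!' starts a comment, ';' separates statements.
        line = re.split(r"[#!]", raw, maxsplit=1)[0]
        for statement in line.split(";"):
            if "=" not in statement:
                continue
            key, value = statement.split("=", 1)
            found[key.strip().upper()].append(value.strip())
    return dict(found)


def algorithm_warnings(incar_text: str) -> List[str]:
    """Flag repeated tags and a simultaneous ALGO/IALGO setting.

    This only reports the ambiguity; it does not decide which value VASP uses.
    """
    tags = incar_assignments(incar_text)
    warnings = [
        f"{key} assigned {len(values)} times: {values}"
        for key, values in tags.items()
        if len(values) > 1
    ]
    if "ALGO" in tags and "IALGO" in tags:
        warnings.append(
            f"both ALGO {tags['ALGO']} and IALGO {tags['IALGO']} are set"
        )
    return warnings
```

Fed the three lines quoted above, it reports that ALGO is assigned twice and that ALGO and IALGO are both set; the safe fix is to keep a single ALGO line and drop IALGO.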
-
- Newbie
- Posts: 4
- Joined: Fri Nov 15, 2019 7:10 pm
Re: vasp6 gpu version ab initio MD crashes
Dear Pedro,
Now it works :)
Surprisingly, IALGO = 48 was read from the INCAR; it didn't work with VASP 6 but worked with VASP 5.
Thank you,
German
Happy Holidays!