Segmentation fault when running VASP with ML_LMLFF = .TRUE.
-
- Newbie
- Posts: 5
- Joined: Mon Feb 28, 2022 9:07 am
Dear VASP developers,
I want to run an MD simulation with MLFF training switched on, but I get a segmentation fault.
Setting ulimit -s unlimited did not help.
All files related to the simulation are attached.
Thank you
-
- Global Moderator
- Posts: 473
- Joined: Mon Nov 04, 2019 12:44 pm
Re: Segmentation fault when running VASP with ML_LMLFF = .TRUE.
It looks like your ScaLAPACK has problems setting up the processor grid, so I suspect a faulty ScaLAPACK installation.
Please try compiling without "-DscaLAPACK" and run the code. You will most likely run out of memory for the full system, so run something small just for test purposes (say 8-16 atoms). If that runs properly, please switch ScaLAPACK back on; if the error then comes back, we have pinned it down to a faulty ScaLAPACK. Also run your tests with fewer cores, and test both on a single node and on several nodes.
Which toolchains are you using?
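For the test build without "-DscaLAPACK", a small helper can confirm that the flag is really gone from the preprocessor options. The following is only a sketch: it assumes the flag appears as a whitespace-separated token in a makefile.include-style text and that "#" starts a comment. The function names are made up for illustration, and the sketch does not touch the linked libraries.

```python
import re

# Matches -DscaLAPACK only as a standalone, whitespace-delimited token.
_FLAG = re.compile(r"(?<!\S)-DscaLAPACK(?!\S)")


def has_scalapack_flag(makefile_text: str) -> bool:
    """Return True if -DscaLAPACK is set outside of '#' comments."""
    for line in makefile_text.splitlines():
        code = line.split("#", 1)[0]  # ignore everything after '#'
        if _FLAG.search(code):
            return True
    return False


def strip_scalapack_flag(makefile_text: str) -> str:
    """Return the text with every active -DscaLAPACK token removed.

    Comments are left untouched; only the non-comment part of each
    line is edited.
    """
    out = []
    for line in makefile_text.splitlines():
        code, sep, comment = line.partition("#")
        code = re.sub(r"(?<!\S)-DscaLAPACK(?!\S)[ \t]*", "", code)
        out.append(code + sep + comment)
    return "\n".join(out)


if __name__ == "__main__":
    # Hypothetical fragment, not taken from the attached files.
    sample = (
        "CPP_OPTIONS = -DMPI -DscaLAPACK \\\n"
        "              -Duse_collective\n"
        "# -DscaLAPACK mentioned in a comment\n"
    )
    print(has_scalapack_flag(sample))
    print(has_scalapack_flag(strip_scalapack_flag(sample)))
```

Checking the stripped text again with has_scalapack_flag is a cheap way to make sure that the small test run really uses a build without the flag.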
-
- Newbie
- Posts: 5
- Joined: Mon Feb 28, 2022 9:07 am
Re: Segmentation fault when running VASP with ML_LMLFF = .TRUE.
Thank you for your reply.
It seems that ScaLAPACK is one part of the problem. I ran a simulation with a small cell (8 atoms): it worked with VASP compiled without "-DscaLAPACK", but it did not work with VASP compiled with "-DscaLAPACK". I then ran the simulation with a large supercell (144 atoms) without ScaLAPACK, but there is some other problem. A file with the error output is attached.
I am using Intel-2021.4.0 and OpenMPI-4.1.2.
Thank you
-
- Global Moderator
- Posts: 473
- Joined: Mon Nov 04, 2019 12:44 pm
Re: Segmentation fault when running VASP with ML_LMLFF = .TRUE.
The large calculation most likely ran out of memory. The ML_LOGFILE reports the predicted memory per core. In your case it writes:
"Total memory consumption : 16056.6". I guess you don't have 16 GB per core available.
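To check this quickly, the number can be pulled out of the ML_LOGFILE and compared with what a core really has. The sketch below assumes only the line format quoted above ("Total memory consumption : <number>") and that the number is in MB, which is how the 16 GB reading above interprets it. The function names are hypothetical.

```python
import re
from typing import Optional

# Line format assumed from the quoted ML_LOGFILE output.
_MEMORY_LINE = re.compile(
    r"Total memory consumption\s*:\s*([0-9]+(?:\.[0-9]+)?)"
)


def predicted_memory_per_core(log_text: str) -> Optional[float]:
    """Return the last reported memory prediction, or None if absent."""
    value = None
    for match in _MEMORY_LINE.finditer(log_text):
        value = float(match.group(1))  # keep the most recent entry
    return value


def fits_in_memory(log_text: str, available_per_core: float) -> bool:
    """Compare the prediction with the memory available per core.

    Both numbers must be in the same unit (assumed here to be MB).
    """
    predicted = predicted_memory_per_core(log_text)
    if predicted is None:
        raise ValueError("no 'Total memory consumption' line found")
    return predicted <= available_per_core


if __name__ == "__main__":
    log = "Total memory consumption : 16056.6\n"
    print(predicted_memory_per_core(log))
    # 4000.0 is a hypothetical per-core limit for illustration.
    print(fits_in_memory(log, available_per_core=4000.0))
```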
In practice, ScaLAPACK is required to run the code for realistic systems (at least for the learning part), because the design matrix needs to be distributed. Without ScaLAPACK each core holds the whole design matrix. With ScaLAPACK the array is distributed over the cores, and the distribution scales almost perfectly linearly with the number of cores. We made it possible to run the code without ScaLAPACK in order to pin down ScaLAPACK errors like the one in your case.
So this means you need to fix your ScaLAPACK installation to be able to run the ML code on your system.
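The difference between the two builds can be illustrated with a rough estimate. This is an idealized sketch, not how VASP computes its prediction: it assumes a perfectly even distribution with ScaLAPACK, and the matrix size used in the example is a made-up number.

```python
def design_matrix_per_core(total_size: float, n_cores: int,
                           use_scalapack: bool) -> float:
    """Idealized design-matrix memory held by each core.

    Without ScaLAPACK every core holds the whole matrix; with
    ScaLAPACK the matrix is assumed to be split evenly over the cores.
    The result has the same unit as total_size.
    """
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    if use_scalapack:
        return total_size / n_cores
    return total_size


if __name__ == "__main__":
    size = 16000.0  # hypothetical design-matrix size
    for cores in (1, 16, 64):
        with_sl = design_matrix_per_core(size, cores, True)
        without_sl = design_matrix_per_core(size, cores, False)
        print(cores, with_sl, without_sl)
```

Under this assumption, adding cores does not help at all without ScaLAPACK, which is why the build without it is only useful for small test cases.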
-
- Newbie
- Posts: 5
- Joined: Mon Feb 28, 2022 9:07 am
Re: Segmentation fault when running VASP with ML_LMLFF = .TRUE.
Thank you for your help. I have already solved the problem with ScaLAPACK and it works very well now. It was only a trivial mistake in the makefile related to linking libraries.
-
- Global Moderator
- Posts: 473
- Joined: Mon Nov 04, 2019 12:44 pm
Re: Segmentation fault when running VASP with ML_LMLFF = .TRUE.
Thank you for your reply, I am very glad that it works now.
I am going to close this topic now.