I am running some AIMD simulations with a machine-learned force field in vasp.6.3, and I frequently encounter the error message "!!! MLFF : Not enough storage for local reference configurations, please increase se ML_MB !!!"
Is it okay to increase the ML_MB value? It seems to me that more training data is usually good for ML, but I also want to make sure there is no upper limit recommended by the developers.
Design matrix for ML_FF of vasp.6.3
Re: Design matrix for ML_FF of vasp.6.3
Hello Gerbrand,
A similar issue was discussed here recently: forum/viewtopic.php?f=4&t=18475#p21820
Hth,
alex
Re: Design matrix for ML_FF of vasp.6.3
It's not exactly the same problem.
In the other post, the user ran out of memory immediately because ScaLAPACK was not used. Without ScaLAPACK the code is practically unusable for realistic systems, since every processor has to hold the entire design matrix, which is a huge object. With ScaLAPACK the design matrix is distributed over the processors, so the memory each processor needs for it decreases in inverse proportion to the number of processors.
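To make the distribution argument concrete: ScaLAPACK stores distributed matrices in a 2D block-cyclic layout over a process grid. The sketch below reimplements the logic of ScaLAPACK's NUMROC routine in Python to show how many rows and columns each process owns. The matrix dimensions, block sizes and grid shapes are arbitrary illustration values, not what VASP actually uses.

```python
def numroc(n: int, nb: int, iproc: int, isrcproc: int, nprocs: int) -> int:
    """Rows (or columns) of a block-cyclically distributed dimension of
    length n, with block size nb, owned by process coordinate iproc.
    Mirrors the logic of ScaLAPACK's NUMROC routine."""
    mydist = (nprocs + iproc - isrcproc) % nprocs  # distance from source process
    nblocks = n // nb                               # number of full blocks
    count = (nblocks // nprocs) * nb                # full blocks every process gets
    extra = nblocks % nprocs                        # leftover full blocks
    if mydist < extra:
        count += nb                                 # one extra full block
    elif mydist == extra:
        count += n % nb                             # the trailing partial block
    return count


def local_shape(m, n, mb, nb, prow, pcol, nprow, npcol):
    """Local (rows, cols) held by process (prow, pcol) on an nprow x npcol grid,
    assuming the distribution starts at process (0, 0)."""
    return numroc(m, mb, prow, 0, nprow), numroc(n, nb, pcol, 0, npcol)


if __name__ == "__main__":
    m, n, blk = 10000, 3000, 64  # illustrative sizes only
    for nprow, npcol in [(1, 1), (2, 2), (4, 4)]:
        largest = max(
            r * c
            for p in range(nprow)
            for q in range(npcol)
            for r, c in [local_shape(m, n, blk, blk, p, q, nprow, npcol)]
        )
        print(f"{nprow * npcol:3d} processes: largest local block has {largest} entries")
```

Every entry is owned by exactly one process, so the per-process share shrinks roughly as 1/(number of processes); without ScaLAPACK each process would instead hold all m*n entries.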
In the current case, Gerbrand does not run out of memory; instead, the maximum number of local reference configurations (ML_MB) is reached. The default is ML_MB=1500. This is usually enough for simple to moderately difficult systems, but for complex systems, or for training data collected under different conditions (e.g. Si in its different phases), it can easily be too small. In that case, simply increase this number.
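Increasing ML_MB just means changing (or adding) the tag in the INCAR file. The helper below is a hypothetical convenience function, not part of VASP, that does this for a simple `TAG = value` INCAR; the value 3000 in the usage example is only an illustration, not a recommendation.

```python
import re


def set_incar_tag(incar_text: str, tag: str, value) -> str:
    """Return INCAR text with `tag` set to `value` (hypothetical helper).

    Replaces the first line of the form `TAG = ...` (case-insensitive) or
    appends a new line. Assumption: one tag per line; tags combined on a
    single line with ';' are not detected, and an inline comment on the
    replaced line is dropped.
    """
    pattern = re.compile(
        rf"^[ \t]*{re.escape(tag)}[ \t]*=.*$", re.IGNORECASE | re.MULTILINE
    )
    new_line = f"{tag} = {value}"
    if pattern.search(incar_text):
        # Use a function so backslashes in `value` are not treated as escapes.
        return pattern.sub(lambda _m: new_line, incar_text, count=1)
    sep = "" if not incar_text or incar_text.endswith("\n") else "\n"
    return f"{incar_text}{sep}{new_line}\n"


if __name__ == "__main__":
    incar = "SYSTEM = Si\nML_LMLFF = .TRUE.\nML_MB = 1500\n"
    print(set_incar_tag(incar, "ML_MB", 3000), end="")
```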
ML_MB sets the column dimension of the design matrix for each atom type.
The row dimension is ML_MCONF. ML_MCONF counts whole training structures (this data can be exported to other ML methods), whereas ML_MB counts the local reference configurations for specific atoms (this is specific to kernel ridge regression). The size of the design matrix that will be allocated is therefore ML_MB*ML_MCONF*Number_of_atom_types. Again, with ScaLAPACK this array is distributed over all processors, so the more processors one uses, the smaller the part held by each processor.
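As a rough back-of-the-envelope check, the product above can be turned into bytes. This sketch takes the formula from the post at face value and assumes 8-byte (double-precision) entries; `rows_per_conf` is a hypothetical knob for the case that each training structure contributes more than one row. It ignores every other array, so the memory estimate at the top of ML_LOGFILE remains the authoritative number.

```python
def design_matrix_bytes(ml_mb: int, ml_mconf: int, n_types: int,
                        rows_per_conf: int = 1, bytes_per_entry: int = 8) -> int:
    """Bytes for an ML_MB x (ML_MCONF * rows_per_conf) matrix per atom type.

    Assumptions: double-precision entries; rows_per_conf is hypothetical
    and defaults to 1, i.e. the literal ML_MB*ML_MCONF*Number_of_atom_types.
    """
    return ml_mb * ml_mconf * rows_per_conf * n_types * bytes_per_entry


def per_process_mib(total_bytes: int, n_procs: int) -> float:
    """Ideal per-process share in MiB if the matrix is evenly distributed
    (the ScaLAPACK case); without ScaLAPACK every process holds total_bytes."""
    return total_bytes / n_procs / 2**20


if __name__ == "__main__":
    total = design_matrix_bytes(ml_mb=3000, ml_mconf=1500, n_types=2)
    for n_procs in (1, 16, 64):
        print(f"{n_procs:3d} processes: {per_process_mib(total, n_procs):.2f} MiB each")
```

Because the size is linear in ML_MB, doubling ML_MB doubles this part of the memory; spreading the run over more processors (with ScaLAPACK) divides it.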
The beginning of the ML_LOGFILE (wiki/index.php/ML_LOGFILE) contains information on the estimated memory.
Most arrays, such as the design matrix, are allocated statically at the beginning of the run. Why? Because we use shared-memory MPI. When we implemented shared memory using SystemV, we saw that reallocating shared-memory segments led to completely irregular crashes. Since shared memory is used for many important arrays, we ended up using static memory allocation. This is also why the storage limit set by ML_MB cannot grow automatically during a run.
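The consequence of static allocation can be illustrated with a toy fixed-capacity store: the buffer is sized once at startup, and when it is full the only remedy is a larger capacity on the next run, which is the role ML_MB plays. This is a hypothetical sketch, not VASP code; a plain `bytearray` stands in for the SystemV shared-memory segment, and the class and method names are invented for illustration.

```python
import struct


class FixedReferenceStore:
    """Hypothetical fixed-capacity store for reference descriptors.

    Storage is allocated once in __init__ and never resized, mimicking a
    statically allocated shared-memory array.
    """

    def __init__(self, capacity: int, descriptor_len: int):
        self.capacity = capacity
        self.descriptor_len = descriptor_len
        self.count = 0
        self._fmt = f"{descriptor_len}d"           # descriptor_len doubles
        self._stride = struct.calcsize(self._fmt)  # bytes per descriptor
        self._buf = bytearray(capacity * self._stride)  # allocated once

    def add(self, descriptor) -> None:
        if len(descriptor) != self.descriptor_len:
            raise ValueError("descriptor has the wrong length")
        if self.count >= self.capacity:
            # No reallocation: restart with a larger capacity instead,
            # analogous to increasing ML_MB.
            raise RuntimeError("not enough storage; increase capacity and restart")
        struct.pack_into(self._fmt, self._buf, self.count * self._stride, *descriptor)
        self.count += 1

    def get(self, i: int) -> tuple:
        if not 0 <= i < self.count:
            raise IndexError(i)
        return struct.unpack_from(self._fmt, self._buf, i * self._stride)


if __name__ == "__main__":
    store = FixedReferenceStore(capacity=2, descriptor_len=3)
    store.add([1.0, 2.0, 3.0])
    store.add([4.0, 5.0, 6.0])
    try:
        store.add([7.0, 8.0, 9.0])
    except RuntimeError as err:
        print("third add failed:", err)
```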