Combining MLFF Output

#1 Post by **mdh1992** » Fri Dec 13, 2024 1:48 pm

Dear VASP Community,

I am interested in using VASP to obtain ML force fields for a particular system. This is a functionality I haven't used before, so I'm keen to ensure I fully understand the process before using our computational resources to tackle our system. Whilst the information provided on the VASP Wiki has been very useful, there is one point that remains unclear to me.

The VASP Wiki states that it is often wise to train FFs for separate components of a system, such as a surface and an adsorbate, before training for the combined system, in order to minimise computational cost, and achieve optimal fitting. I understand that the key information is written to the ML_AB file which provides the data from previous MLFF runs, which can then be read at the start of a new MLFF run with additional components.

My question is: how do I combine multiple ML_AB files from previous calculations such that VASP will be able to read them correctly? It's clear to me that if, for example, I have a system consisting of adsorbate A and surface B, I can run VASP to generate ML_AB for component A, and then use that as an input file for the adsorbate-substrate complex (i.e. (A+B)), and the FF obtained for the components of A will be used to initialise the run for (A+B). However, I could also run VASP for component B alone first, obtain the ML_AB for that system, and then combine the two. It seems the most expedient way to obtain good force fields would be to take the ML_AB files from both isolated components and combine them so that the (A+B) system has the most information available at initialisation, but it is unclear to me how one does this in practice. Can ML_AB files simply be concatenated, like e.g. pseudopotentials in a POTCAR file, so that VASP just reads in the data from the whole series of ML_AB files? Or does a particular format need to be adhered to? If the latter, are there scripts available that expedite this task? Any advice on this would be most appreciated.

An unrelated question on VASP ML force fields - I think others have raised this point for different codes, but is it at all possible to convert VASP ML force field output such that it could be used with the GULP code? I appreciate this may not be possible yet, given that ML potential fitting doesn't necessarily rely on the kind of potential models used by GULP (https://gulp.curtin.edu.au/models.html), but it would be interesting if this were a possibility in the future.

One final point: when one is satisfied with the ML force fields, the ML_MODE=run tag allows one to perform MD runs using the obtained potentials and without any ab initio component. Is it possible to perform a simple optimisation of a given structure based on the obtained ML force fields, analogous to geometry optimisation of structures within, say, GULP, in order to locate the nearest local minimum to the starting structure? I suppose it is probably possible to set up an MD run that will achieve this, but it might be more computationally demanding than simply running a geometry optimisation. In terms of structural phase space, I also suspect an MD run will rapidly converge on the global minimum structure and will sample the global minimum and nearby local minima far more frequently than comparable local minima that lie further from the global minimum.

Thanks in advance for any comments or suggestions, your responses are greatly appreciated.

#2 Post by **ferenc_karsai** » Mon Dec 16, 2024 10:54 am

VASP is designed to train consecutively on the structures. That means you train on structure A, then use the obtained ML_AB file to continue training on structure B, and so on.
If you want to train on each structure separately, you have to combine the ML_AB files manually. The training structures can simply be concatenated, but the information at the beginning of the file has to be adjusted to the new set of training structures (number of training structures, element types, maximum number of atoms per structure etc.). The local reference configurations per structure should be set to 1 for each structure as a placeholder. After that, ML_MODE=select has to be run on the new ML_AB file. This selects the local reference configurations on the combined data. This step can be quite time consuming.
After both steps, ML_MODE=refit has to be run to enable the fast force field for the production runs.
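
As a rough illustration of the header bookkeeping described above, the following Python sketch computes the values that would need updating when several ML_AB files are merged. It is not an official VASP tool and it does not parse or write ML_AB files: the `MLABSummary` class, the `merged_header_values` function, and the example numbers are all hypothetical, and the per-file values are assumed to have been read off each ML_AB header by hand. The ordering of element types in the merged list is also an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MLABSummary:
    """Hypothetical summary of one ML_AB header (values entered by hand)."""
    n_structures: int
    element_types: List[str]
    max_atoms_per_structure: int


def merged_header_values(summaries: List[MLABSummary]) -> Dict[str, object]:
    """Return the header values for the concatenated training set."""
    elements: List[str] = []
    for summary in summaries:
        for element in summary.element_types:
            if element not in elements:  # keep first-seen order (assumption)
                elements.append(element)
    return {
        # Structures are simply concatenated, so the counts add up.
        "n_structures": sum(s.n_structures for s in summaries),
        "element_types": elements,
        # The merged file must accommodate the largest structure of any input.
        "max_atoms_per_structure": max(s.max_atoms_per_structure for s in summaries),
        # Placeholder suggested in the post; ML_MODE=select replaces it later.
        "placeholder_local_reference_count": 1,
    }


# Made-up example: an isolated adsorbate (A) and a bare surface (B).
adsorbate = MLABSummary(n_structures=120, element_types=["C", "O"], max_atoms_per_structure=3)
surface = MLABSummary(n_structures=200, element_types=["Pt"], max_atoms_per_structure=36)
print(merged_header_values([adsorbate, surface]))
```

The remaining header entries (the "etc." above) are not covered here and would need the same kind of adjustment, checked against the file format described on the VASP Wiki.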

Porting to a different engine for ionic movement is usually very time consuming, and GULP is possibly not used widely enough to justify the effort. From this webpage (https://atomistic.software/) you can see that GULP has approximately 250 citations per year (roughly constant over time) compared to 4700 for LAMMPS (increasing each year). We have now made a LAMMPS interface, which is still in the testing phase. If we make another interface, it will be for another widely used code.

When you have a force field, it should run with all options for ionic movements (IBRION). It was not tested for all of them, but geometry optimization was definitely tested and works. However, bear in mind that this will not be as fast as the pure force field, since VASP has to be started separately for each structure and the force field needs some time to initialize.
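
To make the stages mentioned in this reply concrete, here is a minimal Python sketch that builds INCAR text for the selection step, the refit step, and a geometry optimization driven by the finished force field. The tags ML_LMLFF, ML_MODE, IBRION, NSW and EDIFFG are standard INCAR tags, but the helper `incar_text`, the dictionary names, and the numerical values for NSW and EDIFFG are illustrative assumptions, not recommended settings.

```python
from typing import Dict


def incar_text(tags: Dict[str, object]) -> str:
    """Format a tag dictionary as simple 'TAG = value' INCAR lines."""
    return "\n".join(f"{tag} = {value}" for tag, value in tags.items()) + "\n"


# Stage 1: reselect local reference configurations on the merged ML_AB.
SELECT_TAGS = {"ML_LMLFF": ".TRUE.", "ML_MODE": "select"}

# Stage 2: refit to obtain the fast force field for production runs.
REFIT_TAGS = {"ML_LMLFF": ".TRUE.", "ML_MODE": "refit"}

# Stage 3: relaxation using only the force field (no ab initio steps).
RELAX_TAGS = {
    "ML_LMLFF": ".TRUE.",
    "ML_MODE": "run",
    "IBRION": 2,      # conjugate-gradient ionic relaxation
    "NSW": 200,       # maximum number of ionic steps (illustrative value)
    "EDIFFG": -0.01,  # force convergence criterion in eV/Å (illustrative value)
}

for name, tags in [("select", SELECT_TAGS), ("refit", REFIT_TAGS), ("relax", RELAX_TAGS)]:
    print(f"# INCAR for the {name} stage")
    print(incar_text(tags))
```

The handling of files between stages (for example, renaming newly written ML_ABN or ML_FFN files to ML_AB or ML_FF before the next run) is not shown and should be checked against the VASP Wiki.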

#3 Post by **mdh1992** » Mon Dec 16, 2024 12:37 pm

This is all really helpful, thanks very much for this!

I'll experiment with both consecutive force field fitting and combining ML_AB files as you suggest. Perhaps there is scope to write a script that will automate this process. I suppose the most expedient way to generate ML force fields consecutively for an adsorbate on a surface would be as follows: run ab initio MD on the smallest surface cell for the facet we are interested in to obtain force fields for the species that constitute the surface; multiply the surface to generate a supercell large enough to accommodate the adsorbate; initialise the adsorbate-substrate system; copy the ML_AB file from the previous surface-only run; and then restart ab initio MD to generate ML force fields that represent the intra-adsorbate and adsorbate-substrate interactions. (If we were able to do this non-consecutively, i.e. obtain force fields for the isolated adsorbate first, this would perhaps give us a head start, but conversely it would perhaps be tedious to combine the ML_AB files, as you say.) After this we should have ML force fields for all the interactions we are interested in, so from there it would be a matter of refining and refitting to prepare the fast force fields for production runs, as you have detailed.

I appreciate that LAMMPS is far more popular than GULP for classical MD studies, so it is fair and reasonable that efforts be directed towards porting VASP ML force fields to LAMMPS ahead of other, less popular codes.

Thanks for the information regarding the availability of simple geometry optimisation with ML force fields in VASP. I'll play around with this. I guess I should be able to try all of the normal VASP functionality with the force fields, as long as ML_MODE=run is present in the INCAR and there is an ML_FF file for VASP to read.

#4 Post by **ferenc_karsai** » Mon Dec 16, 2024 1:01 pm

I would avoid training on cells that are too small. You simply gain less information (missing phonons etc.) or learn interactions with neighboring boxes that are an artifact of the cell being too small (an atom interacting with its own periodic image).
Our force field is by design always short ranged, since the interactions of an atom are always within a cut-off radius. Of course, if you run on a larger cell, your potentials in DFT are "longer" ranged. So within the cut-off radius you would learn DFT effects that extend beyond the cut-off radius.

For surfaces specifically, I would train on surfaces with different numbers of layers or, alternatively, on a surface with a given number of layers plus the bulk.
