Mining an old OUTCAR for ML_AB data?

victor_robinson
Newbie
Posts: 10
Joined: Mon Sep 26, 2022 11:44 pm

Mining an old OUTCAR for ML_AB data?

#1 Post by victor_robinson » Fri Mar 17, 2023 10:18 pm

Dear all,

This is still not clear to me: can we use previous AIMD runs (OUTCARs from MD without ML) and convert the data into an ML_AB file, or 'rerun' VASP over these OUTCARs to create an ML_AB or even an ML_FF file?

From what I have read, I don't think VASP can do this, but perhaps there is a script that mines the data to create an ML_AB file?

Thanks and regards, Victor

andreas.singraber
Global Moderator
Posts: 255
Joined: Mon Apr 26, 2021 7:40 am

Re: Mining an old OUTCAR for ML_AB data?

#2 Post by andreas.singraber » Wed Mar 22, 2023 12:55 pm

Dear Victor,

unfortunately, we do not yet provide a tool to convert a single OUTCAR file or a series of OUTCAR files into an ML_AB training-structure database. This is on our agenda, but I cannot give you a time frame for delivery. However, it should be possible to write such a converter yourself without much effort. Take an existing ML_AB file as a reference; e.g., in the VASP testsuite, have a look at testsuite/tests/ML_LiF_CaO_ISTART1/ML_ABN.ref, which contains mixed-type structures with up to four elements. Here are a few important steps to consider in order to create a valid ML_AB file:
  1. The ML_AB file starts with a header providing general information, e.g., the atom types and the maximum number of atoms. Either extract this information from the OUTCAR file (search for VRHFIN, "ions per type", etc.) or set up this part manually.
  2. Afterwards, the section starting with

    The numbers of basis sets per atom type

    usually contains the local reference configurations for each type that were selected during on-the-fly training. Because the data in the OUTCAR file cannot tell us which atoms should go there, you need to add a dummy section listing only a single atom, e.g., like this:

    ...
    **************************************************
         The numbers of basis sets per atom type
    --------------------------------------------------
            1    1    1
            1
    **************************************************
         Basis set for Li
    --------------------------------------------------
              1      1
    **************************************************
         Basis set for F
    --------------------------------------------------
              1     1
    **************************************************
         Basis set for Ca
    --------------------------------------------------
              1      1
    **************************************************
         Basis set for O
    --------------------------------------------------
              1     1
              ....
    
  3. Then follows the list of all configurations, each of which starts with

    **************************************************
         Configuration num.      ???
    ==================================================
    
    You can get the lattice, position, energy, force, and stress data from the OUTCAR file by searching for these lines:

    direct lattice vectors                                                     ---> lattice
    POSITION                                       TOTAL-FORCE (eV/Angst)      ---> positions and forces
    free  energy   TOTEN  =                                                    ---> energy
    in kB                                                                      ---> stress
    
  4. In some ML_AB files, each configuration contains a section like this:

    ==================================================
         CTIFOR
    --------------------------------------------------
       1.0000000000000001E-016
    ==================================================
    
    You can safely omit this section; it is not required for this purpose.
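For the OUTCAR side of steps 1 and 3, a minimal Python sketch could look like the following. The names Frame, parse_species and parse_frames are hypothetical (not part of VASP). It assumes, and you should check this against your own OUTCAR, that numbers are whitespace-separated (very large values can run together in fixed-width output and would break the simple split()), that the POSITION block holds Cartesian coordinates, and that within each ionic step the "in kB" and "direct lattice vectors" blocks come before POSITION, which in turn comes before the "free  energy   TOTEN  =" line. Writing the per-configuration ML_AB sections is left out; take their exact layout from ML_ABN.ref.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Frame:
    lattice: Optional[List[List[float]]]  # 3 rows = direct lattice vectors (Angstrom)
    positions: List[List[float]]          # POSITION columns, assumed Cartesian (Angstrom)
    forces: List[List[float]]             # TOTAL-FORCE columns (eV/Angst)
    energy: float                         # value of "free  energy   TOTEN" (eV)
    stress: Optional[List[float]]         # the six "in kB" values, as printed


def parse_species(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Step 1: element symbols from VRHFIN lines, counts from 'ions per type'."""
    elements: List[str] = []
    counts: List[int] = []
    for line in lines:
        match = re.search(r"VRHFIN\s*=\s*([A-Za-z]+)\s*:", line)
        if match:
            elements.append(match.group(1))
        elif "ions per type" in line:
            counts = [int(x) for x in line.split("=")[1].split()]
    # Assumption: if VRHFIN lines are repeated, the first len(counts) are in type order.
    return list(zip(elements[: len(counts)], counts))


def parse_frames(lines: Iterable[str]) -> List[Frame]:
    """Step 3: one Frame per ionic step, keyed on the OUTCAR markers quoted above."""
    frames: List[Frame] = []
    it = iter(lines)
    lattice: Optional[List[List[float]]] = None  # most recent lattice seen
    stress: Optional[List[float]] = None         # most recent stress seen
    pending = None  # (positions, forces) waiting for the next TOTEN line
    for line in it:
        if "direct lattice vectors" in line:
            # Next three lines: direct vector (first 3 numbers), then reciprocal vector.
            lattice = [[float(x) for x in next(it).split()[:3]] for _ in range(3)]
        elif line.strip().startswith("in kB"):
            stress = [float(x) for x in line.split()[2:8]]
        elif "POSITION" in line and "TOTAL-FORCE" in line:
            next(it)  # skip the dashed line below the header
            positions, forces = [], []
            for row in it:
                if row.strip().startswith("---"):
                    break  # closing dashed line of the block
                values = [float(x) for x in row.split()]
                positions.append(values[:3])
                forces.append(values[3:6])
            pending = (positions, forces)
        elif "free  energy   TOTEN  =" in line and pending is not None:
            # Exact spacing as quoted above; lines spelled differently are ignored.
            energy = float(line.split("=")[1].split()[0])
            frames.append(Frame(lattice, pending[0], pending[1], energy, stress))
            pending = None
    return frames
```

Read the file once (e.g. lines = open("OUTCAR").readlines()) and pass the same list to both functions; len(parse_frames(lines)) should then match the number of ionic steps in the run.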
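The dummy basis-set section from step 2 can be generated for any list of elements. The helper below (dummy_basis_section is a hypothetical name) only reproduces the template quoted in step 2: a count of 1 per type, written three per line as in the four-element example, and a single "1 1" entry under each "Basis set for ..." heading. It does not assert what the two numbers mean, and it assumes VASP reads them in free format so the exact column widths do not matter; confirm that against ML_ABN.ref.

```python
from typing import List

STAR = "*" * 50  # separator lines as in the excerpt above
DASH = "-" * 50


def dummy_basis_section(elements: List[str]) -> str:
    """Build the dummy basis-set part of an ML_AB file, one entry per atom type."""
    lines = [STAR, "     The numbers of basis sets per atom type", DASH]
    counts = [1] * len(elements)  # one dummy local reference configuration per type
    # The four-element example lists the counts three per line; follow that layout.
    for i in range(0, len(counts), 3):
        lines.append("    " + "".join(f"{c:5d}" for c in counts[i:i + 3]))
    for element in elements:
        lines += [STAR, f"     Basis set for {element}", DASH, f"{1:11d}{1:7d}"]
    return "\n".join(lines)
```

For example, dummy_basis_section(["Li", "F", "Ca", "O"]) yields the same sequence of sections as the Li/F/Ca/O excerpt, ready to be followed by the first "Configuration num." block.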
Finally, after you have created an ML_AB file from your OUTCAR data, you need to run a special training mode in which the local reference configurations are selected. This can be done by setting ML_MODE=select (equivalent to ML_ISTART=3, NSW=1) in your INCAR file.
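Before running the selection step, it may help to check that a homemade ML_AB file has the same section layout as a reference such as the testsuite file ML_ABN.ref. The hypothetical helper below lists section titles in order of appearance. It assumes, based only on the excerpts in this thread, that a title is a non-empty line with a '*' or '=' rule directly above it and a '-' or '=' rule directly below it.

```python
from typing import Iterable, List


def _is_rule(line: str, chars: set) -> bool:
    """True if the line is a non-empty run made only of the given characters."""
    text = line.strip()
    return bool(text) and set(text) <= chars


def section_titles(lines: Iterable[str]) -> List[str]:
    """List the section titles of an ML_AB-style file, in order of appearance."""
    rows = [line.rstrip("\n") for line in lines]
    titles = []
    for above, title, below in zip(rows, rows[1:], rows[2:]):
        if _is_rule(above, {"*", "="}) and _is_rule(below, {"-", "="}) and title.strip():
            titles.append(title.strip())
    return titles
```

Comparing section_titles(open("ML_AB").readlines()) with the same call on the reference file shows missing, misspelled, or reordered sections; titles such as "Configuration num.      1" include the configuration number, so compare those by prefix.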

I hope this helps if you attempt to write a script yourself.

All the best,
Andreas Singraber

victor_robinson
Newbie
Posts: 10
Joined: Mon Sep 26, 2022 11:44 pm

Re: Mining an old OUTCAR for ML_AB data?

#3 Post by victor_robinson » Fri Mar 24, 2023 9:16 pm

Thanks for the informative response. I agree that it would be good to be able to loop over old OUTCAR data once such a tool is available. I may give this a go myself until then.
Victor

jianxiang_lian
Newbie
Posts: 1
Joined: Wed Feb 15, 2023 8:52 pm

Re: Mining an old OUTCAR for ML_AB data?

#4 Post by jianxiang_lian » Thu Sep 12, 2024 1:35 pm

Hello all,

I am reopening this conversation because my problem is closely related.
I am using VASP to perform AIMD simulations. I have a collection of AIMD trajectories and I want to mine them in order to train a force field (MLFF).

I followed the instructions given in the previous discussion (post from Andreas Singraber).
I wrote a Python script to gather the required information from the OUTCAR files (atomic species, number of atoms, positions, energies, forces, stress, etc.) and to create a valid ML_AB file (with and without the CTIFOR section). I compared my 'homemade' ML_AB file with the one from an actual MLFF simulation, and they look identical apart from the atomic basis sets.

After creating the ML_AB file from my OUTCAR data, I performed an MLFF calculation "from scratch" by setting ML_MODE=select in my INCAR file and providing the generated ML_AB file.
However, the calculation seems to consider only the very first ionic step, not the whole trajectory. Also, the total energy in the new OUTCAR file is zero.
I cannot verify the validity of the generated ML_FFN file, but its size differs from that of the ML_FFN file generated by the actual MLFF simulation.

I am not sure which other parameters must be set so that the data from the whole AIMD simulation are taken into account.

Here are my INCAR parameters for the MLFF training.
#Basic parameters
ISMEAR = 0
SIGMA = 0.1
LREAL = Auto
ISYM = -1
NELM = 100
EDIFF = 1E-4
LWAVE = .FALSE.
LCHARG = .FALSE.

#Parallelization of ab initio calculations
NCORE = 8

#MD
IBRION = 0
MDALGO = 2
ISIF = 2
SMASS = 1.0
TEBEG = 300
NSW = 100
POTIM = 3.0
RANDOM_SEED = 88951986 0 0

#Machine learning parameters
ML_LMLFF = .TRUE.
ML_ISTART = 3
ML_MODE = select

If you need more information on my simulation (the generated ML_AB file, etc.), please feel free to ask!
Your guidance and help will be highly appreciated!

Best regards,
JX Lian

