Long MLFF learning times before SCF loops

Queries about input and output files, running specific calculations, etc.


Moderators: Global Moderator, Moderator

burakgurlek
Jr. Member
Posts: 51
Joined: Thu Apr 06, 2023 12:25 pm

Long MLFF learning times before SCF loops

#1 Post by burakgurlek » Tue Oct 17, 2023 5:21 pm

Dear all,

I am using MLFF to learn a PES. I realized that during the learning step VASP executes an MLFF run before the SCF loop starts, and this can take 1000 times longer than the regular MLFF steps in the process. This happens with very accurate learning settings such as:
ML_WTIFOR = 20
ML_CX = -0.1
ML_IALGO_LINREG=1
ML_SION1=0.3
ML_MRB2=12

The MLFF timings just before the SCF loop:
ML FREE ENERGIE OF THE ION-ELECTRON SYSTEM (eV)
---------------------------------------------------
free energy ML TOTEN = -1908.20671935 eV

ML energy without entropy= -1908.20671935 ML energy(sigma->0) = -1908.20671935

MLFF: cpu time 0.0794: real time 0.0807
WAVPRE: cpu time 0.1541: real time 0.1542
FEWALD: cpu time 0.0070: real time 0.0069
ORTHCH: cpu time 4.3136: real time 4.3167
LOOP+: cpu time 579.2458: real time 579.9671

For comparison, the timings of a regular (accurate) MLFF step are:
ML FREE ENERGIE OF THE ION-ELECTRON SYSTEM (eV)
---------------------------------------------------
free energy ML TOTEN = -1928.89655573 eV

ML energy without entropy= -1928.89655573 ML energy(sigma->0) = -1928.89655573

MLFF: cpu time 0.0981: real time 0.0995
LOOP+: cpu time 0.1069: real time 0.1094
RANDOM_SEED = 262985622 1728 0
IONSTEP: cpu time 0.0087: real time 0.0087

In my experience, this timing drops considerably with fewer atoms and less accurate settings. I think it is mostly related to I/O, since the MLFF times themselves are quite short. Do you have an idea why this step takes so much time and whether it can be alleviated?

The files can be accessed from: https://www.dropbox.com/scl/fo/ubdrs42b ... zjycq&dl=0

Regards,
Burak

ferenc_karsai
Global Moderator
Posts: 473
Joined: Mon Nov 04, 2019 12:44 pm

Re: Long MLFF learning times before SCF loops

#2 Post by ferenc_karsai » Thu Oct 19, 2023 2:12 pm

Some files needed to analyze and reproduce the calculation are missing.
Please send the following: OUTCAR, ML_AB, OSZICAR, ML_LOGFILE.

burakgurlek
Jr. Member
Posts: 51
Joined: Thu Apr 06, 2023 12:25 pm

Re: Long MLFF learning times before SCF loops

#3 Post by burakgurlek » Fri Oct 20, 2023 12:33 pm

Dear Ferenc,

Please find the files here:

https://www.dropbox.com/scl/fo/jk3v9n9g ... zu0om&dl=0

I think the large LOOP+ times are reported for the MLFF step that follows an SCF loop and show the additional time spent in the SCF loop of the previous step. I have more or less verified this, but it would be good to have your opinion to be sure.

Regards,
Burak

andreas.singraber
Global Moderator
Posts: 249
Joined: Mon Apr 26, 2021 7:40 am

Re: Long MLFF learning times before SCF loops

#4 Post by andreas.singraber » Tue Oct 24, 2023 2:27 pm

Dear Burak,

I had a closer look at this and there seems to be a minor bug in the OUTCAR output. As you suggested, the timing of the ab initio calculation is incorrectly added to the LOOP+ value of the next ionic iteration, which results in confusing output. For example, consider an ML run with many prediction steps and a single ab initio calculation in between. The ML_LOGFILE may look like this:

...
--------------------------------------------------------------------------------
STATUS                 30 interval   7      F      T         8         8
BEEF                   30   1.14869189E-06   7.97017403E-03   3.17484779E-03   5.68933390E-03   2.01358232E-01   1.48497335E-01
--------------------------------------------------------------------------------
STATUS                 31 interval   7      F      T         9         9
BEEF                   31   1.14999887E-06   8.16317631E-03   3.20769044E-03   5.68933390E-03   1.98323879E-01   1.46492904E-01
--------------------------------------------------------------------------------
STATUS                 32 threshold  2      T      T         0        10
BEEF                   32   1.15492983E-06   8.37160870E-03   3.25291923E-03   5.68933390E-03   1.96233164E-01   1.44612855E-01
--------------------------------------------------------------------------------
STATUS                 33 interval   7      F      T         1        11
BEEF                   33   1.15713436E-06   8.62825755E-03   3.26689721E-03   5.68933390E-03   1.91540138E-01   1.41744708E-01
--------------------------------------------------------------------------------
STATUS                 34 interval   7      F      T         2        12
BEEF                   34   1.12730343E-06   8.76582926E-03   3.18219016E-03   5.68933390E-03   1.83203490E-01   1.34458294E-01
--------------------------------------------------------------------------------
...

Then, in the OUTCAR file we can find the following order of LOOP+ timing statements:

...
--------------------------------------- Ionic step       30  -------------------------------------------
... MLFF prediction ...

      MLFF:  cpu time      0.1060: real time      0.1061
     LOOP+:  cpu time      0.1066: real time      0.1067
...
--------------------------------------- Ionic step       31  -------------------------------------------
... MLFF prediction ...

      MLFF:  cpu time      0.1067: real time      0.1069
     LOOP+:  cpu time      0.1074: real time      0.1075
...
--------------------------------------- Ionic step       32  -------------------------------------------
... MLFF prediction ...

      MLFF:  cpu time      0.1175: real time      0.1176
    WAVPRE:  cpu time      0.0013: real time      0.0013     \
    FEWALD:  cpu time      0.0030: real time      0.0030      |
    ORTHCH:  cpu time      0.0278: real time      0.0278      | THIS SHOULD COME AFTER THE AB INITIO CALCULATION!
     LOOP+:  cpu time      0.1502: real time      0.1503     /

... Ab initio calculation

...
--------------------------------------- Ionic step       33  -------------------------------------------
... MLFF prediction ...

      MLFF:  cpu time      0.1050: real time      0.1051
     LOOP+:  cpu time      3.6094: real time      3.6122     | TIMING OF SCF CYCLE IS INCORRECTLY ADDED HERE!
...
--------------------------------------- Ionic step       34  -------------------------------------------

As you can see from my remarks, the LOOP+ timer is incorrectly stopped and written out after the ML prediction in step 32. It seems it is then immediately reset and accumulates the time of the ab initio calculation. Since there is no further LOOP+ output in step 32, the timing of the ab initio calculation only shows up afterwards, in step 33.
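
Until a fix is released, the misplaced timing can be reassigned when post-processing an existing OUTCAR. The following is only a rough sketch, not part of VASP. The function names `parse_steps` and `estimate_scf_times` are hypothetical. It assumes that a WAVPRE line inside an ionic step indicates that an ab initio calculation was started there, and that the SCF time of step n is approximately the LOOP+ value of step n+1 minus the MLFF value of step n+1, as the step 32/33 excerpt above suggests.

```python
import re

# Matches the ionic-step separator lines shown in the OUTCAR excerpt.
STEP_RE = re.compile(r"Ionic step\s+(\d+)")
# Matches timer lines such as "LOOP+:  cpu time  0.1066: real time  0.1067".
TIMER_RE = re.compile(
    r"^\s*([A-Z+]+):\s+cpu time\s+([\d.]+):\s+real time\s+([\d.]+)"
)


def parse_steps(lines):
    """Collect the real-time MLFF and LOOP+ timers for each ionic step.

    A WAVPRE timer inside a step is taken (assumption) as a sign that an
    ab initio calculation was started in that step.
    """
    steps = {}
    current = None
    for line in lines:
        m = STEP_RE.search(line)
        if m:
            current = int(m.group(1))
            steps[current] = {"MLFF": None, "LOOP+": None, "ab_initio": False}
            continue
        m = TIMER_RE.match(line)
        if m and current is not None:
            name, real = m.group(1), float(m.group(3))
            if name in ("MLFF", "LOOP+"):
                steps[current][name] = real
            elif name == "WAVPRE":
                steps[current]["ab_initio"] = True
    return steps


def estimate_scf_times(steps):
    """Assign the excess LOOP+ time of step n+1 back to ab initio step n."""
    estimates = {}
    for n, rec in steps.items():
        nxt = steps.get(n + 1)
        if rec["ab_initio"] and nxt and nxt["LOOP+"] is not None:
            # Heuristic: subtract the ML prediction time of the next step.
            estimates[n] = nxt["LOOP+"] - (nxt["MLFF"] or 0.0)
    return estimates


# Shortened copy of the step 32/33 timer lines from the excerpt above.
sample = """\
--------------------------------------- Ionic step       32  ---
      MLFF:  cpu time      0.1175: real time      0.1176
    WAVPRE:  cpu time      0.0013: real time      0.0013
     LOOP+:  cpu time      0.1502: real time      0.1503
--------------------------------------- Ionic step       33  ---
      MLFF:  cpu time      0.1050: real time      0.1051
     LOOP+:  cpu time      3.6094: real time      3.6122
""".splitlines()

print(estimate_scf_times(parse_steps(sample)))
```

For the sample lines this attributes roughly 3.5 s to step 32. For a real run, the lines of the OUTCAR file would be passed in place of `sample`.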

Thank you for making us aware of this bug, we will fix it by the next release!

All the best,
Andreas Singraber
