Definition of Error in MLFF
-
- Jr. Member
- Posts: 51
- Joined: Thu Apr 06, 2023 12:25 pm
Definition of Error in MLFF
Hi,
According to the VASP manual, the training error in VASP is defined as in the attached image. I wonder what is meant by "element-wise over each training structure". Assuming the molecule is composed of C and H atoms only, is the error for each structure calculated separately for C and H against the reference data and then averaged to get a single number?
Best Regards,
Burak
-
- Global Moderator
- Posts: 473
- Joined: Mon Nov 04, 2019 12:44 pm
Re: Definition of Error in MLFF
The equation you wrote is correct.
The information for the energies is correct.
For the forces, N runs over atoms and Cartesian directions, rather than over elements times the number of atoms per element times Cartesian directions.
For the stresses, N runs over the 9 Cartesian components. No elements enter.
For both forces and stresses, N of course also runs over the training structures.
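Written out, and assuming the formula in the attached image is the usual root-mean-square error, the counts described above would read as follows. The symbols are introduced here for illustration only and are not taken from the manual: $N_s$ is the number of training structures and $N_a^{(s)}$ the number of atoms in structure $s$.

```latex
\begin{align}
\mathrm{RMSE}_F &= \sqrt{\frac{1}{N_F}\sum_{s=1}^{N_s}\sum_{i=1}^{N_a^{(s)}}\sum_{\alpha\in\{x,y,z\}}
  \left(F^{\mathrm{ML}}_{s,i,\alpha}-F^{\mathrm{ref}}_{s,i,\alpha}\right)^2},
  & N_F &= 3\sum_{s=1}^{N_s} N_a^{(s)} \\
\mathrm{RMSE}_\sigma &= \sqrt{\frac{1}{N_\sigma}\sum_{s=1}^{N_s}\sum_{\alpha,\beta\in\{x,y,z\}}
  \left(\sigma^{\mathrm{ML}}_{s,\alpha\beta}-\sigma^{\mathrm{ref}}_{s,\alpha\beta}\right)^2},
  & N_\sigma &= 9\,N_s
\end{align}
```

If every structure has the same number of atoms $N_a$, the force count reduces to $N_F = 3\,N_a\,N_s$.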
Can you please point me to the place in the manual where this is written so I can correct it?
-
- Jr. Member
- Posts: 51
- Joined: Thu Apr 06, 2023 12:25 pm
Re: Definition of Error in MLFF
Thanks,
This is written here wiki/index.php/Best_practices_for_machi ... rce_fields under the section Monitoring.
One more additional question: I created a set of test configurations and calculated the RMSE of the forces using the above formula. However, the error written to the ML_LOG file in the ERR line is larger than this. Would you expect that, or am I comparing different definitions?
-
- Global Moderator
- Posts: 473
- Joined: Mon Nov 04, 2019 12:44 pm
Re: Definition of Error in MLFF
The formula is correct; it is easy to check in the code in the SUMMARY_REPORT subroutine.
Be careful with the source of the forces, since the units in the ML_AB file are eV/Angstrom. These are converted into atomic units in the code. But if you take the forces and energies from the ML_REG file, they should fit together and are again in eV/Angstrom.
Just be careful to divide the summed squared force errors by the number of training structures times the number of atoms in each training structure times three. If the number of atoms in your training structures changes, then you have to take that into account.
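The normalization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code in SUMMARY_REPORT: the function name `force_rmse` and the nested-list input layout are hypothetical, and both force sets are assumed to be in the same units already (for example eV/Angstrom).

```python
import math


def force_rmse(pred_forces, ref_forces):
    """Pooled RMSE over all force components of all structures.

    pred_forces, ref_forces: one entry per structure; each entry is a list
    of [fx, fy, fz] per atom. The number of atoms may differ between
    structures, so components are counted explicitly instead of assuming
    n_structures * n_atoms * 3.
    """
    sq_sum = 0.0
    n_components = 0
    for pred, ref in zip(pred_forces, ref_forces):      # structures
        for f_pred, f_ref in zip(pred, ref):            # atoms
            for p, r in zip(f_pred, f_ref):             # x, y, z
                sq_sum += (p - r) ** 2
                n_components += 1
    return math.sqrt(sq_sum / n_components)


# Hypothetical toy data: a 2-atom structure and a 1-atom structure.
ref = [[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [[0.0, 0.0, 0.0]]]
pred = [[[0.1, 0.1, 0.1], [0.1, 0.1, 0.1]], [[0.2, 0.2, 0.2]]]
rmse = force_rmse(pred, ref)
```

Counting components in the loop handles a varying number of atoms automatically. Note that this pooled value is generally not the same as the average of the per-structure RMSEs.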
-
- Jr. Member
- Posts: 51
- Joined: Thu Apr 06, 2023 12:25 pm
Re: Definition of Error in MLFF
Thanks, I currently do not have access to the code as it is on the cluster.
In my comparison, I got the forces via py4vasp from the vaspout.h5 file. I thought the unit of force there is eV/Angstrom. I do not use the ML_AB or ML_REG file. I took the ERR line in the ML_LOG file as a reference, which gave a larger error than the one I calculated from single-point MD calculations using the ML_FF file and py4vasp. I made a simple comparison for a naphthalene crystal, so the number of atoms is constant.
Do you think the source I am using has a problem?
Regards,
Burak
-
- Global Moderator
- Posts: 473
- Joined: Mon Nov 04, 2019 12:44 pm
Re: Definition of Error in MLFF
Just a quick question for clarification: you did single-point calculations for all training structures and not just the last one? The "ERR" line gives the error on all training structures in the ML_AB file, but if you compute it only for the last structure, it will of course be smaller.
-
- Jr. Member
- Posts: 51
- Joined: Thu Apr 06, 2023 12:25 pm
Re: Definition of Error in MLFF
I did not do this for all training structures, but I have an independent test set and I did it for that. So it is not a one-to-one correspondence, but if you think about the usual ML scheme, I do not see why the error on the training set is larger than on an independent test set.
Regards,
Burak
-
- Global Moderator
- Posts: 473
- Joined: Mon Nov 04, 2019 12:44 pm
Re: Definition of Error in MLFF
OK, thanks for the clarification. I just wanted to rule out that you calculated the error on a subset of the training structures, because then it would of course be smaller.
I've checked that the forces in the HDF5 file should be in eV/Angstrom, so no issue there.
A test error smaller than the training error is most likely due to a test set that is too small. Imagine you trained a given structure at 300 K and 1000 K. If you trained first at 300 K and then at 1000 K, you would see the RMSE on the forces of the training structures go up significantly after adding the high-temperature structures, since the higher the temperature, the more variance you get in your system. If you now pick a few structures at 300 K as test structures, your test error is going to be close to the error you had when you trained only at 300 K.
So you should try to include more structures in your test set, spanning the trained phase space. You should also see that if you include test structures above the trained conditions (like 1200 K in the example above), your test error strongly exceeds your training error.
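The 300 K / 1000 K argument can be illustrated numerically. The sketch below assumes purely hypothetical per-temperature force RMSEs (0.03 and 0.08 eV/Angstrom) and equal numbers of force components at each temperature; none of these numbers come from a real calculation. It only shows that a pooled RMSE over both temperatures is larger than the RMSE of the low-temperature subset alone.

```python
import math


def pooled_rmse(subset_rmses, subset_counts):
    """Combine per-subset RMSEs into the RMSE over the union of the subsets.

    The pooled mean squared error is the count-weighted average of the
    per-subset mean squared errors.
    """
    total = sum(subset_counts)
    mse = sum(n * r ** 2 for r, n in zip(subset_rmses, subset_counts)) / total
    return math.sqrt(mse)


# Hypothetical values, for illustration only.
rmse_300k = 0.03    # eV/Angstrom, structures sampled at 300 K
rmse_1000k = 0.08   # eV/Angstrom, structures sampled at 1000 K
n_components = 1000  # force components per temperature (assumed equal)

training_rmse = pooled_rmse([rmse_300k, rmse_1000k],
                            [n_components, n_components])
test_rmse_300k_only = pooled_rmse([rmse_300k], [n_components])
```

With these assumed inputs, `training_rmse` lies between the two subset values and above `test_rmse_300k_only`, which is the situation described above when a test set samples only the low-temperature part of the trained phase space.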
-
- Jr. Member
- Posts: 51
- Joined: Thu Apr 06, 2023 12:25 pm
Re: Definition of Error in MLFF
Thanks for the response. You have a point, but I believe that in my case the effect is not that drastic. I trained at 295 K and my test set is from 280, 290, 300 and 310 K. I selected an equal number of structures from each. However, the difference between the test and training errors is about 10 meV. I will try a test set at 295 K only.
Regards,
Burak