Failure of cRPA example with parallel calculation

zhishuo_huang
Newbie
Posts: 5
Joined: Fri May 10, 2024 11:50 am

Failure of cRPA example with parallel calculation

#1 Post by zhishuo_huang » Thu Jun 06, 2024 2:19 pm

Dear developers and users,

I am trying to run cRPA calculations with VASP 6.4.3 and Wannier90 3.1.0, compiled with the Intel compiler 2022.
I first ran the calculation following the example (https://www.vasp.at/wiki/index.php/CRPA_of_SrVO3).
However, the last step, the cRPA calculation for a set of automatically chosen imaginary frequency points, fails when run in parallel, while the serial run finishes successfully.
The error message is as follows (the full output is in the attached PBS output file):
[colo-chmlu-01:308605:0:308605] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[colo-chmlu-01:308609:0:308609] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[colo-chmlu-01:308581:0:308581] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[colo-chmlu-01:308582:0:308582] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[colo-chmlu-01:308583:0:308583] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[colo-chmlu-01:308584:0:308584] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[colo-chmlu-01:308585:0:308585] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[colo-chmlu-01:308587:0:308587] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[colo-chmlu-01:308588:0:308588] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[colo-chmlu-01:308593:0:308593] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[colo-chmlu-01:308594:0:308594] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[colo-chmlu-01:308595:0:308595] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[colo-chmlu-01:308596:0:308596] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[colo-chmlu-01:308597:0:308597] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[colo-chmlu-01:308598:0:308598] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[colo-chmlu-01:308599:0:308599] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[colo-chmlu-01:308600:0:308600] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[colo-chmlu-01:308601:0:308601] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[colo-chmlu-01:308602:0:308602] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[colo-chmlu-01:308603:0:308603] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[colo-chmlu-01:308604:0:308604] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[colo-chmlu-01:308606:0:308606] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[colo-chmlu-01:308607:0:308607] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[colo-chmlu-01:308608:0:308608] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[colo-chmlu-01:308610:0:308610] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[colo-chmlu-01:308611:0:308611] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[colo-chmlu-01:308612:0:308612] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[colo-chmlu-01:308586:0:308586] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[colo-chmlu-01:308589:0:308589] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[colo-chmlu-01:308590:0:308590] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[colo-chmlu-01:308591:0:308591] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[colo-chmlu-01:308592:0:308592] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace (tid: 308586) ====
0 0x0000000000538a41 scala_mp_check_gdes_mat_size_() ???:0
1 0x0000000001bd88b2 chi_super_mp_calculate_xi_real_() ???:0
2 0x0000000001e5b9cc vamp_IP_do_rpa_() main.f90:0
3 0x0000000001e3173d MAIN__() ???:0
4 0x0000000000408512 main() ???:0
5 0x0000000000022555 __libc_start_main() ???:0
6 0x0000000000408429 _start() ???:0
=================================
==== backtrace (tid: 308591) ====
0 0x0000000000538a41 scala_mp_check_gdes_mat_size_() ???:0
1 0x0000000001bd88b2 chi_super_mp_calculate_xi_real_() ???:0
2 0x0000000001e5b9cc vamp_IP_do_rpa_() main.f90:0
3 0x0000000001e3173d MAIN__() ???:0
4 0x0000000000408512 main() ???:0
5 0x0000000000022555 __libc_start_main() ???:0
6 0x0000000000408429 _start() ???:0
=================================
...
...
Image PC Routine Line Source
vasp_std 0000000001FDB35A Unknown Unknown Unknown
libpthread-2.17.s 00002B39B1152630 Unknown Unknown Unknown
vasp_std 0000000000538A41 Unknown Unknown Unknown
vasp_std 0000000001BD88B2 Unknown Unknown Unknown
vasp_std 0000000001E5B9CC Unknown Unknown Unknown
vasp_std 0000000001E3173D Unknown Unknown Unknown
vasp_std 0000000000408512 Unknown Unknown Unknown
libc-2.17.so 00002B39B1683555 __libc_start_main Unknown Unknown
vasp_std 0000000000408429 Unknown Unknown Unknown
==== backtrace (tid: 308581) ====
0 0x0000000000538a41 scala_mp_check_gdes_mat_size_() ???:0
1 0x0000000001bd88b2 chi_super_mp_calculate_xi_real_() ???:0
2 0x0000000001e5b9cc vamp_IP_do_rpa_() main.f90:0
3 0x0000000001e3173d MAIN__() ???:0
4 0x0000000000408512 main() ???:0
5 0x0000000000022555 __libc_start_main() ???:0
6 0x0000000000408429 _start() ???:0
=================================
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
vasp_std 0000000001FDB35A Unknown Unknown Unknown
libpthread-2.17.s 00002B16F59C2630 Unknown Unknown Unknown
vasp_std 0000000000538A41 Unknown Unknown Unknown
vasp_std 0000000001BD88B2 Unknown Unknown Unknown
vasp_std 0000000001E5B9CC Unknown Unknown Unknown
vasp_std 0000000001E3173D Unknown Unknown Unknown
vasp_std 0000000000408512 Unknown Unknown Unknown
libc-2.17.so 00002B16F5EF3555 __libc_start_main Unknown Unknown
vasp_std 0000000000408429 Unknown Unknown Unknown
======================================================================================

I attach the relevant files:
INCAR.CRPA_wan: the modified INCAR for cRPA at omega=0 with Wannier orbitals,
cRPA_Wan_parallel.o6135280: the PBS output file,
makefile.include: the makefile.include used for the compilation,
pbs_vasp6.4.3_intel_testcRPA: the PBS script file.

I appreciate your time and would welcome any suggestions or explanations.

Best regards
Zhishuo Huang

merzuk.kaltak
Administrator
Posts: 295
Joined: Mon Sep 24, 2018 9:39 am

Re: Failure of cRPA example with parallel calculation

#2 Post by merzuk.kaltak » Tue Jun 11, 2024 2:07 pm

Dear Zhishuo Huang,

Thank you for submitting an error report.
There is indeed a bug in the code that is triggered when you use a large number of MPI ranks for such a small job.
The fix will be released in version 6.5.0.
For the time being I suggest you run this job with a smaller number of MPI ranks, e.g. 4 should suffice.
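
As a rough illustration of this workaround, the launch line in a PBS script might look like the sketch below. The variable names (NRANKS, VASP_BIN, CMD) are hypothetical, the executable name vasp_std is taken from the traceback above, and the use of mpirun with -np is an assumption about the MPI launcher on the cluster; the actual PBS script attached to the first post is not visible here.

```shell
# Hypothetical job-script fragment: launch VASP with a small number of MPI ranks.
# All names below are illustrative assumptions, not taken from the attached script.
NRANKS=4                 # reduced rank count, as suggested in the reply
VASP_BIN="vasp_std"      # executable name as it appears in the traceback
CMD="mpirun -np ${NRANKS} ${VASP_BIN}"

# Print the command instead of executing it, so the fragment is safe to try anywhere.
echo "Would run: ${CMD}"

# In a real job script, the launch itself would replace the echo, e.g.:
# ${CMD} > vasp.out 2>&1
```

If the PBS resource request asks for more cores than NRANKS, the remaining cores simply stay idle for this job; the point is only that the number of MPI ranks passed to the launcher stays small.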

Note that I have updated the tutorial page to suggest using the WANNIER90_WIN tag, which is available as of version 6.2.0 and in all newer versions.

Moreover, a fresh CRPA tutorial will be published soon that works in conjunction with py4vasp.

zhishuo_huang
Newbie
Posts: 5
Joined: Fri May 10, 2024 11:50 am

Re: Failure of cRPA example with parallel calculation

#3 Post by zhishuo_huang » Fri Jun 14, 2024 9:20 am

Dear Merzuk Kaltak,

Thank you for the information.

Best regards
Zhishuo Huang
