Hello VASP Community,
I am currently exploring the GPU-accelerated version of VASP and have a couple of questions regarding its performance and configuration:
1. NVLink Connectivity: Does establishing NVLink connections between multiple GPUs affect the overall efficiency of calculations? If so, how significant is this impact for larger simulations?
2. Job Scaling Across Multiple GPUs: What are the efficiency differences between running multiple small system jobs on several GPUs vs. a single large system job across the same number of GPUs? Does the nature of the system size influence the computational efficiency or resource utilization in a notable way?
Any insights or experiences regarding these aspects would be greatly appreciated.
Thank you!
Zhao
Impact of NVLink and Multi-GPU Scaling on Efficiency for Various Job Sizes in VASP GPU Edition.
-
- Full Member
- Posts: 197
- Joined: Tue Oct 13, 2020 11:32 pm
-
- Global Moderator
- Posts: 110
- Joined: Tue Oct 17, 2023 10:17 am
Re: Impact of NVLink and Multi-GPU Scaling on Efficiency for Various Job Sizes in VASP GPU Edition.
Dear Zhao,
ad 1)
We strongly encourage the use of the NVIDIA Collective Communications Library (NCCL) when building VASP with OpenACC support. NCCL uses the fastest available path between the GPUs that need to communicate, so NVLink will certainly make a difference because of its higher bandwidth compared to PCIe.
How large that benefit is depends on the system you are running, the number of GPUs per node, the number of nodes, and the PCIe and NVLink versions. There are so many variables that it is not possible to give you a specific number.
ad 2)
Again, it is not possible to give you specific numbers here because of the many variables that play a role. As a general rule, GPUs need a bit of time to "spin up" each time a kernel is offloaded. If the execution time of that kernel is very short because the problem is very small, this fixed cost dominates and performance will suffer. The memory transfers back and forth between the host CPU and the GPU can make this even worse. So in general, more demanding jobs will profit more from GPU acceleration, and for small jobs the overhead might lead to worse performance on CPU+GPU than on CPU alone.
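To make the overhead argument concrete, here is a minimal toy model in Python. It is only a sketch: the function names and every number in it (launch overhead, transfer time, GPU speedup) are made-up placeholders for illustration, not measurements of VASP or of any particular GPU. The model simply adds a fixed per-offload cost to the accelerated compute time.

```python
def gpu_step_time(cpu_time_s, gpu_speedup, launch_overhead_s, transfer_s):
    """Wall time of one offloaded step: accelerated compute plus fixed costs.

    All parameters are hypothetical inputs to a toy model.
    """
    return cpu_time_s / gpu_speedup + launch_overhead_s + transfer_s


def gpu_efficiency(cpu_time_s, gpu_speedup, launch_overhead_s, transfer_s):
    """Fraction of the GPU step spent on useful compute (0..1)."""
    compute = cpu_time_s / gpu_speedup
    total = gpu_step_time(cpu_time_s, gpu_speedup, launch_overhead_s, transfer_s)
    return compute / total


def offload_pays_off(cpu_time_s, gpu_speedup, launch_overhead_s, transfer_s):
    """True if the offloaded step beats running the same step on the CPU."""
    total = gpu_step_time(cpu_time_s, gpu_speedup, launch_overhead_s, transfer_s)
    return total < cpu_time_s


if __name__ == "__main__":
    # Placeholder values, chosen only to show the trend.
    SPEEDUP = 10.0      # assumed raw GPU-vs-CPU compute speedup
    LAUNCH = 1e-5       # assumed fixed cost per offloaded kernel, in seconds
    TRANSFER = 2e-5     # assumed host<->device transfer time, in seconds

    for label, cpu_time in [("small step", 1e-5), ("large step", 1e-2)]:
        eff = gpu_efficiency(cpu_time, SPEEDUP, LAUNCH, TRANSFER)
        wins = offload_pays_off(cpu_time, SPEEDUP, LAUNCH, TRANSFER)
        print(f"{label}: efficiency={eff:.2f}, GPU faster than CPU: {wins}")
```

In this model the fixed costs are the same for both steps, so they swamp the small step and are negligible for the large one. A faster interconnect would show up as a smaller transfer term, which is why its benefit also depends on how much data each step actually moves.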
I hope this answers your questions fully. Please let me know, so I can close the topic or try to provide more help.
Michael
Re: Impact of NVLink and Multi-GPU Scaling on Efficiency for Various Job Sizes in VASP GPU Edition.
Dear Michael,
Got it. Thank you very much for your valuable comments and explanations.
Regards,
Zhao