error - shared libraries: libqdmod.so.0 in NVIDIA HPC-SDK container
-
- Newbie
- Posts: 22
- Joined: Tue May 16, 2023 11:14 am
error - shared libraries: libqdmod.so.0 in NVIDIA HPC-SDK container
I compiled vasp.6.4.1 in an NVIDIA HPC-SDK container on an NVIDIA DGX cluster with A100 GPUs.
The container was downloaded from NVIDIA NGC; its version is nvidia+nvhpc+23.5-devel-cuda_multi-ubuntu22.04
I used makefile.include.nvhpc_omp_acc.
The compilation inside the container completed successfully, but when I submit a job with Slurm using this container, I get an error:
/usr/local/vasp.6.4.1/bin/vasp_std: error while loading shared libraries: libqdmod.so.0: cannot open shared object file: No such file or directory
task 0: Exited with exit code 127
I observe that the library libqdmod.so exists in the container:
# find /opt/nvidia/hpc_sdk/ -name libqdmod.so
/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/extras/qd/lib/libqdmod.so
and its path is in the $LD_LIBRARY_PATH
# echo $LD_LIBRARY_PATH
/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/nvshmem/lib:/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/nccl/lib:/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/math_libs/lib64:/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/lib:/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/extras/qd/lib:/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/cuda/extras/CUPTI/lib64:/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/cuda/lib64::
I added the library path to the Slurm submit script:
export LD_LIBRARY_PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/extras/qd/lib:$LD_LIBRARY_PATH
but the run failed with the same error.
Attached are the makefile.include, the slurm submit script and the output.txt error message.
Thank you, Amihai
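For readers following along, here is a minimal sketch of how one might check which directory on LD_LIBRARY_PATH, if any, provides a given soname. The function name find_soname is hypothetical. It only mimics the directory search and ignores RPATH, the ld.so cache and the default system directories. Note that the find command above looked for libqdmod.so, while the loader asks for libqdmod.so.0, so it may be worth checking that exact name.

```shell
#!/bin/sh
# find_soname SONAME [SEARCH_PATH]   (hypothetical helper)
# Print the first file named SONAME found in a colon-separated search path
# (defaults to $LD_LIBRARY_PATH). Returns 1 if no directory provides it.
find_soname() {
    soname=$1
    search_path=${2-${LD_LIBRARY_PATH-}}
    old_ifs=$IFS
    IFS=:
    for dir in $search_path; do
        IFS=$old_ifs
        # Empty entries (from "::") are skipped in this sketch.
        [ -n "$dir" ] || continue
        if [ -e "$dir/$soname" ]; then
            printf '%s\n' "$dir/$soname"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}
```

Inside the container this could be called as find_soname libqdmod.so.0, both in the interactive shell and from the job step, to see whether the two environments agree.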
-
- Global Moderator
- Posts: 216
- Joined: Fri Jul 01, 2022 2:17 pm
Re: error - shared libraries: libqdmod.so.0 in NVIDIA HPC-SDK container
Dear amihai_silverman1,
Could you please try the ldd command on your VASP executable?
The ldd command lists the shared libraries the executable was linked against during
compilation and shows the path each one resolves to.
Please put the following command in your job script:
Code:
ldd /usr/local/vasp.6.4.1/bin/vasp_std
Maybe this already helps you to resolve your issue.
Otherwise please post the output and we will see how to proceed.
All the best Jonathan
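Since ldd output can be long, a small filter may help to pick out only the unresolved entries. This is a sketch, not part of the moderator's advice; the function name missing_libs is hypothetical, and it relies on ldd marking unresolved libraries with "=> not found".

```shell
#!/bin/sh
# missing_libs   (hypothetical helper)
# Read ldd output on stdin and print the names of the libraries that the
# loader reported as "not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Example (run wherever the executable exists):
#   ldd /usr/local/vasp.6.4.1/bin/vasp_std | missing_libs
```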
-
- Newbie
- Posts: 22
- Joined: Tue May 16, 2023 11:14 am
Re: error - shared libraries: libqdmod.so.0 in NVIDIA HPC-SDK container
Thank you for the reply.
Typing ldd /usr/local/vasp.6.4.1/bin/vasp_std inside the container gives a list of libraries that exist in the container.
Typing ldd for vasp.6.4.1/bin/vasp_std outside the container shows that most of the libraries are missing.
How do I tell the Slurm script to use the libraries inside the container?
I have tried many options but can't get it right. I will be grateful for your help.
Thanks, Amihai
-
- Global Moderator
- Posts: 216
- Joined: Fri Jul 01, 2022 2:17 pm
Re: error - shared libraries: libqdmod.so.0 in NVIDIA HPC-SDK container
Dear amihai_silverman1,
I am sorry, but with the information you have supplied I am not able to help you.
As already asked, could you please run the ldd command in your
Slurm job script and post the output in the forum? Based on the job script you sent, you should modify it as follows:
Code:
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --gpus=1
#SBATCH --qos=basic
export OMPI_ALLOW_RUN_AS_ROOT=1
export LD_LIBRARY_PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/extras/qd/lib:$LD_LIBRARY_PATH
# Record how the loader resolves each library inside the container step.
srun --container-image=/rg/spatari_prj/amihai/vasp/nvidia+nvhpc+vasp.sqsh --container-mounts=/rg/spatari_prj/amihai/vasp/NaCl:/home/NaCl --container-workdir=/home/NaCl ldd /usr/local/vasp.6.4.1/bin/vasp_std >& output_ldd.txt
# Run VASP as before.
srun --container-image=/rg/spatari_prj/amihai/vasp/nvidia+nvhpc+vasp.sqsh --container-mounts=/rg/spatari_prj/amihai/vasp/NaCl:/home/NaCl --container-workdir=/home/NaCl mpirun -np 1 --allow-run-as-root /usr/local/vasp.6.4.1/bin/vasp_std >& output.txt
After running the job script, please post the file output_ldd.txt to the forum. I need this information to guide you further through your problem.
With many thanks and kind regards
Jonathan
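In addition to the ldd call, it can be useful to see which LD_LIBRARY_PATH the container step itself ends up with, because that may differ from the value exported on the host side of the job script. The following is only a sketch under that assumption: container_probe_cmd and the CONTAINER_IMAGE variable are hypothetical names, and whether host variables reach the container depends on how the site has configured its container plugin.

```shell
#!/bin/sh
# container_probe_cmd BINARY   (hypothetical helper)
# Print a shell snippet that, when executed inside the container step, shows
# the library search path seen by that step and any unresolved libraries.
container_probe_cmd() {
    printf 'echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"; ldd %s | grep "not found"\n' "$1"
}

# Example use in a job script. It is guarded so that the sketch does nothing
# when srun is unavailable or CONTAINER_IMAGE (assumed to hold the path of
# the .sqsh image) is unset.
if command -v srun >/dev/null 2>&1 && [ -n "${CONTAINER_IMAGE-}" ]; then
    srun --container-image="$CONTAINER_IMAGE" \
        sh -c "$(container_probe_cmd /usr/local/vasp.6.4.1/bin/vasp_std)"
fi
```

Running the probe through sh -c means the variable is expanded by the shell inside the container rather than by the submitting shell.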
-
- Newbie
- Posts: 22
- Joined: Tue May 16, 2023 11:14 am
Re: error - shared libraries: libqdmod.so.0 in NVIDIA HPC-SDK container
Hi Jonathan,
Attached are the output_ldd.txt and submit script.
I see there:
libqdmod.so.0 => not found
but in the container:
i# ls -l /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/extras/qd/lib
total 1656
-rw-r--r-- 1 root root 562410 May 23 21:20 libqd.a
-rw-r--r-- 1 root root 971 May 23 21:20 libqd.la
lrwxrwxrwx 1 root root 14 May 23 21:47 libqd.so -> libqd.so.0.0.0
lrwxrwxrwx 1 root root 14 May 23 21:47 libqd.so.0 -> libqd.so.0.0.0
-rw-r--r-- 1 root root 313800 May 23 21:20 libqd.so.0.0.0
-rw-r--r-- 1 root root 2982 May 23 21:20 libqd_f_main.a
-rw-r--r-- 1 root root 1020 May 23 21:20 libqd_f_main.la
lrwxrwxrwx 1 root root 21 May 23 21:47 libqd_f_main.so -> libqd_f_main.so.0.0.0
lrwxrwxrwx 1 root root 21 May 23 21:47 libqd_f_main.so.0 -> libqd_f_main.so.0.0.0
-rw-r--r-- 1 root root 9240 May 23 21:20 libqd_f_main.so.0.0.0
-rw-r--r-- 1 root root 579318 May 23 21:20 libqdmod.a
-rw-r--r-- 1 root root 992 May 23 21:20 libqdmod.la
lrwxrwxrwx 1 root root 17 May 23 21:47 libqdmod.so -> libqdmod.so.0.0.0
lrwxrwxrwx 1 root root 17 May 23 21:47 libqdmod.so.0 -> libqdmod.so.0.0.0
-rw-r--r-- 1 root root 200968 May 23 21:20 libqdmod.so.0.0.0
ldd in an interactive bash in the container gives a different result:
i# ldd /usr/local/vasp.6.4.1/bin/vasp_std
linux-vdso.so.1 (0x00007ffe07fb3000)
libqdmod.so.0 => /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/extras/qd/lib/libqdmod.so.0 (0x00007fc0ba200000)
libqd.so.0 => /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/extras/qd/lib/libqd.so.0 (0x00007fc0b9e00000)
liblapack_lp64.so.0 => /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/lib/liblapack_lp64.so.0 (0x00007fc0b9000000)
libblas_lp64.so.0 => /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/lib/libblas_lp64.so.0 (0x00007fc0b7800000)
libfftw3.so.3 => /usr/lib/x86_64-linux-gnu/libfftw3.so.3 (0x00007fc0b75e5000)
libfftw3_omp.so.3 => /usr/lib/x86_64-linux-gnu/libfftw3_omp.so.3 (0x00007fc0ba476000)
libmpi_usempif08.so.40 => /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/11.8/hpcx/hpcx-2.14/ompi/lib/libmpi_usempif08.so.40 (0x00007fc0b7200000)
libmpi_usempi_ignore_tkr.so.40 => /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/11.8/hpcx/hpcx-2.14/ompi/lib/libmpi_usempi_ignore_tkr.so.40 (0x00007fc0b6e00000)
libmpi_mpifh.so.40 => /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/11.8/hpcx/hpcx-2.14/ompi/lib/libmpi_mpifh.so.40 (0x00007fc0b6a00000)
libmpi.so.40 => /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/11.8/hpcx/hpcx-2.14/ompi/lib/libmpi.so.40 (0x00007fc0b6600000)
libscalapack_lp64.so.2 => /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/11.8/hpcx/hpcx-2.14/ompi/lib/libscalapack_lp64.so.2 (0x00007fc0b5c00000)
libnvhpcwrapcufft.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/lib/libnvhpcwrapcufft.so (0x00007fc0b5800000)
libcufft.so.10 => /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/math_libs/11.0/lib64/libcufft.so.10 (0x00007fc0ab800000)
libcusolver.so.10 => /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/math_libs/11.0/lib64/libcusolver.so.10 (0x00007fc08a800000)
libcudaforwrapnccl.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/lib/libcudaforwrapnccl.so (0x00007fc08a400000)
libnccl.so.2 => /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/nccl/lib/libnccl.so.2 (0x00007fc07a000000)
libcublas.so.11 => /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/math_libs/11.0/lib64/libcublas.so.11 (0x00007fc074000000)
libcublasLt.so.11 => /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/math_libs/11.0/lib64/libcublasLt.so.11 (0x00007fc068e00000)
libcudaforwrapblas.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/lib/libcudaforwrapblas.so (0x00007fc068a00000)
libcudart.so.11.0 => /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/cuda/11.0/lib64/libcudart.so.11.0 (0x00007fc068600000)
libcudafor_110.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/lib/libcudafor_110.so (0x00007fc063a00000)
libcudafor.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/lib/libcudafor.so (0x00007fc063600000)
libacchost.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/lib/libacchost.so (0x00007fc063200000)
libaccdevaux110.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/lib/libaccdevaux110.so (0x00007fc062e00000)
libacccuda.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/lib/libacccuda.so (0x00007fc062a00000)
libcudadevice.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/lib/libcudadevice.so (0x00007fc062600000)
libcudafor2.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/lib/libcudafor2.so (0x00007fc062200000)
libnvf.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/lib/libnvf.so (0x00007fc061a00000)
libnvhpcatm.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/lib/libnvhpcatm.so (0x00007fc061600000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fc0613d6000)
libnvomp.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/lib/libnvomp.so (0x00007fc060200000)
libnvcpumath.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/lib/libnvcpumath.so (0x00007fc05fc00000)
libnvc.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/lib/libnvc.so (0x00007fc05f800000)
libc.so.6 => /usr/lib/x86_64-linux-gnu/libc.so.6 (0x00007fc05f5d8000)
libgcc_s.so.1 => /usr/lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fc0ba44c000)
libm.so.6 => /usr/lib/x86_64-linux-gnu/libm.so.6 (0x00007fc0ba119000)
libatomic.so.1 => /usr/lib/x86_64-linux-gnu/libatomic.so.1 (0x00007fc0ba442000)
libpthread.so.0 => /usr/lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fc0ba43d000)
libdl.so.2 => /usr/lib/x86_64-linux-gnu/libdl.so.2 (0x00007fc0ba436000)
libopen-rte.so.40 => /usr/lib/x86_64-linux-gnu/libopen-rte.so.40 (0x00007fc0ba05c000)
libopen-pal.so.40 => /usr/lib/x86_64-linux-gnu/libopen-pal.so.40 (0x00007fc0b9d4d000)
libutil.so.1 => /usr/lib/x86_64-linux-gnu/libutil.so.1 (0x00007fc0ba431000)
libz.so.1 => /usr/lib/x86_64-linux-gnu/libz.so.1 (0x00007fc0ba040000)
librt.so.1 => /usr/lib/x86_64-linux-gnu/librt.so.1 (0x00007fc0ba42a000)
/lib64/ld-linux-x86-64.so.2 (0x00007fc0ba48a000)
libhwloc.so.15 => /usr/lib/x86_64-linux-gnu/libhwloc.so.15 (0x00007fc0b9cf1000)
libevent_core-2.1.so.7 => /usr/lib/x86_64-linux-gnu/libevent_core-2.1.so.7 (0x00007fc0b9cbc000)
libevent_pthreads-2.1.so.7 => /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7 (0x00007fc0ba423000)
libudev.so.1 => /usr/lib/x86_64-linux-gnu/libudev.so.1 (0x00007fc0b9c92000)
Thanks, Amihai
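To compare the two results side by side, one could save the interactive ldd output and the job-step output to files and list only the libraries that resolve in the first but not in the second. A minimal sketch follows; the function name ldd_regressions and the file names in the example are hypothetical.

```shell
#!/bin/sh
# ldd_regressions GOOD_FILE BAD_FILE   (hypothetical helper)
# Print libraries that resolve to a path in GOOD_FILE but are reported as
# "not found" in BAD_FILE. Both files hold plain ldd output.
ldd_regressions() {
    awk '
        # First file: remember where each library resolved.
        NR == FNR { if ($2 == "=>" && $3 != "not") good[$1] = $3; next }
        # Second file: report entries that are now unresolved.
        $2 == "=>" && $3 == "not" && ($1 in good) {
            print $1 " (interactive: " good[$1] ")"
        }
    ' "$1" "$2"
}

# Example:
#   ldd_regressions ldd_interactive.txt output_ldd.txt
```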
-
- Global Moderator
- Posts: 216
- Joined: Fri Jul 01, 2022 2:17 pm
Re: error - shared libraries: libqdmod.so.0 in NVIDIA HPC-SDK container
Dear amihai_silverman1,
Thank you for submitting the output of the ldd command.
But I fear there is not much I can do for you. The output from your interactive bash
shell shows no issues, because libqdmod.so.0 is found, as indicated by the following lines:
Code:
libqdmod.so.0 => /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/extras/qd/lib/libqdmod.so.0 (0x00007fc0ba200000)
libqd.so.0 => /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/extras/qd/lib/libqd.so.0 (0x00007fc0b9e00000)
Note that both of these files are in the folder:
Code:
/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/extras
When you execute the VASP code with a Slurm job script, you get the output:
Code:
error while loading shared libraries: libqdmod.so.0: cannot open shared object file: No such file or directory
And when running the ldd command from within your job script, you get:
Code:
libqdmod.so.0 => not found
libqd.so.0 => not found
My guess is that you have access rights to the folder
Code:
/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/extras
from your interactive shell, but that you don't have access to the same folder from within your job script. That would explain the discrepancy between the output in the job script and in the interactive shell. I would recommend talking to your system administrator and showing him/her the information you have already gathered.
I am sorry that I cannot be of more help.
All the best Jonathan
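One way to test the access-rights guess is to walk down the path from inside the job step and report the first component that is missing or not accessible. This is a minimal sketch; check_path_access is a hypothetical name, it expects an absolute path, and when run as root the permission tests will always pass.

```shell
#!/bin/sh
# check_path_access ABSOLUTE_PATH   (hypothetical helper)
# Walk from / down to ABSOLUTE_PATH and stop at the first component that is
# missing, or is a directory the current user cannot read and traverse.
check_path_access() {
    target=$1
    current=
    old_ifs=$IFS
    IFS=/
    for part in $target; do
        IFS=$old_ifs
        [ -n "$part" ] || continue
        current="$current/$part"
        if [ ! -e "$current" ]; then
            echo "missing: $current"
            return 1
        elif [ -d "$current" ] && { [ ! -x "$current" ] || [ ! -r "$current" ]; }; then
            echo "no access: $current"
            return 1
        fi
    done
    IFS=$old_ifs
    echo "ok: $target"
}

# Example, to be run both interactively and from the job step:
#   check_path_access /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/extras/qd/lib
```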
-
- Newbie
- Posts: 22
- Joined: Tue May 16, 2023 11:14 am
Re: error - shared libraries: libqdmod.so.0 in NVIDIA HPC-SDK container
Thank you
Is there a way to compile VASP using the static libraries libqdmod and libqd?
This may solve the problem since only these two are missing.
Amihai
-
- Global Moderator
- Posts: 216
- Joined: Fri Jul 01, 2022 2:17 pm
Re: error - shared libraries: libqdmod.so.0 in NVIDIA HPC-SDK container
Dear amihai_silverman1,
I would strongly discourage you from compiling the library yourself. This would mean that you have to compile
the whole NVIDIA HPC SDK package. I would strongly recommend that you simply ask your system administrator why you do not have
access to the folder /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/extras or one of its subfolders when accessing it from your Slurm script.
Another possibility I can think of is to copy the files
Code:
libqdmod.so.0 => /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/extras/qd/lib/libqdmod.so.0
libqd.so.0 => /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/extras/qd/lib/libqd.so.0
from the interactive shell to some other location.
Note that the files must keep the same names, so don't rename libqdmod.so.0 to something else when copying.
Then export the path you copied the files to in your Slurm script.
As an example:
Code:
export LD_LIBRARY_PATH=/ABSOLUTE_PATH_WHERE_YOU_COPIED_libqdmod.so_TO/:$LD_LIBRARY_PATH
I hope this is of help.
All the best Jonathan
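The copy step could look roughly like the sketch below. The helper name stage_libs and the destination directory in the example are hypothetical. The copy keeps symlinks as symlinks, so libqdmod.so.0 and the versioned file it points to are staged together. The destination would also need to be visible from inside the container, for instance below a directory passed with --container-mounts.

```shell
#!/bin/sh
# stage_libs SRC_DIR DEST_DIR PATTERN   (hypothetical helper)
# Copy shared libraries matching PATTERN from SRC_DIR to DEST_DIR, keeping
# file names and symlinks (e.g. libqdmod.so.0 -> libqdmod.so.0.0.0) intact.
stage_libs() {
    src=$1
    dest=$2
    pattern=$3
    mkdir -p "$dest" || return 1
    # -R -P copies symlinks as symlinks; -p preserves mode and timestamps.
    # $pattern is left unquoted on purpose so the shell expands the glob.
    cp -R -P -p "$src"/$pattern "$dest"/
}

# Example (the destination path is an assumption):
#   stage_libs /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/extras/qd/lib \
#       /home/NaCl/qdlibs 'libqd*.so*'
#   export LD_LIBRARY_PATH=/home/NaCl/qdlibs:$LD_LIBRARY_PATH
```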
-
- Newbie
- Posts: 22
- Joined: Tue May 16, 2023 11:14 am
Re: error - shared libraries: libqdmod.so.0 in NVIDIA HPC-SDK container
Hello,
I managed to solve this problem by setting makefile.include to use the static libraries qd and qdmod. I put the following lines in makefile.include:
QD ?= $(NVROOT)/compilers/extras/qd
LLIBS += -L/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/extras/qd/lib -Wl,-Bstatic -lqdmod -lqd -Wl,-Bdynamic
INCS += -I$(QD)/include/qd
Now VASP runs but gives an error.
Even when I try the simple H2O example from the tutorial and run it interactively, I get the following error:
# /usr/local/vasp.6.4.1/bin/vasp_std
running 1 mpi-ranks, with 8 threads/rank, on 1 nodes
distrk: each k-point on 1 cores, 1 groups
distr: one band on 1 cores, 1 groups
OpenACC runtime initialized ... 1 GPUs detected
Code:
 -----------------------------------------------------------------------------
|                     _     ____    _    _    _____     _                     |
|                    | |   |  _ \  | |  | |  / ____|   | |                    |
|                    | |   | |_) | | |  | | | |  __    | |                    |
|                    |_|   |  _ <  | |  | | | | |_ |   |_|                    |
|                     _    | |_) | | |__| | | |__| |    _                     |
|                    (_)   |____/   \____/   \_____|   (_)                    |
|                                                                             |
|     internal error in: mpi.F  at line: 898                                  |
|                                                                             |
|     M_init_nccl: Error in ncclCommInitRank                                  |
|                                                                             |
|     If you are not a developer, you should not encounter this problem.      |
|     Please submit a bug report.                                             |
|                                                                             |
 -----------------------------------------------------------------------------
Warning: ieee_invalid is signaling
Warning: ieee_divide_by_zero is signaling
Warning: ieee_inexact is signaling
1
Tnx, Amihai
-
- Newbie
- Posts: 22
- Joined: Tue May 16, 2023 11:14 am
Re: error - shared libraries: libqdmod.so.0 in NVIDIA HPC-SDK container
I submitted the last problem to the Bugreports forum.
-
- Global Moderator
- Posts: 216
- Joined: Fri Jul 01, 2022 2:17 pm
Re: error - shared libraries: libqdmod.so.0 in NVIDIA HPC-SDK container
Dear amihai_silverman1,
The bug that you are reporting is a result of not compiling VASP properly.
As suggested in my previous post, you could try to copy the files
libqdmod.so.0 and libqd.so.0 to some location that can be accessed from your
Slurm job script.
All the best
Jonathan
-
- Newbie
- Posts: 22
- Joined: Tue May 16, 2023 11:14 am
Re: error - shared libraries: libqdmod.so.0 in NVIDIA HPC-SDK container
Hi,
Previously I did exactly that, but it didn't work.
The compilation finds the other libraries in /usr/lib/x86_64-linux-gnu, but it doesn't find libqdmod. I can't understand that:
cp -p /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/extras/qd/lib/* /usr/lib/x86_64-linux-gnu
cp -rp /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/extras/qd/include/qd /usr/include/x86_64-linux-gnu
ls -l /usr/lib/x86_64-linux-gnu/libqd*
-rw-r--r-- 1 root root 971 May 23 21:20 /usr/lib/x86_64-linux-gnu/libqd.la
lrwxrwxrwx 1 root root 14 Feb 20 2022 /usr/lib/x86_64-linux-gnu/libqd.so.0 -> libqd.so.0.0.0
-rw-r--r-- 1 root root 191152 Feb 20 2022 /usr/lib/x86_64-linux-gnu/libqd.so.0.0.0
-rw-r--r-- 1 root root 1020 May 23 21:20 /usr/lib/x86_64-linux-gnu/libqd_f_main.la
lrwxrwxrwx 1 root root 21 Feb 20 2022 /usr/lib/x86_64-linux-gnu/libqd_f_main.so.0 -> libqd_f_main.so.0.0.0
-rw-r--r-- 1 root root 14360 Feb 20 2022 /usr/lib/x86_64-linux-gnu/libqd_f_main.so.0.0.0
-rw-r--r-- 1 root root 992 May 23 21:20 /usr/lib/x86_64-linux-gnu/libqdmod.la
lrwxrwxrwx 1 root root 17 Feb 20 2022 /usr/lib/x86_64-linux-gnu/libqdmod.so.0 -> libqdmod.so.0.0.0
-rw-r--r-- 1 root root 154640 Feb 20 2022 /usr/lib/x86_64-linux-gnu/libqdmod.so.0.0.0
The compilation fails :
mpif90 -acc -gpu=cc60,cc70,cc80,cuda11.0 -mp -c++libs -o vasp c2f_interface.o nccl2for.o simd.o base.o profiling.o string.o tutor.o version.o build_info.o command_line.o vhdf5_base.o incar_reader.o reader_base.o openmp.o openacc_struct.o mpi.o mpi_shmem.o mathtools.o bse_struct.o hamil_struct.o radial_struct.o pseudo_struct.o mgrid_struct.o wave_struct.o nl_struct.o mkpoints_struct.o poscar_struct.o afqmc_struct.o fock_glb.o chi_glb.o smart_allocate.o xml.o extpot_glb.o constant.o ml_ff_c2f_interface.o ml_ff_prec.o ml_ff_string.o ml_ff_tutor.o ml_ff_constant.o ml_ff_mpi_help.o ml_ff_neighbor.o ml_ff_taglist.o ml_ff_struct.o ml_ff_mpi_shmem.o vdwforcefield_glb.o jacobi.o main_mpi.o openacc.o scala.o asa.o lattice.o poscar.o ini.o mgrid.o ml_asa2.o ml_ff_mpi.o ml_ff_helper.o ml_ff_logfile.o ml_ff_math.o ml_ff_iohandle.o ml_ff_memory.o ml_ff_abinitio.o ml_ff_ff2.o ml_ff_ff.o ml_ff_mlff.o setex_struct.o xclib.o vdw_nl.o xclib_grad.o setex.o radial.o pseudo.o gridq.o ebs.o symlib.o mkpoints.o random.o wave.o wave_mpi.o wave_high.o bext.o spinsym.o symmetry.o lattlib.o nonl.o nonlr.o nonl_high.o dfast.o choleski2.o mix.o hamil.o xcgrad.o xcspin.o potex1.o potex2.o constrmag.o cl_shift.o relativistic.o LDApU.o paw_base.o metagga.o egrad.o pawsym.o pawfock.o pawlhf.o diis.o rhfatm.o hyperfine.o fock_ace.o paw.o mkpoints_full.o charge.o Lebedev-Laikov.o stockholder.o dipol.o solvation.o scpc.o pot.o fermi_energy.o tet.o dos.o elf.o hamil_rot.o bfgs.o dynmat.o instanton.o lbfgs.o sd.o cg.o dimer.o bbm.o fire.o lanczos.o neb.o qm.o pyamff_fortran/*.o ml_pyamff.o opt.o chain.o dyna.o fileio.o vhdf5.o sphpro.o us.o core_rel.o aedens.o wavpre.o wavpre_noio.o broyden.o dynbr.o reader.o writer.o xml_writer.o brent.o stufak.o opergrid.o stepver.o fast_aug.o fock_multipole.o fock.o fock_dbl.o fock_frc.o mkpoints_change.o subrot_cluster.o sym_grad.o mymath.o npt_dynamics.o subdftd3.o subdftd4.o internals.o dynconstr.o dimer_heyden.o dvvtrajectory.o vdwforcefield.o nmr.o pead.o 
k-proj.o subrot.o subrot_scf.o paircorrection.o rpa_force.o ml_reader.o ml_interface.o force.o pwlhf.o gw_model.o optreal.o steep.o rmm-diis.o davidson.o david_inner.o root_find.o lcao_bare.o locproj.o electron_common.o electron.o rot.o electron_all.o shm.o pardens.o optics.o constr_cell_relax.o stm.o finite_diff.o elpol.o hamil_lr.o rmm-diis_lr.o subrot_lr.o lr_helper.o hamil_lrf.o elinear_response.o ilinear_response.o linear_optics.o setlocalpp.o wannier.o electron_OEP.o electron_lhf.o twoelectron4o.o gauss_quad.o m_unirnk.o minimax_ini.o minimax_dependence.o minimax_functions1D.o minimax_functions2D.o minimax_struct.o minimax_varpro.o minimax.o umco.o mlwf.o ratpol.o pade_fit.o screened_2e.o wave_cacher.o crpa.o chi_base.o wpot.o local_field.o ump2.o ump2kpar.o fcidump.o ump2no.o bse_te.o bse.o time_propagation.o acfdt.o afqmc.o rpax.o chi.o acfdt_GG.o dmft.o GG_base.o greens_orbital.o lt_mp2.o rnd_orb_mp2.o greens_real_space.o chi_GG.o chi_super.o sydmat.o rmm-diis_mlr.o linear_response_NMR.o wannier_interpol.o wave_interpolate.o linear_response.o auger.o dmatrix.o phonon.o wannier_mats.o elphon.o core_con_mat.o embed.o extpot.o rpa_high.o fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o main.o -Llib -ldmy -Lparser -lparser -cudalib=cublas,cusolver,cufft,nccl -cuda -L/usr/lib/x86_64-linux-gnu -lfftw3 -lfftw3_omp -Mscalapack -llapack -lblas -lqdmod -lqd
/usr/bin/ld: cannot find -lqdmod: No such file or directory
/usr/bin/ld: cannot find -lqd: No such file or directory
pgacclnk: child process exit status 1: /usr/bin/ld
make[2]: *** [makefile:132: vasp] Error 2
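A possible reading of the listing above: /usr/lib/x86_64-linux-gnu contains only the versioned files (libqdmod.so.0, libqdmod.so.0.0.0) and the .la files, but no libqdmod.so or libqdmod.a. When given -lqdmod, the linker looks for a file named libqdmod.so or libqdmod.a in its search directories, so the versioned names alone would not satisfy it, even though they are what the loader needs at run time. The sketch below checks for that; can_link is a hypothetical helper name.

```shell
#!/bin/sh
# can_link NAME DIR...   (hypothetical helper)
# Succeed and print the match if any DIR contains libNAME.so or libNAME.a,
# the file names the linker looks for when given -lNAME. Versioned files
# such as libNAME.so.0 on their own are not enough at link time.
can_link() {
    name=$1
    shift
    for dir in "$@"; do
        for candidate in "$dir/lib$name.so" "$dir/lib$name.a"; do
            if [ -e "$candidate" ]; then
                printf '%s\n' "$candidate"
                return 0
            fi
        done
    done
    return 1
}

# Example:
#   can_link qdmod /usr/lib/x86_64-linux-gnu \
#       /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/extras/qd/lib
```

If only the SDK directory reports a match, adding that directory with -L to the link line, as in the original makefile.include, is one way to let the linker find it.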
k-proj.o subrot.o subrot_scf.o paircorrection.o rpa_force.o ml_reader.o ml_interface.o force.o pwlhf.o gw_model.o optreal.o steep.o rmm-diis.o davidson.o david_inner.o root_find.o lcao_bare.o locproj.o electron_common.o electron.o rot.o electron_all.o shm.o pardens.o optics.o constr_cell_relax.o stm.o finite_diff.o elpol.o hamil_lr.o rmm-diis_lr.o subrot_lr.o lr_helper.o hamil_lrf.o elinear_response.o ilinear_response.o linear_optics.o setlocalpp.o wannier.o electron_OEP.o electron_lhf.o twoelectron4o.o gauss_quad.o m_unirnk.o minimax_ini.o minimax_dependence.o minimax_functions1D.o minimax_functions2D.o minimax_struct.o minimax_varpro.o minimax.o umco.o mlwf.o ratpol.o pade_fit.o screened_2e.o wave_cacher.o crpa.o chi_base.o wpot.o local_field.o ump2.o ump2kpar.o fcidump.o ump2no.o bse_te.o bse.o time_propagation.o acfdt.o afqmc.o rpax.o chi.o acfdt_GG.o dmft.o GG_base.o greens_orbital.o lt_mp2.o rnd_orb_mp2.o greens_real_space.o chi_GG.o chi_super.o sydmat.o rmm-diis_mlr.o linear_response_NMR.o wannier_interpol.o wave_interpolate.o linear_response.o auger.o dmatrix.o phonon.o wannier_mats.o elphon.o core_con_mat.o embed.o extpot.o rpa_high.o fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o main.o -Llib -ldmy -Lparser -lparser -cudalib=cublas,cusolver,cufft,nccl -cuda -L/usr/lib/x86_64-linux-gnu -lfftw3 -lfftw3_omp -Mscalapack -llapack -lblas -lqdmod -lqd
/usr/bin/ld: cannot find -lqdmod: No such file or directory
/usr/bin/ld: cannot find -lqd: No such file or directory
pgacclnk: child process exit status 1: /usr/bin/ld
make[2]: *** [makefile:132: vasp] Error 2
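One possible reading of the listing above: it shows libqdmod.so.0 and libqd.so.0, but no unversioned libqdmod.so or libqd.so. GNU ld resolves -lNAME by looking for libNAME.so or libNAME.a, so a directory that holds only the versioned runtime names would give exactly the "cannot find -lqdmod" message. The sketch below checks for that; the function name has_link_name and the LIBDIR variable are made up for illustration.

```shell
#!/bin/bash
# Succeeds if DIR holds a file that "ld -lNAME" can pick up:
# libNAME.so (usually a symlink) or libNAME.a.
# A versioned file such as libNAME.so.0 alone does not count.
has_link_name() {
    local dir="$1" name="$2"
    [ -e "$dir/lib$name.so" ] || [ -e "$dir/lib$name.a" ]
}

# LIBDIR is a hypothetical override; the default is the directory from the post.
libdir="${LIBDIR:-/usr/lib/x86_64-linux-gnu}"
for lib in qdmod qd; do
    if has_link_name "$libdir" "$lib"; then
        echo "lib$lib: link name present in $libdir"
    else
        echo "lib$lib: no lib$lib.so or lib$lib.a in $libdir, so -l$lib will fail"
    fi
done
```

If the check fails for /usr/lib/x86_64-linux-gnu, one option may be to point the link line at /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/extras/qd/lib instead, since the find command in the first post located libqdmod.so there.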
-
- Global Moderator
- Posts: 216
- Joined: Fri Jul 01, 2022 2:17 pm
Re: error - shared libraries: libqdmod.so.0 in NVIDIA HPC-SDK container
Dear amihai_silverman1,
Did you export the LD_LIBRARY_PATH to the NVIDIA files before compiling? I think this is what the
compiler is telling you.
In principle, you already had a compiled VASP version with dynamic linking in your first post.
You only had a problem when executing the code because the two files
Code: Select all
/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/extras/qd/lib/libqdmod.so.0
/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/extras/qd/lib/libqd.so.0
were not accessible from your job script.
So I recommended that you either talk to your system administrator to get access to this folder from your job script,
or copy the two files
Code: Select all
/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/extras/qd/lib/libqdmod.so.0
/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/extras/qd/lib/libqd.so.0
to some location that you have access to and export this path in your Slurm job script:
Code: Select all
export LD_LIBRARY_PATH=/ABSOLUTE_PATH_WHERE_YOU_COPIED_libqdmod.so_TO/:$LD_LIBRARY_PATH
Could you please try this? Then you would not have to recompile VASP.
I hope this works. If it does not, please contact us again and send the standard output of the Slurm job script
and the output of the ldd command:
Code: Select all
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --gpus=1
#SBATCH --qos=basic
export OMPI_ALLOW_RUN_AS_ROOT=1
export LD_LIBRARY_PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/extras/qd/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/ABSOLUTE_PATH_WHERE_YOU_COPIED_libqdmod.so_TO/:$LD_LIBRARY_PATH
srun --container-image=/rg/spatari_prj/amihai/vasp/nvidia+nvhpc+vasp.sqsh --container-mounts=/rg/spatari_prj/amihai/vasp/NaCl:/home/NaCl --container-workdir=/home/NaCl ldd /usr/local/vasp.6.4.1/bin/vasp_std >& output_ldd.txt
srun --container-image=/rg/spatari_prj/amihai/vasp/nvidia+nvhpc+vasp.sqsh --container-mounts=/rg/spatari_prj/amihai/vasp/NaCl:/home/NaCl --container-workdir=/home/NaCl mpirun -np 1 --allow-run-as-root /usr/local/vasp.6.4.1/bin/vasp_std >& output.txt
All the best, Jonathan
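When reading the ldd output, the lines of interest are the ones that end in "not found". A small sketch for filtering them is given here; the helper names parse_missing and missing_libs are hypothetical, and the binary path is the one used in this thread.

```shell
#!/bin/bash
# Reduce ldd output (read from stdin) to the names the loader could not resolve.
# ldd prints such entries as "<name> => not found".
parse_missing() {
    awk '/=> not found/ {print $1}'
}

# Run ldd on a binary and list only its unresolved libraries.
missing_libs() {
    ldd "$1" | parse_missing
}

# Path taken from the thread; the check is skipped if the binary is absent.
bin="/usr/local/vasp.6.4.1/bin/vasp_std"
if [ -x "$bin" ]; then
    missing_libs "$bin"
fi
```

An empty result would suggest that every shared library is resolved in the environment where the command ran, so it is worth running it inside the same container and with the same LD_LIBRARY_PATH as the job itself.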
-
- Newbie
- Posts: 22
- Joined: Tue May 16, 2023 11:14 am
Re: error - shared libraries: libqdmod.so.0 in NVIDIA HPC-SDK container
Hi, you are correct.
I started from the beginning: I downloaded vasp.6.4.1 and put it in the container nvidia+nvhpc+23.5-devel-cuda_multi-ubuntu22.04.sqsh, as it was downloaded from NVIDIA.
I installed libfftw3-3, used makefile.include.nvhpc_acc, and added the FFTW library path to it.
The compilation ran with no errors, but now when I run the H2O example I get:
Code: Select all
/H2O# mpirun --allow-run-as-root -np 1 /usr/local/vasp.6.4.1/bin/vasp_std
running 1 mpi-ranks, on 1 nodes
distrk: each k-point on 1 cores, 1 groups
distr: one band on 1 cores, 1 groups
OpenACC runtime initialized ... 1 GPUs detected
-----------------------------------------------------------------------------
| _ ____ _ _ _____ _ |
| | | | _ \ | | | | / ____| | | |
| | | | |_) | | | | | | | __ | | |
| |_| | _ < | | | | | | |_ | |_| |
| _ | |_) | | |__| | | |__| | _ |
| (_) |____/ \____/ \_____| (_) |
| |
| internal error in: mpi.F at line: 898 |
| |
| M_init_nccl: Error in ncclCommInitRank |
| |
| If you are not a developer, you should not encounter this problem. |
| Please submit a bug report. |
| |
-----------------------------------------------------------------------------
Warning: ieee_inexact is signaling
1
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[15360,1],0]
Exit code: 1
--------------------------------------------------------------------------
Thank you for your help,
Amihai
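The VASP message only says that ncclCommInitRank failed. A common way to get more detail from NCCL itself is to enable its logging through the NCCL_DEBUG and NCCL_DEBUG_SUBSYS environment variables before rerunning. The sketch below assumes Open MPI's mpirun (its -x option forwards an environment variable to the ranks); the function name enable_nccl_debug and the log file name nccl_debug.txt are made up for illustration.

```shell
#!/bin/bash
# Turn on NCCL's own logging so that a failing ncclCommInitRank reports a reason.
enable_nccl_debug() {
    export NCCL_DEBUG=INFO          # print NCCL informational and warning messages
    export NCCL_DEBUG_SUBSYS=INIT   # restrict the output to initialization
}

# Path taken from the thread; the rerun is skipped if the binary is absent.
vasp="/usr/local/vasp.6.4.1/bin/vasp_std"
if [ -x "$vasp" ]; then
    enable_nccl_debug
    # Same command as in the post, with the two variables forwarded to the rank.
    mpirun --allow-run-as-root -np 1 -x NCCL_DEBUG -x NCCL_DEBUG_SUBSYS "$vasp" 2>&1 | tee nccl_debug.txt
fi
```

The lines prefixed with "NCCL WARN" in the resulting log would be the first place to look, and attaching that log to a follow-up post may help narrow the problem down.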
-
- Newbie
- Posts: 22
- Joined: Tue May 16, 2023 11:14 am
Re: error - shared libraries: libqdmod.so.0 in NVIDIA HPC-SDK container
One more comment :
Previously I compiled the same code on a CPU HPC cluster using the Intel oneAPI compilers. That build runs properly.
Since we need more compute power, I am now trying to compile the same code on an NVIDIA DGX cluster inside an HPC SDK container, as recommended in the VASP installation instructions. The compilation completes with no errors, but running an example gives an error.
Maybe there is some inconsistency between this code and the compilers provided by NVIDIA in the latest HPC SDK version, 23.5-devel-cuda_multi-ubuntu22.04?