NVIDIA Docker Runtime

From WilliamsNet Wiki
Revision as of 03:57, 19 December 2020 by DrEdWilliams

(originally from https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker)

CentOS 7/8

sudo yum-config-manager --add-repo https://nvidia.github.io/nvidia-docker/centos7/x86_64/nvidia-docker.repo
sudo yum install -y nvidia-docker2

Debian 10

Set up the stable repository and the GPG key:

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
   && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
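The distribution variable above comes from sourcing /etc/os-release in a subshell and gluing ID and VERSION_ID together, so on Debian 10 it is debian10 and the list is fetched from the debian10 directory. If you want to see which URL you will get before touching apt, here is a minimal read-only sketch; the function name nvidia_list_url and its optional file argument are hypothetical, added only so it can be pointed at a different os-release file:

```shell
# Hypothetical helper: print the nvidia-docker apt list URL that the
# commands above would fetch, without modifying the system.
# $1: optional path to an os-release file (defaults to /etc/os-release).
nvidia_list_url() {
    local os_release="${1:-/etc/os-release}"
    local distribution
    # Source the file in a subshell so ID/VERSION_ID don't leak into the caller,
    # then concatenate them exactly as the one-liner above does.
    distribution=$(. "$os_release"; echo "$ID$VERSION_ID")
    echo "https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list"
}
```

On a Debian 10 host, running nvidia_list_url with no argument should print https://nvidia.github.io/nvidia-docker/debian10/nvidia-docker.list.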

Note: To get access to experimental features such as CUDA on WSL or the new MIG capability on A100, you may want to add the experimental branch to the repository listing:

curl -s -L https://nvidia.github.io/nvidia-container-runtime/experimental/$distribution/nvidia-container-runtime.list | sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list

Install the nvidia-docker2 package (and dependencies) after updating the package listing:

sudo apt-get update
sudo apt-get install -y nvidia-docker2

Finishing the install

Restart the Docker daemon to complete the installation. If you want nvidia to be Docker's default runtime, set that in /etc/docker/daemon.json first, then restart:

sudo systemctl restart docker
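Setting the default runtime is mentioned above but not shown. The nvidia-docker2 package registers an nvidia runtime in /etc/docker/daemon.json; making it the default means adding a "default-runtime" key next to it. A minimal sketch, assuming your daemon.json holds nothing else you need (the file is overwritten, with the old one copied to daemon.json.bak); the function name set_nvidia_default_runtime is hypothetical:

```shell
# Hypothetical helper: write a daemon.json that registers the nvidia runtime
# (the same entry nvidia-docker2 installs) and also makes it the default.
# $1: target path, normally /etc/docker/daemon.json (writing there needs root).
# ASSUMPTION: nothing else in the existing file needs keeping; it is
# copied to <path>.bak before being overwritten.
set_nvidia_default_runtime() {
    local path="${1:-/etc/docker/daemon.json}"
    if [ -f "$path" ]; then
        cp "$path" "$path.bak"
    fi
    cat > "$path" <<'EOF'
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}
EOF
}
```

To apply it for real, run it as root against the live file, restart Docker, and check what docker info reports:

sudo bash -c "$(declare -f set_nvidia_default_runtime); set_nvidia_default_runtime /etc/docker/daemon.json"
sudo systemctl restart docker
sudo docker info --format '{{.DefaultRuntime}}'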

Test nvidia-smi inside an official CUDA base image (11.0-base was the current tag when this page was written)

sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

Test the GPU performance using a simple NVIDIA GPU Cloud container with the CUDA nbody sample program

sudo docker run -it --rm --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -benchmark -fp64

Reloading Repository Certificates

Rather frequently, it seems, the NVIDIA folks invalidate the GPG keys that sign their repositories. When that happens, delete the old key (f796ecb0) from the RPM database and from each repository's yum keyring, then let yum import the current key the next time it rebuilds its cache:

DIST=$(sed -n 's/releasever=//p' /etc/yum.conf)
DIST=${DIST:-$(. /etc/os-release; echo $VERSION_ID)}
sudo rpm -e gpg-pubkey-f796ecb0
sudo gpg --homedir /var/lib/yum/repos/$(uname -m)/$DIST/nvidia-container-runtime/gpgdir --delete-key f796ecb0
sudo gpg --homedir /var/lib/yum/repos/$(uname -m)/$DIST/libnvidia-container/gpgdir --delete-key f796ecb0
sudo gpg --homedir /var/lib/yum/repos/$(uname -m)/$DIST/nvidia-docker/gpgdir --delete-key f796ecb0
sudo yum -y makecache
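The first two lines pick the directory name yum uses under /var/lib/yum/repos/<arch>/: the releasever setting from /etc/yum.conf if one is present, otherwise VERSION_ID from /etc/os-release. Before deleting anything, you may want to see which keyring directories the gpg commands will hit. A minimal read-only sketch; the function name nvidia_gpgdirs and its two optional file arguments are hypothetical, added only so it can be tested against other files:

```shell
# Hypothetical helper: print the yum gpgdir paths used by the key-reset
# commands above, without changing anything.
# $1: optional yum.conf path (default /etc/yum.conf)
# $2: optional os-release path (default /etc/os-release)
nvidia_gpgdirs() {
    local yum_conf="${1:-/etc/yum.conf}"
    local os_release="${2:-/etc/os-release}"
    local dist repo
    # Prefer an explicit releasever=... line in yum.conf, if there is one.
    dist=$(sed -n 's/releasever=//p' "$yum_conf")
    # Otherwise fall back to VERSION_ID from os-release (e.g. 7 on CentOS 7).
    dist=${dist:-$(. "$os_release"; echo "$VERSION_ID")}
    for repo in nvidia-container-runtime libnvidia-container nvidia-docker; do
        echo "/var/lib/yum/repos/$(uname -m)/$dist/$repo/gpgdir"
    done
}
```

On an x86_64 CentOS 7 machine with no releasever line in yum.conf, the paths should all sit under /var/lib/yum/repos/x86_64/7/.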