CUDA + NVIDIA drivers

From WilliamsNet Wiki
Revision as of 15:19, 29 November 2020 by DrEdWilliams (talk | contribs)

Installation

Download the repo rpm from http://developer.nvidia.com/cuda-downloads (or copy the repo file over from another system).

The driver packages no longer seem to pull in the Linux kernel devel package as a dependency, and without it the driver install just silently fails. Install it explicitly first:

yum install kernel-devel
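Since the failure is silent, it can help to confirm that a kernel-devel package matching the running kernel is actually present before installing the driver. The following is a minimal sketch, not a definitive check: `has_matching_devel` is a hypothetical helper name, and it assumes the CentOS convention that `uname -r` prints the same version-release.arch string that rpm reports for the package.

```shell
# Hypothetical helper: succeed if the running kernel release ($1) appears
# among the installed kernel-devel versions (remaining arguments).
has_matching_devel() {
    running="$1"
    shift
    for installed in "$@"; do
        if [ "$installed" = "$running" ]; then
            return 0
        fi
    done
    return 1
}

# Typical use on a node (assumes rpm is available):
#   has_matching_devel "$(uname -r)" \
#       $(rpm -q --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n' kernel-devel) \
#       || echo "kernel-devel for $(uname -r) is missing"
```

If the check fails after a kernel update, installing kernel-devel again (or rebooting into the newer kernel) should bring the two back in line.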

On regular compute nodes where the full CUDA libraries aren't needed:

yum install nvidia-driver-cuda

On a full workstation or where CUDA is needed:

yum install cuda

There is a yum plugin (for CentOS, anyway) that facilitates the installation and management of NVIDIA kernel modules:

yum -y install yum-plugin-nvidia
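Because a failed driver install may not report anything, one way to confirm the result is to look for the `nvidia` kernel module in /proc/modules once the node is back up. This is only a sketch: `nvidia_module_loaded` is a hypothetical helper name, and it reads the module list on stdin so it can be tried against sample text as well as the real file.

```shell
# Hypothetical helper: read /proc/modules-style text on stdin and succeed
# if a module named exactly "nvidia" is listed (first field of a line).
nvidia_module_loaded() {
    awk '$1 == "nvidia" { found = 1 } END { exit !found }'
}

# Typical use on a node:
#   nvidia_module_loaded < /proc/modules || echo "nvidia module is not loaded"
```

Matching on the first field keeps companion modules such as nvidia_drm from being counted as the driver itself.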

Read more at: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

Updating drivers

The above process installs the CUDA repository for yum. Most of the time updates will then happen automatically, but there are some things to consider when updating.

Reinstalling

Occasionally you just need to uninstall and reinstall the drivers. Rather than uninstalling and reinstalling the whole works, you can do just the kernel module package:

sudo yum -y remove kmod-nvidia-dkms-latest
sudo yum -y install kmod-nvidia-dkms-latest
reboot

Unfortunately, you cannot use the 'yum reinstall' command here: it just overlays the package with a new copy from the .rpm file. DKMS sees that the module already exists (whether it works or not) and does nothing, so the module is never rebuilt. The reboot is needed for the kernel to load the reinstalled drivers.
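If this gets done on more than one node, the remove / install / reboot sequence can be wrapped in a small function. What follows is a sketch under stated assumptions: `run` and `reinstall_driver` are hypothetical names, the package name defaults to the one used on this page, and by default the commands are only printed so the sequence can be reviewed before anything is removed.

```shell
# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the remove / install / reboot sequence.
# Each step only runs if the previous one succeeded.
reinstall_driver() {
    pkg="${1:-kmod-nvidia-dkms-latest}"
    run sudo yum -y remove "$pkg" &&
    run sudo yum -y install "$pkg" &&
    run sudo reboot
}
```

Calling `reinstall_driver` with no arguments prints the three commands; `DRY_RUN=0 reinstall_driver` would run them, reboot included.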

Kernel Updates

Most of the time, a driver update by itself will work fine, as will a kernel update. When both happen at the same time, problems can occur. It is best for the kernel update to occur first, even if you have to do it manually:

sudo yum -y update kernel kernel-devel

Then you need to reboot (sorry). This way, when the driver update is applied, it is applied to the current (new) kernel version:

sudo yum -y update kmod-nvidia-dkms-latest

This will require another reboot to activate the new modules.
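Before applying the driver update, it may be worth checking that the node really has been rebooted into the new kernel. The sketch below compares the running kernel with the first entry of `rpm -q --last kernel`, which lists packages by install time, most recent first. `running_latest_kernel` is a hypothetical helper name, and treating "most recently installed" as "newest" is an assumption that holds for ordinary updates but not if an older kernel was installed by hand afterwards.

```shell
# Hypothetical helper: read `rpm -q --last kernel` output on stdin and
# succeed if the first (most recently installed) kernel is the running
# kernel release passed as $1.
running_latest_kernel() {
    running="$1"
    # First field of the first line is the package name; strip "kernel-"
    # to get the version-release.arch string that uname -r prints.
    latest="$(awk 'NR == 1 { sub(/^kernel-/, "", $1); print $1 }')"
    [ "$running" = "$latest" ]
}

# Typical use on a node (assumes rpm is available):
#   rpm -q --last kernel | running_latest_kernel "$(uname -r)" \
#       && sudo yum -y update kmod-nvidia-dkms-latest
```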

DKMS Issues

Sometimes the update process will not recognize that it can do an update for a particular kernel. In this case, you need to completely remove the driver, reboot, and reinstall. This usually happens in conjunction with a kernel update (see above), so do the kernel update after removing the drivers, then reboot, and then reinstall the driver:

sudo yum -y remove kmod-nvidia-dkms-latest
<update kernel if needed>
reboot

sudo yum -y install kmod-nvidia-dkms-latest

Ignore any warnings about not finding the latest version of the kmod-nvidia-dkms package; the process will create it.
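To see whether DKMS actually built the module for the running kernel, the output of `dkms status` can be checked. This is a rough sketch: `dkms_nvidia_installed_for` is a hypothetical helper name, and since the exact layout of `dkms status` lines differs between DKMS versions, it only looks for a line that mentions nvidia, the kernel release, and the word "installed".

```shell
# Hypothetical helper: read `dkms status` output on stdin and succeed if an
# nvidia module is reported as installed for the kernel release in $1.
dkms_nvidia_installed_for() {
    awk -v kernel="$1" '
        tolower($0) ~ /nvidia/ && index($0, kernel) && /installed/ { found = 1 }
        END { exit !found }
    '
}

# Typical use on a node (assumes dkms is available):
#   dkms status | dkms_nvidia_installed_for "$(uname -r)" \
#       || echo "no nvidia DKMS module installed for $(uname -r)"
```

A module listed only as "added" or "built", or installed only for an older kernel, fails the check, which is the situation where the remove, reboot, and reinstall sequence above applies.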