A simple fix for a broken NVIDIA driver: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver.
Step 1: open a terminal and run nvidia-smi. It fails with the following error:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver.
Make sure that the latest NVIDIA driver is installed and running.
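Step 2: run nvcc -V to check that CUDA is installed. Note that nvcc reports the CUDA toolkit version, not the driver version.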
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
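CUDA is still installed, which suggests the driver files are still on disk and only the kernel module is failing to load, so move on to the next step.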
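Since nvcc only describes the CUDA toolkit, a more direct check is whether the nvidia kernel module is currently loaded. The following is a minimal sketch, assuming the usual /proc/modules layout in which each line starts with the module name; the function name nvidia_module_loaded is made up for illustration.

```shell
# Succeeds if a module named exactly "nvidia" appears in the modules list.
# The list defaults to /proc/modules; pass another file to check saved output.
nvidia_module_loaded() {
    modules_file="${1:-/proc/modules}"
    # The trailing space keeps nvidia_uvm, nvidia_drm etc. from matching.
    grep -q '^nvidia ' "$modules_file"
}

# Usage:
#   nvidia_module_loaded && echo "loaded" || echo "not loaded"
```

If the module is not loaded, that is consistent with the nvidia-smi error above and the dkms steps that follow are worth trying.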
Step 3: look up the version of the installed driver
ls /usr/src | grep nvidia
For example, on my machine this prints nvidia-450.57, so the driver version is 450.57.
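The version string is the part of the directory name after the nvidia- prefix, and it is the value passed to the -v option of dkms. As a small sketch, assuming the driver sources sit in a directory named nvidia-<version> as shown above (the function name nvidia_src_versions is illustrative):

```shell
# Print the version part of every nvidia-<version> directory under a source root.
# The root defaults to /usr/src.
nvidia_src_versions() {
    src_root="${1:-/usr/src}"
    for dir in "$src_root"/nvidia-*; do
        # Skip the unexpanded pattern when nothing matches, and non-directories.
        [ -d "$dir" ] || continue
        name=$(basename "$dir")
        printf '%s\n' "${name#nvidia-}"
    done
}

# Usage:
#   nvidia_src_versions
```

If more than one version is listed, pick the one that matches the driver you actually installed.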
Step 4: run the following commands in order
sudo apt-get install dkms
# Register the driver with the kernel (use the version found in step 3)
sudo dkms install -m nvidia -v 450.57
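Once the installation finishes, run nvidia-smi again to check the GPU status. A complete session on a machine with driver version 495.29.05 looks like this: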

root@AI-03:/home/work/cluster# ls /usr/src | grep nvidia
nvidia-495.29.05
root@AI-03:/home/work/cluster# sudo dkms install -m nvidia -v 495.29.05
Creating symlink /var/lib/dkms/nvidia/495.29.05/source ->
/usr/src/nvidia-495.29.05
DKMS: add completed.
Kernel preparation unnecessary for this kernel. Skipping...
Building module:
cleaning build area...
'make' -j32 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=5.4.0-120-generic IGNORE_CC_MISMATCH='1' modules........
Signing module:
- /var/lib/dkms/nvidia/495.29.05/5.4.0-120-generic/x86_64/module/nvidia.ko
- /var/lib/dkms/nvidia/495.29.05/5.4.0-120-generic/x86_64/module/nvidia-drm.ko
- /var/lib/dkms/nvidia/495.29.05/5.4.0-120-generic/x86_64/module/nvidia-uvm.ko
- /var/lib/dkms/nvidia/495.29.05/5.4.0-120-generic/x86_64/module/nvidia-peermem.ko
- /var/lib/dkms/nvidia/495.29.05/5.4.0-120-generic/x86_64/module/nvidia-modeset.ko
Secure Boot not enabled on this system.
cleaning build area...
DKMS: build completed.
nvidia.ko:
Running module version sanity check.
Good news! Module version 495.29.05 for nvidia.ko
exactly matches what is already found in kernel 5.4.0-120-generic.
DKMS will not replace this module.
You may override by specifying --force.
nvidia-uvm.ko:
Running module version sanity check.
Good news! Module version for nvidia-uvm.ko
exactly matches what is already found in kernel 5.4.0-120-generic.
DKMS will not replace this module.
You may override by specifying --force.
nvidia-modeset.ko:
Running module version sanity check.
Good news! Module version 495.29.05 for nvidia-modeset.ko
exactly matches what is already found in kernel 5.4.0-120-generic.
DKMS will not replace this module.
You may override by specifying --force.
nvidia-drm.ko:
Running module version sanity check.
Good news! Module version 495.29.05 for nvidia-drm.ko
exactly matches what is already found in kernel 5.4.0-120-generic.
DKMS will not replace this module.
You may override by specifying --force.
nvidia-peermem.ko:
Running module version sanity check.
Good news! Module version 495.29.05 for nvidia-peermem.ko
exactly matches what is already found in kernel 5.4.0-120-generic.
DKMS will not replace this module.
You may override by specifying --force.
depmod........
DKMS: install completed.
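To confirm the result without reading the whole log, the output of dkms status can be checked for an nvidia entry marked as installed for the running kernel. This is a sketch only: it assumes status lines of the form "nvidia, <version>, <kernel>, <arch>: installed" or "nvidia/<version>, <kernel>, <arch>: installed", which differ between dkms versions, and the function name nvidia_dkms_installed is made up.

```shell
# Reads `dkms status` text on stdin. Succeeds if an nvidia module is listed
# as installed for the given kernel release (default: the running kernel).
nvidia_dkms_installed() {
    kernel="${1:-$(uname -r)}"
    grep '^nvidia' | grep -F -- "$kernel" | grep -q 'installed'
}

# Usage:
#   dkms status | nvidia_dkms_installed && echo "nvidia module installed"
```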
Other issues
dkms build error
ERROR: Failed to run `/usr/sbin/dkms build -m nvidia -v 418.181.07 -k 4.4.0-151-generic`:
Kernel preparation unnecessary for this kernel. Skipping...
Building module:
cleaning build area....
'make' -j4 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=4.15.0-50-generic IGNORE_CC_MISMATCH=''
modules....(bad exit status: 2)
ERROR (dkms apport): binary package for nvidia: 418.181.07 not found
Error! Bad return status for module build on kernel: 4.4.0-151-generic (x86_64)
Consult /var/lib/dkms/nvidia/418.181.07/build/make.log for more information.
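The fix is to switch the default gcc version (a build like this typically fails when the compiler does not match the one the kernel was built with; make.log has the details):
sudo update-alternatives --config gcc

Then select the gcc version you need to switch to.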
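To decide which gcc to switch to, it can help to see which compiler the running kernel was built with. The sketch below assumes the older /proc/version wording that contains "gcc version X.Y.Z"; newer kernels word this differently, in which case the function prints nothing. The function name kernel_gcc_version is illustrative.

```shell
# Print the gcc version recorded in a /proc/version style file.
# Only handles the "gcc version X.Y.Z" wording; prints nothing otherwise.
kernel_gcc_version() {
    sed -n 's/.*gcc version \([0-9][0-9.]*\).*/\1/p' "${1:-/proc/version}"
}

# Usage: compare the two outputs.
#   kernel_gcc_version
#   gcc -dumpversion
```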
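Configuring and switching versions
Register the different gcc versions with update-alternatives
Basic syntax for registering a version:
sudo update-alternatives --install <link> <name> <path> <priority>
List the installed versions first, then register them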
ls /usr/bin/gcc*
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-7 70
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-9 90
ls /usr/bin/g++*
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-7 70
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-9 90
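Registering gcc and g++ separately means they have to be switched separately. One possible variation, not from the original steps, is to attach g++ to gcc with the --slave option of update-alternatives so that both change together. The helper below only prints the command so it can be reviewed first; its name gcc_alternative_cmd is made up, and the priority follows the convention above (major version times 10).

```shell
# Print (do not run) the command that registers gcc-<major> with g++-<major>
# attached as a slave link.
gcc_alternative_cmd() {
    major="$1"
    printf 'sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-%s %s --slave /usr/bin/g++ g++ /usr/bin/g++-%s\n' \
        "$major" "$((major * 10))" "$major"
}

# Usage:
#   gcc_alternative_cmd 7
#   gcc_alternative_cmd 9
```

Use this in place of the separate g++ registrations rather than in addition to them, since a link that is already registered as its own alternative cannot also be added as a slave.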
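This article covers how to recover when the NVIDIA driver stops working on Linux: diagnose the problem with nvidia-smi and nvcc -V, look up the installed driver version, reinstall the driver module with dkms, and, if the dkms build fails, configure and switch gcc versions with update-alternatives.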




