[GH-ISSUE #90] [BUG] nvidia-ml-py-12.535.77 compatibility issue #52

Closed
opened 2026-05-05 03:23:44 -06:00 by gitea-mirror · 3 comments
Owner

Originally created by @hui-zhao-1 on GitHub (Aug 17, 2023).
Original GitHub issue: https://github.com/XuehaiPan/nvitop/issues/90

Originally assigned to: @XuehaiPan on GitHub.

Required prerequisites

  • I have read the documentation https://nvitop.readthedocs.io.
  • I have searched the Issue Tracker that this hasn't already been reported. (comment there if it has.)
  • I have tried the latest version of nvitop in a new isolated virtual environment.

What version of nvitop are you using?

1.2.0

Operating system and version

Ubuntu 20.04.6 LTS (Focal Fossa)

NVIDIA driver version

535.86.10

NVIDIA-SMI

Thu Aug 17 16:23:29 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.10              Driver Version: 535.86.10    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1660 Ti     Off | 00000000:01:00.0 Off |                  N/A |
| 27%   46C    P2              39W / 120W |     75MiB /  6144MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A    522690      C   ./nvitop-test                                72MiB |
+---------------------------------------------------------------------------------------+

Python environment

3.8.17 (default, Jul 5 2023, 21:04:15)
[GCC 11.2.0] linux
nvidia-ml-py==12.535.77
nvitop==1.2.0

Problem description

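The test machine uses this NVIDIA driver: https://international.download.nvidia.com/tesla/535.86.10/NVIDIA-Linux-x86_64-535.86.10.run
With this driver version, nvitop does not show the running processes:
image: https://github.com/XuehaiPan/nvitop/assets/19888114/96c764f8-7fca-4925-b78a-c47662793884
image: https://github.com/XuehaiPan/nvitop/assets/19888114/b0c29ee1-d9ef-4a09-b574-0b8d0a20614b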

Steps to Reproduce

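Investigation showed that nvitop reports the following error:
image: https://github.com/XuehaiPan/nvitop/assets/19888114/3ff9bc02-e5a0-4080-ab9f-5aef479c747f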

Traceback

ERROR: A FunctionNotFound error occurred while calling nvmlQuery(<function nvmlDeviceGetComputeRunningProcesses at 0x7f2b7be24670>, *args, **kwargs).
Please verify whether the `nvidia-ml-py` package is compatible with your NVIDIA driver version.
ERROR: A FunctionNotFound error occurred while calling nvmlQuery(<function nvmlDeviceGetGraphicsRunningProcesses at 0x7f2b7be24700>, *args, **kwargs).
Please verify whether the `nvidia-ml-py` package is compatible with your NVIDIA driver version.
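
The error message asks the reader to compare the installed `nvidia-ml-py` version with the driver version. A minimal sketch for collecting the package versions from the active environment follows. It assumes Python 3.8+ and uses only the standard library. The helper name `installed_version` is hypothetical and is not part of nvitop.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Print the versions relevant to this report; the driver version itself
# still has to come from `nvidia-smi` or the driver installer.
for name in ('nvidia-ml-py', 'nvitop'):
    print(name, installed_version(name))
```

This only reports what is installed. It does not decide whether a given package and driver pair is compatible.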

Logs

No response

Expected behavior

No response

Additional context

No response

gitea-mirror 2026-05-05 03:23:44 -06:00

@XuehaiPan commented on GitHub (Aug 17, 2023):

Duplicate of #88; it will be fixed by #89.


@hui-zhao-1 commented on GitHub (Aug 17, 2023):

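After going through the logs, I suspect the problem is in the __determine_get_running_processes_version_suffix() function at line 590 of https://github.com/XuehaiPan/nvitop/blob/main/nvitop/api/libnvml.py.
I could not work out why the version is determined by checking for 'nvmlDeviceGetConfComputeMemSizeInfo', so I forked the code and commented out that check, which resolved the problem. See: https://github.com/XuehaiPan/nvitop/commit/cc3ad6da513062cab1759267fb80a028d74c2f32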
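
To illustrate the kind of suffix detection discussed in this comment, here is a minimal sketch that probes for the versioned entry point it intends to call instead of inferring the version from an unrelated symbol. It is not nvitop's actual implementation. The helper `pick_versioned_symbol` and the `exported` set are hypothetical stand-ins, and the suffix order is an assumption.

```python
from typing import Callable, Iterable, Optional


def pick_versioned_symbol(
    has_symbol: Callable[[str], bool],
    base_name: str,
    suffixes: Iterable[str] = ('_v3', '_v2', ''),
) -> Optional[str]:
    """Return the first `base_name + suffix` the library exports, else None."""
    for suffix in suffixes:
        name = base_name + suffix
        if has_symbol(name):
            return name
    return None


# Stand-in for a driver library that exports only the unversioned and `_v2`
# entry points. With a real library loaded through ctypes, the probe could be
# `lambda name: hasattr(lib, name)`, since a missing symbol raises AttributeError.
exported = {
    'nvmlDeviceGetComputeRunningProcesses',
    'nvmlDeviceGetComputeRunningProcesses_v2',
}
chosen = pick_versioned_symbol(
    exported.__contains__,
    'nvmlDeviceGetComputeRunningProcesses',
)
print(chosen)
```

Probing the target function directly keeps the choice tied to what the driver library actually exports, at the cost of one lookup per candidate suffix.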


@hui-zhao-1 commented on GitHub (Aug 17, 2023):

Verified with pip3 install git+https://github.com/XuehaiPan/nvitop.git: the problem is resolved.

Reference: github-starred/nvitop#52