Mirror of https://github.com/XuehaiPan/nvitop.git, synced 2026-05-15 14:15:55 -06:00
[GH-ISSUE #88] [Bug] Processes information cannot be obtained normally on 535.98 driver #54
Originally created by @GeekRaw on GitHub (Aug 15, 2023).
Original GitHub issue: https://github.com/XuehaiPan/nvitop/issues/88
Originally assigned to: @XuehaiPan on GitHub.
Required prerequisites
Questions
Hello, when I use nvitop on the server, I can't get the process information normally. Thank you for your answer.
@XuehaiPan commented on GitHub (Aug 15, 2023):
Hi @GeekRaw, could you provide some relevant information, such as the nvidia-smi output and the package version list of your Python environment? It would also help to know whether you are running nvitop natively or in a container-like environment. Then we can investigate this issue further.
@cfroehli commented on GitHub (Aug 16, 2023):
Hello,
In case it helps: we also noticed the same behavior recently after upgrading our driver version (currently 536.86.10, Ubuntu 20.04, CUDA 12.2). The card model does not seem relevant. The load and charts at the top match the nvidia-smi output, but the process list is broken, while nvidia-smi is able to show the actual processes. The install is a basic python3 venv on the actual server; no container is involved.
Depending on the tty refresh timing, you may see the following error getting printed (it often gets overwritten, so it is easy to miss):
ERROR: A FunctionNotFound error occured while calling nvmlQuery(<function nvmlDeviceGetGraphicsRunningProcesses at 0x7f08ff962940>, *args, **kwargs). Please verify whether the nvidia-ml-py package is compatible with your NVIDIA driver version
I guess some API changed in a recent nvidia-ml version. Downgrading it to one of the latest 11.* versions didn't help.
It seems the _v3 symbol is not there anymore, but the Python bindings keep using it?
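One way to check this hypothesis is to ask the loaded NVML library directly which process-query symbols it exports. The sketch below is a minimal illustration, not nvitop's actual fix: it assumes a Linux host where the library is named libnvidia-ml.so.1, and resolve_first_exported, COMPUTE_PROCS and GRAPHICS_PROCS are hypothetical names. The compute-process symbol names come from the version notes in this thread; the graphics _v2 name is assumed from NVML's _vN naming convention.

```python
import ctypes
from typing import Any, Optional, Sequence, Tuple


def resolve_first_exported(lib: Any, names: Sequence[str]) -> Optional[Tuple[str, Any]]:
    """Return (name, symbol) for the first name that `lib` exports, else None.

    `lib` is anything with attribute-style symbol lookup, e.g. a ctypes.CDLL,
    where looking up a missing symbol raises AttributeError.
    """
    for name in names:
        try:
            return name, getattr(lib, name)
        except AttributeError:
            continue  # symbol not exported by this library build
    return None


# Newest variant first; fall back to the older one if the driver lacks it.
COMPUTE_PROCS = (
    "nvmlDeviceGetComputeRunningProcesses_v3",
    "nvmlDeviceGetComputeRunningProcesses_v2",
)
# ASSUMPTION: graphics variants follow the same _vN naming as the compute ones.
GRAPHICS_PROCS = (
    "nvmlDeviceGetGraphicsRunningProcesses_v3",
    "nvmlDeviceGetGraphicsRunningProcesses_v2",
)


if __name__ == "__main__":
    try:
        nvml = ctypes.CDLL("libnvidia-ml.so.1")  # ASSUMPTION: Linux library name
    except OSError:
        print("libnvidia-ml.so.1 not found")
    else:
        for candidates in (COMPUTE_PROCS, GRAPHICS_PROCS):
            found = resolve_first_exported(nvml, candidates)
            print(found[0] if found else f"none of {candidates} exported")
```

Finding a symbol is only half of the problem: according to the version notes in this thread, 535.43.02 changed the struct behind nvmlProcessInfo_st without a version bump, so a caller also has to pass a buffer with the layout that the driver expects.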
@XuehaiPan commented on GitHub (Aug 16, 2023):
@cfroehli Thanks for the feedback! This is due to poor version management for the NVML library.
The v3 APIs were introduced in the 510.39.01 driver (b2f0e7f437), but they were removed in the 535.98 driver (0cb3beffa0).
Version change:
- 495.46 -> 510.39.01: b2f0e7f437 Add process info v3 APIs but use v2 nvmlProcessInfo_st struct type
  default:
  nvmlDeviceGetComputeRunningProcesses -> nvmlDeviceGetComputeRunningProcesses_v3
  nvmlProcessInfo_st -> nvmlProcessInfo_v2_st
- 530.41.03 -> 535.43.02: 39c3e28e84 Process info v3 APIs use v3 nvmlProcessInfo_st struct type without a version bump
  default:
  nvmlDeviceGetComputeRunningProcesses -> nvmlDeviceGetComputeRunningProcesses_v3
  nvmlProcessInfo_st -> nvmlProcessInfo_v3_st
- 535.86.05 -> 535.98: 0cb3beffa0 Remove process info v3 APIs and v3 nvmlProcessInfo_st struct type
  default:
  nvmlDeviceGetComputeRunningProcesses -> nvmlDeviceGetComputeRunningProcesses_v2
  nvmlProcessInfo_st -> nvmlProcessInfo_v2_st
UPDATE:
- 535.98 -> 535.104.05: 74cae7fa6a Re-add process info v3 APIs but use v2 nvmlProcessInfo_st struct type
  default:
  nvmlDeviceGetComputeRunningProcesses -> nvmlDeviceGetComputeRunningProcesses_v3
  nvmlProcessInfo_st -> nvmlProcessInfo_v2_st
@XuehaiPan commented on GitHub (Aug 16, 2023):
Hi @cfroehli @GeekRaw, I created a new PR to resolve this. You could try:
Let me know if this works for you.
@cfroehli commented on GitHub (Aug 17, 2023):
That fixes the process listing in my case. (thanks for the quick fix and the nice tool btw)
@XuehaiPan commented on GitHub (Aug 17, 2023):
@cfroehli Thanks for the feedback. A new version with the fix will be released soon.
@XuehaiPan commented on GitHub (Aug 24, 2023):
Hi, upstream NVIDIA re-added the v3 APIs in the latest driver release:
- 535.98 -> 535.104.05: 74cae7fa6a Re-add process info v3 APIs but use v2 nvmlProcessInfo_st struct type
  default:
  nvmlDeviceGetComputeRunningProcesses -> nvmlDeviceGetComputeRunningProcesses_v3
  nvmlProcessInfo_st -> nvmlProcessInfo_v2_st
nvitop 1.2.0 will work fine if you upgrade your NVIDIA driver to 535.104.05.
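The version notes in this thread can be encoded as a small lookup table, which makes it easier to tell what a given driver exposes. This is a sketch under stated assumptions: process_info_api, ProcessInfoApi and CHANGES are hypothetical names, only the four releases listed in the thread are known transitions, and the code assumes each behavior holds from that release until the next listed change (so drivers newer than 535.104.05 are assumed to behave like it). Drivers older than 510.39.01 predate the v3 APIs and get None.

```python
from typing import NamedTuple, Optional, Tuple


def parse_driver_version(version: str) -> Tuple[int, ...]:
    """'535.104.05' -> (535, 104, 5), so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


class ProcessInfoApi(NamedTuple):
    v3_exported: bool    # are the *_v3 process-info functions present?
    default_struct: str  # struct that nvmlProcessInfo_st maps to, per the thread


# (first driver with this behavior, behavior), in ascending version order.
# ASSUMPTION: each behavior holds from the listed release until the next change.
CHANGES = [
    ((510, 39, 1), ProcessInfoApi(True, "nvmlProcessInfo_v2_st")),
    ((535, 43, 2), ProcessInfoApi(True, "nvmlProcessInfo_v3_st")),
    ((535, 98), ProcessInfoApi(False, "nvmlProcessInfo_v2_st")),
    ((535, 104, 5), ProcessInfoApi(True, "nvmlProcessInfo_v2_st")),
]


def process_info_api(version: str) -> Optional[ProcessInfoApi]:
    """Return the behavior in effect for `version`, or None before 510.39.01."""
    v = parse_driver_version(version)
    current = None
    for start, api in CHANGES:
        if v >= start:  # tuple comparison: (535, 104, 5) > (535, 98)
            current = api
    return current
```

For example, process_info_api("535.98").v3_exported is False, which matches the FunctionNotFound error reported above. The driver version string itself can be read with nvidia-smi --query-gpu=driver_version --format=csv,noheader.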