mirror of
https://github.com/XuehaiPan/nvitop.git
synced 2026-05-15 14:15:55 -06:00
[GH-ISSUE #13] [Bug] GPU memory usage not shown correctly with driver version 510 #12
Originally created by @jue-jue-zi on GitHub (Mar 21, 2022).
Original GitHub issue: https://github.com/XuehaiPan/nvitop/issues/13
Runtime Environment

- `nvitop` version or commit: 0.5.3
- `nvidia-ml-py` version: 11.450.51

Current Behavior

After upgrading the NVIDIA driver to the latest version 510.47.03, the GPU memory usage is not shown correctly on my workstation, for both the 1080Ti and the A100. `nvitop` shows more memory usage than the actual value, which does not match the `nvidia-smi` command. It seems the `nvtop` command makes the same mistake.

Expected Behavior

The GPU memory usage should match `nvidia-smi`.

@XuehaiPan commented on GitHub (Mar 21, 2022):
@jue-jue-zi Hi, thanks for the feedback!
This is an internal issue related to the NVML library shipped with the NVIDIA R510 driver. I think NVIDIA is pre-testing experimental APIs in `nvidia-smi` before the NVML counterparts are released.

Related issue: https://github.com/NVIDIA/go-nvml/issues/28#issuecomment-1067285988

For example, `nvmlDeviceGetMPSComputeRunningProcesses` results could already be seen in `nvidia-smi` with R450, but the API itself was not exposed until the R470 driver was released. NVIDIA added a new `v2` version of the struct `nvmlMemory_t` with a new API `nvmlDeviceGetMemoryInfo_v2`; two new fields, `version` and `reserved`, are added. However, I cannot find the API `nvmlDeviceGetMemoryInfo_v2` in the R510 driver (on Ubuntu 20.04 LTS).

@XuehaiPan commented on GitHub (Mar 21, 2022):
I can get the "almost" correct result with:
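The code block from the original comment was not preserved in this mirror. A plausible sketch of such a patch, assuming the `nvmlMemory_v2_t` field layout from later `nvml.h` headers and pynvml's internal `_nvmlGetFunctionPointer` helper (the function name `device_get_memory_info_v2` here is illustrative, not the author's actual code):

```python
import ctypes

# Hypothetical reconstruction of the v2 memory struct from nvml.h
# (the fields `version` and `reserved` are new relative to nvmlMemory_t).
class c_nvmlMemory_v2_t(ctypes.Structure):
    _fields_ = [
        ("version", ctypes.c_uint),
        ("total", ctypes.c_ulonglong),
        ("reserved", ctypes.c_ulonglong),
        ("free", ctypes.c_ulonglong),
        ("used", ctypes.c_ulonglong),
    ]

# NVML encodes a struct version as: sizeof(struct) | (version << 24)
nvmlMemory_v2 = ctypes.sizeof(c_nvmlMemory_v2_t) | (2 << 24)

def device_get_memory_info_v2(handle):
    """Query the v2 entry point directly instead of nvmlDeviceGetMemoryInfo."""
    import pynvml  # nvidia-ml-py; imported lazily so the sketch loads without a GPU
    fn = pynvml._nvmlGetFunctionPointer("nvmlDeviceGetMemoryInfo_v2")
    mem = c_nvmlMemory_v2_t()
    mem.version = nvmlMemory_v2  # tell the driver which struct layout we pass
    ret = fn(handle, ctypes.byref(mem))
    pynvml._nvmlCheckReturn(ret)
    return mem
```

Calling `device_get_memory_info_v2(handle)` on an R510+ driver should return totals that account for the reserved memory, which is the discrepancy discussed above.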
But I don't think this monkey patch is the right solution.
@jue-jue-zi commented on GitHub (Mar 21, 2022):
Thanks for your reply. I'm sorry that I'm not familiar with the relevant libraries. Will this be fixed by updating the NVIDIA driver in the future, or only by updating the libraries after a patched nvitop version is released?
@XuehaiPan commented on GitHub (Mar 21, 2022):
Since our dependency `nvidia-ml-py` is pinned to `11.450.51`, `nvitop` will always use the `v1` version of the struct `nvmlMemory_t`. We will need both to upgrade the NVIDIA driver (e.g. to the future R530 driver) and to update the pinned dependency `nvidia-ml-py`.

@XuehaiPan commented on GitHub (Oct 17, 2022):
This issue is fixed by #30. Please upgrade your `nvitop` and `nvidia-ml-py` (e.g. with `pip3 install --upgrade nvitop nvidia-ml-py`, as quoted in the reply below).

@jue-jue-zi commented on GitHub (Oct 17, 2022):
Hi, I updated `nvitop` to version 0.10.0, but the 1080Ti GPUs with driver 515.65.01 failed to run `nvitop`. All packages have been updated using `pip3 install --upgrade nvitop nvidia-ml-py`. `nvidia-smi`:

@XuehaiPan commented on GitHub (Oct 17, 2022):
@jue-jue-zi Can you try?
@jue-jue-zi commented on GitHub (Oct 17, 2022):
It did not work. Strangely, it runs normally when installed by a non-root user, but the same errors occur when installed by the root user.
@jue-jue-zi commented on GitHub (Oct 17, 2022):
I also found that it failed to run after I uninstalled the non-root user's `nvidia-ml-py3` package. However, it still does not work even after I reinstall the `nvidia-ml-py3` package using `pip3 install nvidia-ml-py3==7.352.0`.

@XuehaiPan commented on GitHub (Oct 17, 2022):
You should use the same Python interpreter to run `pip install` and `nvitop`.

For admin:
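The admin command block was not preserved in this mirror; given the normal-user command quoted later in the thread, it was presumably the system-wide equivalent (a hypothetical reconstruction, not the author's verbatim command):

```shell
# Force-reinstall system-wide with the system Python's pip (run as root)
pip3 install --force-reinstall nvitop nvidia-ml-py
```

`--force-reinstall` matters here because it replaces a `pynvml.py` module previously installed by a conflicting distribution.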
For normal user:
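The normal-user command block was likewise lost in the mirror, but it is quoted verbatim in a later reply in this thread:

```shell
# Per-user install; console scripts land in ~/.local/bin
pip3 install --user --force-reinstall nvitop nvidia-ml-py
```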
and add `~/.local/bin` to your `PATH`.

@jue-jue-zi commented on GitHub (Oct 17, 2022):
The normal-user command `pip3 install --user --force-reinstall nvitop nvidia-ml-py` works, but the command for the admin user does not. I actually use `sudo -i` to switch to the root user and run that command.

@XuehaiPan commented on GitHub (Oct 17, 2022):
That's why the issue occurs.

Both `nvidia-ml-py` and `nvidia-ml-py3` install the module `pynvml.py`, so they conflict with each other. You should uninstall `nvidia-ml-py3` and force-reinstall `nvidia-ml-py`. Otherwise, install `nvitop` in a clean virtual environment (do not install `nvidia-ml-py3` or `pynvml`). Then everything will work as expected.

@jue-jue-zi commented on GitHub (Oct 17, 2022):
I created a virtual env using `python3 -m venv venv` and installed `nvitop` as the root user, and it works! I will check the installed packages and find out the reason. Thanks for helping!

@jue-jue-zi commented on GitHub (Oct 17, 2022):
It turned out that the root user had installed the `nvgpu` package, which requires the `pynvml` package, and that `pynvml` package made `nvitop` use the wrong module. Everything works after uninstalling the `pynvml` package.
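The module shadowing diagnosed in this thread (several PyPI distributions all shipping a module named `pynvml`) can be checked with a short standard-library sketch; the function name below is illustrative, not part of nvitop:

```python
from importlib import metadata

def find_pynvml_providers():
    """List installed distributions known to ship a `pynvml` module.

    `nvidia-ml-py`, `nvidia-ml-py3`, and `pynvml` all install `pynvml.py`;
    more than one entry in the result means they can shadow each other.
    """
    providers = []
    for dist in ("nvidia-ml-py", "nvidia-ml-py3", "pynvml"):
        try:
            providers.append((dist, metadata.version(dist)))
        except metadata.PackageNotFoundError:
            continue  # distribution not installed in this environment
    return providers

print(find_pynvml_providers())
```

Only `nvidia-ml-py` should appear in the output of an environment where `nvitop` works correctly.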