[GH-ISSUE #13] [Bug] GPU memory usage not shown correctly with driver version 510 #12

Closed
opened 2026-05-05 03:21:41 -06:00 by gitea-mirror · 14 comments
Owner

Originally created by @jue-jue-zi on GitHub (Mar 21, 2022).
Original GitHub issue: https://github.com/XuehaiPan/nvitop/issues/13

Runtime Environment

  • Operating system and version: Ubuntu 20.04 LTS
  • Terminal emulator and version: GNOME Terminal 3.36.2
  • Python version: 3.8.10
  • NVML version (driver version): 510.47.03
  • nvitop version or commit: 0.5.3
  • nvidia-ml-py version: 11.450.51
  • Locale: zh_CN.UTF-8

Current Behavior

After upgrading the NVIDIA driver to the latest version 510.47.03, the GPU memory usage is not shown correctly on my workstation for both the 1080 Ti and the A100. It reports more memory usage than the actual amount, which does not match the output of the `nvidia-smi` command.

`nvitop`

![image](https://user-images.githubusercontent.com/26075785/159283272-586d31bd-acd0-4a68-a418-ad12e028257f.png)

`nvidia-smi`

![image](https://user-images.githubusercontent.com/26075785/159283319-a216d1ab-eda3-42fc-b9cc-9b4dda2a2155.png)

It seems the `nvtop` command has the same problem.

`nvtop`

![image](https://user-images.githubusercontent.com/26075785/159283505-99317531-04d1-45d5-82fc-f262a9a376bb.png)

Expected Behavior

The GPU memory usage should match the output of `nvidia-smi`.

gitea-mirror 2026-05-05 03:21:41 -06:00
Author
Owner

@XuehaiPan commented on GitHub (Mar 21, 2022):

@jue-jue-zi Hi, thanks for the feedback!

This is an internal issue related to the NVML shipped with the NVIDIA R510 driver. I think NVIDIA is pre-testing experimental APIs in nvidia-smi before the NVML counterparts are released.

Related issue: https://github.com/NVIDIA/go-nvml/issues/28#issuecomment-1067285988

`nvmlDeviceGetMPSComputeRunningProcesses` could be found in `nvidia-smi` with R450, but the API was not exposed in NVML until the R470 driver was released.


NVIDIA added a new `v2` version of the struct `nvmlMemory_t` along with a new API, `nvmlDeviceGetMemoryInfo_v2`.

```python
# in nvidia-ml-py==11.515.0

class c_nvmlMemory_t(_PrintableStructure):
    _fields_ = [
        ('total', c_ulonglong),
        ('free', c_ulonglong),
        ('used', c_ulonglong),
    ]
    _fmt_ = {'<default>': "%d B"}

class c_nvmlMemory_v2_t(_PrintableStructure):
    _fields_ = [
        ('version', c_uint),
        ('total', c_ulonglong),
        ('reserved', c_ulonglong),
        ('free', c_ulonglong),
        ('used', c_ulonglong),
    ]
    _fmt_ = {'<default>': "%d B"}
```

Two new fields, `version` and `reserved`, are added.

However, I cannot find the API `nvmlDeviceGetMemoryInfo_v2` in the R510 driver (on Ubuntu 20.04 LTS).

![meminfov2](https://user-images.githubusercontent.com/16078332/159288602-dc207899-cae8-4119-ba65-e89c154f042c.png)
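
For reference, here is a minimal sketch (not part of `nvitop`) that probes whether the driver's NVML library exports the `v2` entry point; the library name `libnvidia-ml.so.1` is an assumption for Linux:

```python
# Hypothetical probe: load the driver's NVML library directly and check
# whether the v2 memory-info symbol is exported. Accessing a missing symbol
# on a ctypes.CDLL raises AttributeError, so hasattr() works as a test.
import ctypes

nvml = ctypes.CDLL('libnvidia-ml.so.1')
for symbol in ('nvmlDeviceGetMemoryInfo', 'nvmlDeviceGetMemoryInfo_v2'):
    print(symbol, '->', 'found' if hasattr(nvml, symbol) else 'missing')
```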

Author
Owner

@XuehaiPan commented on GitHub (Mar 21, 2022):

I can get the "almost" correct result with:

```python
memory_used_v2 = memory_used - bar1_memory_free
```

But I don't think this monkey patch is the right solution.

![meminfo](https://user-images.githubusercontent.com/16078332/159290984-3cf40bd1-7f77-44d3-9ea5-856eda5330a5.png)
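
For illustration only, a minimal sketch of this workaround using the v1 calls available in `nvidia-ml-py` (`pynvml`); it only approximates `nvidia-smi` on R510 and is not the fix that was eventually adopted:

```python
# Sketch of the "almost" correct workaround described above, using only v1 APIs.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
memory = pynvml.nvmlDeviceGetMemoryInfo(handle)    # v1 struct: total / free / used
bar1 = pynvml.nvmlDeviceGetBAR1MemoryInfo(handle)  # bar1Total / bar1Free / bar1Used
memory_used_v2 = memory.used - bar1.bar1Free       # roughly matches nvidia-smi on R510
print(f'used (v1): {memory.used} B, adjusted: {memory_used_v2} B')
pynvml.nvmlShutdown()
```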

Author
Owner

@jue-jue-zi commented on GitHub (Mar 21, 2022):

Thanks for your reply. I'm sorry, but I'm not familiar with the relevant libraries. Will this be fixed by updating the NVIDIA driver in the future, or just by updating the libraries after a patched nvitop version is released?

Author
Owner

@XuehaiPan commented on GitHub (Mar 21, 2022):

> Will this be fixed by updating the NVIDIA driver in the future, or just by updating the libraries after a patched nvitop version is released?

Since our dependency `nvidia-ml-py` is pinned to `11.450.51`, `nvitop` will always use the `v1` version of the struct `nvmlMemory_t`. We will need to upgrade both the NVIDIA driver (e.g. a future R530 driver) and the pinned `nvidia-ml-py` dependency.
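
For context, here is a hedged sketch of how an unpinned binding could prefer the v2 query and fall back to v1; the `version` keyword and `nvmlMemory_v2` constant are assumptions about newer `nvidia-ml-py` releases, and this is not nvitop's actual implementation:

```python
# Hedged sketch: prefer the v2 memory query when both the driver and the
# nvidia-ml-py binding support it, otherwise fall back to the v1 struct.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
try:
    meminfo = pynvml.nvmlDeviceGetMemoryInfo(handle, version=pynvml.nvmlMemory_v2)
except (TypeError, AttributeError, pynvml.NVMLError):
    meminfo = pynvml.nvmlDeviceGetMemoryInfo(handle)  # old binding or old driver
print(meminfo.total, meminfo.used, meminfo.free)
pynvml.nvmlShutdown()
```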

Author
Owner

@XuehaiPan commented on GitHub (Oct 17, 2022):

This issue is fixed by #30. Please upgrade your `nvitop` and `nvidia-ml-py` by:

```bash
pip3 install --upgrade nvitop nvidia-ml-py
```
Author
Owner

@jue-jue-zi commented on GitHub (Oct 17, 2022):

Hi, I updated `nvitop` to version 0.10.0, but the machine with 1080 Ti GPUs and driver 515.65.01 failed to run `nvitop`:

```log
Traceback (most recent call last):
  File "/usr/local/bin/nvitop", line 5, in <module>
    from nvitop.cli import main
  File "/usr/local/lib/python3.8/dist-packages/nvitop/__init__.py", line 6, in <module>
    from nvitop import core
  File "/usr/local/lib/python3.8/dist-packages/nvitop/core/__init__.py", line 6, in <module>
    from nvitop.core import host, libcuda, libnvml, utils
  File "/usr/local/lib/python3.8/dist-packages/nvitop/core/libnvml.py", line 543, in <module>
    __patch_backward_compatibility_layers()
  File "/usr/local/lib/python3.8/dist-packages/nvitop/core/libnvml.py", line 539, in __patch_backward_compatibility_layers
    with_mapped_function_name()  # patch first and only for once
  File "/usr/local/lib/python3.8/dist-packages/nvitop/core/libnvml.py", line 443, in with_mapped_function_name
    _pynvml._nvmlGetFunctionPointer  # pylint: disable=protected-access
AttributeError: module 'pynvml' has no attribute '_nvmlGetFunctionPointer'
```

All packages have been updated using `pip3 install --upgrade nvitop nvidia-ml-py`:

```log
Requirement already satisfied: nvitop in /usr/local/lib/python3.8/dist-packages (0.10.0)
Requirement already satisfied: nvidia-ml-py in /usr/local/lib/python3.8/dist-packages (11.515.75)
Requirement already satisfied: psutil>=5.6.6 in /usr/local/lib/python3.8/dist-packages (from nvitop) (5.8.0)
Requirement already satisfied: cachetools>=1.0.1 in /usr/local/lib/python3.8/dist-packages (from nvitop) (4.2.2)
Requirement already satisfied: termcolor>=1.0.0 in /usr/local/lib/python3.8/dist-packages (from nvitop) (1.1.0)
```

`nvidia-smi`:

```log
Mon Oct 17 19:05:04 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:03:00.0 Off |                  N/A |
|  0%   34C    P8    15W / 250W |      2MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:0B:00.0 Off |                  N/A |
|  0%   34C    P8     9W / 250W |      2MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  Off  | 00000000:0C:00.0 Off |                  N/A |
|  0%   30C    P8     8W / 250W |      2MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  Off  | 00000000:1B:00.0 Off |                  N/A |
|  0%   33C    P8    10W / 250W |      2MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```
Author
Owner

@XuehaiPan commented on GitHub (Oct 17, 2022):

@jue-jue-zi Can you try?

```bash
pip3 install --force-reinstall nvitop nvidia-ml-py
```
Author
Owner

@jue-jue-zi commented on GitHub (Oct 17, 2022):

It did not work. Strangely, it runs normally when installed by a non-root user, but the same errors occur when it is installed by the root user.

Author
Owner

@jue-jue-zi commented on GitHub (Oct 17, 2022):

> It did not work. Strangely, it runs normally when installed by a non-root user, but the same errors occur when it is installed by the root user.

And I found that it also fails to run after I uninstalled the `nvidia-ml-py3` package for the non-root user. However, it still does not work even if I reinstall the `nvidia-ml-py3` package using `pip3 install nvidia-ml-py3==7.352.0`.

Author
Owner

@XuehaiPan commented on GitHub (Oct 17, 2022):

> But the same errors occur when it is installed by the root user.

You should use the same Python interpreter to run `pip install` and `nvitop`.

For admin:

```bash
sudo /usr/bin/python3 -m pip install --force-reinstall nvitop nvidia-ml-py
```

For normal user:

```bash
pip3 install --user --force-reinstall nvitop nvidia-ml-py
```

and add `~/.local/bin` to your `PATH`.
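
As a quick sanity check (a diagnostic sketch, not part of `nvitop`), you can run the following with the same interpreter that launches `nvitop` to see which `pynvml` module is actually imported:

```python
# Diagnostic sketch: show which interpreter and which pynvml module are in use.
import sys

import pynvml

print('interpreter:', sys.executable)
print('pynvml module:', pynvml.__file__)
# nvitop's compatibility patch expects this private attribute (see the traceback above).
print('has _nvmlGetFunctionPointer:', hasattr(pynvml, '_nvmlGetFunctionPointer'))
```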

Author
Owner

@jue-jue-zi commented on GitHub (Oct 17, 2022):

The normal-user command `pip3 install --user --force-reinstall nvitop nvidia-ml-py` works, but the command for the admin user does not. I actually use `sudo -i` to switch to the root user and run that command.

Author
Owner

@XuehaiPan commented on GitHub (Oct 17, 2022):

> And I found that it also fails to run after I uninstalled the `nvidia-ml-py3` package for the non-root user. However, it still does not work even if I reinstall the `nvidia-ml-py3` package using `pip3 install nvidia-ml-py3==7.352.0`.

That's why the issue

> Hi, I updated `nvitop` to version 0.10.0, but the machine with 1080 Ti GPUs and driver 515.65.01 failed to run `nvitop`,

occurs.

Both [`nvidia-ml-py`](https://pypi.org/project/nvidia-ml-py) and [`nvidia-ml-py3`](https://pypi.org/project/nvidia-ml-py3) install the module `pynvml.py`, so they conflict with each other. You should uninstall `nvidia-ml-py3` and force-reinstall `nvidia-ml-py`. Alternatively, install `nvitop` in a clean virtual environment (without [`nvidia-ml-py3`](https://pypi.org/project/nvidia-ml-py3) or [`pynvml`](https://pypi.org/project/pynvml)). Then everything will work as expected.

Author
Owner

@jue-jue-zi commented on GitHub (Oct 17, 2022):

> > And I found that it also fails to run after I uninstalled the `nvidia-ml-py3` package for the non-root user. However, it still does not work even if I reinstall the `nvidia-ml-py3` package using `pip3 install nvidia-ml-py3==7.352.0`.
>
> That's why the issue
>
> > Hi, I updated `nvitop` to version 0.10.0, but the machine with 1080 Ti GPUs and driver 515.65.01 failed to run `nvitop`,
>
> occurs.
>
> Both `nvidia-ml-py` and `nvidia-ml-py3` install the module `pynvml.py`, so they conflict with each other. You should uninstall `nvidia-ml-py3` and force-reinstall `nvidia-ml-py`. Alternatively, install `nvitop` in a clean virtual environment (without `nvidia-ml-py3` or `pynvml`). Then everything will work as expected.

I created a virtual environment using `python3 -m venv venv` and installed `nvitop` as the root user, and it works! I will check the installed packages and find out the reason. Thanks for helping!

Author
Owner

@jue-jue-zi commented on GitHub (Oct 17, 2022):

It turns out that the root user had installed the `nvgpu` package, which requires the `pynvml` package. The `pynvml` package made `nvitop` import the wrong module. Everything works after uninstalling the `pynvml` package.
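
For anyone hitting the same conflict, here is a small diagnostic sketch (an assumption-based helper, not part of `nvitop`; it relies on `importlib.metadata` from Python 3.8+) that lists installed distributions declaring a dependency on `pynvml`, which is how a package such as `nvgpu` can pull it in:

```python
# Sketch: find installed distributions that depend on the pynvml package.
# The naive string check on requirement specifiers is good enough for a quick look.
from importlib import metadata

for dist in metadata.distributions():
    requires = dist.requires or []
    matches = [req for req in requires if req.lower().startswith('pynvml')]
    if matches:
        print(dist.metadata['Name'], '->', matches)
```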
