[GH-ISSUE #193] [BUG] NVML cannot get device memory info for NVIDIA DGX Spark due to unified memory #117

Closed
opened 2026-05-05 03:26:01 -06:00 by gitea-mirror · 15 comments

Originally created by @FlorinAndrei on GitHub (Nov 20, 2025).
Original GitHub issue: https://github.com/XuehaiPan/nvitop/issues/193

Originally assigned to: @XuehaiPan on GitHub.

Required prerequisites

  • I have read the documentation https://nvitop.readthedocs.io.
  • I have searched the Issue Tracker to verify that this hasn't already been reported (comment there if it has).
  • I have tried the latest version of nvitop in a new isolated virtual environment.

What version of nvitop are you using?

1.6.0

Operating system and version

Ubuntu 24.04 LTS

NVIDIA driver version

580.95.05

NVIDIA-SMI

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GB10                    On  |   0000000F:01:00.0 Off |                  N/A |
| N/A   73C    P0             61W /  N/A  | Not Supported          |     89%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            6920      C   python                                36593MiB |
+-----------------------------------------------------------------------------------------+

Python environment

3.12.3 (main, Aug 14 2025, 17:47:21) [GCC 13.3.0] linux
nvidia-ml-py==13.580.82
nvitop==1.6.0

Problem description

On the NVIDIA DGX Spark, the app seems to work fine, except for memory usage.

Spark has unified memory, which is a little unusual. nvidia-smi itself has issues with reporting memory.

I just wanted to raise awareness on this issue. Feel free to convert this to a discussion instead.

Steps to Reproduce

The Python snippets (if any):


Command lines:

nvitop

Traceback


Logs


Expected behavior

No response

Additional context

No response


@FlorinAndrei commented on GitHub (Nov 20, 2025):

I should add: system memory does seem to be reported correctly. And that's all the memory this system has, because this is unified memory, like on a Mac. It's just the GPU MEM that's shown as N/A.

I'm not sure what the best strategy is here.

One idea would be to copy system memory metrics to the GPU memory graphs, just duplicate that info.

Another idea is something I heard nvtop may have in its code now, though not yet in a release: they take total system memory, subtract the memory used by CPU-only processes, and treat the difference as the "total memory" available to GPU processes. So on the Spark, the total GPU memory becomes variable. They then sum the memory usage of GPU processes and display it relative to that new "total GPU memory" (see the sketch below).
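
A rough sketch of that heuristic, assuming psutil is available (this illustrates the described nvtop approach, not nvitop's code; the gpu_pids argument listing GPU-process PIDs is hypothetical):

# Sketch of the heuristic described above: derive a variable "total GPU
# memory" by subtracting memory used by CPU-only processes from total
# system memory, then report GPU-process usage against that total.
import psutil

def effective_gpu_memory(gpu_pids):
    """Return (total, used) in bytes for GPU processes on a unified-memory box."""
    total = psutil.virtual_memory().total
    cpu_only_used = gpu_used = 0
    for proc in psutil.process_iter(['pid', 'memory_info']):
        mem = proc.info['memory_info']
        if mem is None:  # access denied
            continue
        if proc.info['pid'] in gpu_pids:
            gpu_used += mem.rss
        else:
            cpu_only_used += mem.rss
    return total - cpu_only_used, gpu_used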

The unified memory paradigm changed everything for these tools. Assumptions made in the past are no longer always valid.


@MaxwellDPS commented on GitHub (Nov 29, 2025):

This was fixed in nvtop like so, if it helps: https://github.com/Syllo/nvtop/pull/411/files


@thewh1teagle commented on GitHub (Nov 30, 2025):

I experience the same issue with the DGX Spark. How can I install the new version with the patch? Thanks.


@MaxwellDPS commented on GitHub (Dec 1, 2025):

@XuehaiPan I've got some changes on a fork that fix it by calculating based on total memory, though it's still kind of messy:

https://github.com/MaxwellDPS/nvitop

The exporter image is built and working: ghcr.io/cha0s-corp/nvitop-exporter:latest


@XuehaiPan commented on GitHub (Dec 1, 2025):

Hi everyone, thanks for the information! Before we patch in unified memory support, I want to confirm some details.

  1. device.memory_total(): Instead of deriving a dynamic total GPU memory by subtracting the memory used by CPU processes, I'd like to show the fixed total system memory, i.e., host.virtual_memory().total.

  2. device.memory_used(): There can be two approaches:

    1. Sum up the used GPU memory of running GPU processes.
    2. Show the used memory (by GPU or CPU processes), i.e., host.virtual_memory().used.

    I prefer the second approach because we can get the available memory with a simple subtraction: available = total - used (see the sketch after this list).

  3. I need some more information to distinguish whether the GPU is lost or whether it is a Spark device that uses unified memory. I'm wondering if someone could help run the following snippet in a Python REPL:

    >>> from nvitop import Device, libnvml
    >>> d = Device(0)
    >>> d.memory_info()
    >>> str(libnvml.nvmlQuery('nvmlDeviceGetMemoryInfo', d._handle))
    

    ~~If I understand correctly, for unified memory devices, nvmlDeviceGetMemoryInfo(handle) will return NVML_SUCCESS with memory_info.total == 0. Is that right?~~
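
A minimal sketch of the fallback proposed in items 1-2 (assumptions: psutil backs the host module, and N/A is modeled as a plain string sentinel; fallback_memory_info is a hypothetical helper, not the actual patch in PR #195):

from collections import namedtuple

import psutil

MemoryInfo = namedtuple('MemoryInfo', ['total', 'free', 'used'])
NA = 'N/A'

def fallback_memory_info(nvml_memory_info):
    # Discrete GPU: NVML reported real numbers, trust them.
    if nvml_memory_info != NA:
        return nvml_memory_info
    # Unified memory (e.g. DGX Spark): host RAM *is* the GPU memory, so
    # report the fixed system total and overall used; available = total - used.
    vm = psutil.virtual_memory()
    return MemoryInfo(total=vm.total, free=vm.available, used=vm.used)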


I implemented the above patch in PR #195 based on my current understanding of unified memory. You can try it via:

uvx --from git+https://github.com/XuehaiPan/nvitop.git@support-unified-memory nvitop

I haven't tested it because I do not have the resources. You are welcome to share any console output or screenshot for the patch. Thanks in advance!


@FlorinAndrei commented on GitHub (Dec 1, 2025):

(.venv) florin@spark:~$ python
Python 3.12.3 (main, Nov  6 2025, 13:44:16) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from nvitop import Device, libnvml
>>> d = Device(0)
>>> d.memory_info()
MemoryInfo(total='N/A', free='N/A', used='N/A')
>>> str(libnvml.nvmlQuery('nvmlDeviceGetMemoryInfo', d._handle))
'N/A'
>>>

I've tried Device(1), etc., but I get errors.


@FlorinAndrei commented on GitHub (Dec 1, 2025):

This is with your unified memory branch:

Image

This is with the latest nvitop release:

Image

This is htop:

Image

This is nvidia-smi:

Image

Let me know what else I should run.


@XuehaiPan commented on GitHub (Dec 2, 2025):

~~3. If I understand correctly, for unified memory devices, nvmlDeviceGetMemoryInfo(handle) will return NVML_SUCCESS with memory_info.total == 0. Is that right?~~

Based on the output in https://github.com/XuehaiPan/nvitop/issues/193#issuecomment-3597911469

(.venv) florin@spark:~$ python
Python 3.12.3 (main, Nov  6 2025, 13:44:16) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from nvitop import Device, libnvml
>>> d = Device(0)
>>> d.memory_info()
MemoryInfo(total='N/A', free='N/A', used='N/A')
>>> str(libnvml.nvmlQuery('nvmlDeviceGetMemoryInfo', d._handle))
'N/A'

@FlorinAndrei Thanks for the information! It seems this assumption does not always hold for unified memory devices (where NVML_ERROR_NOT_SUPPORTED is returned instead). I have updated the implementation in PR #195. Hope that works.

uvx --from git+https://github.com/XuehaiPan/nvitop.git@support-unified-memory nvitop
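
For illustration, a sketch of the detection this implies, using the nvidia-ml-py bindings (is_unified_memory_device is a hypothetical helper; the actual logic lives in PR #195):

import pynvml  # shipped by the nvidia-ml-py package

def is_unified_memory_device(handle):
    # Heuristic: on unified-memory devices the memory query itself is
    # unsupported, rather than succeeding with total == 0.
    try:
        pynvml.nvmlDeviceGetMemoryInfo(handle)
    except pynvml.NVMLError_NotSupported:
        return True
    except pynvml.NVMLError:
        return False  # some other failure (e.g. GPU lost): not conclusive
    return False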

@FlorinAndrei commented on GitHub (Dec 2, 2025):

florin@spark:~$ uvx --from git+https://github.com/XuehaiPan/nvitop.git@support-unified-memory nvitop
Traceback (most recent call last):
  File "/home/florin/.cache/uv/archive-v0/LaixKknZP9BBzx-HSz1iI/lib/python3.11/site-packages/nvitop/api/utils.py", line 757, in wrapped
    ret = self._cache[method]  # type: ignore[attr-defined]
          ~~~~~~~~~~~^^^^^^^^
KeyError: <function Device.memory_info at 0xfaca1b5676a0>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/florin/.cache/uv/archive-v0/LaixKknZP9BBzx-HSz1iI/bin/nvitop", line 12, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/florin/.cache/uv/archive-v0/LaixKknZP9BBzx-HSz1iI/lib/python3.11/site-packages/nvitop/cli.py", line 352, in main
    tui = TUI(
          ^^^^
  File "/home/florin/.cache/uv/archive-v0/LaixKknZP9BBzx-HSz1iI/lib/python3.11/site-packages/nvitop/tui/tui.py", line 68, in __init__
    self.main_screen: MainScreen = MainScreen(
                                   ^^^^^^^^^^^
  File "/home/florin/.cache/uv/archive-v0/LaixKknZP9BBzx-HSz1iI/lib/python3.11/site-packages/nvitop/tui/screens/main/__init__.py", line 67, in __init__
    self.device_panel: DevicePanel = DevicePanel(
                                     ^^^^^^^^^^^^
  File "/home/florin/.cache/uv/archive-v0/LaixKknZP9BBzx-HSz1iI/lib/python3.11/site-packages/nvitop/tui/screens/main/panels/device.py", line 88, in __init__
    self.snapshots: list[Snapshot] = self.take_snapshots()
                                     ^^^^^^^^^^^^^^^^^^^^^
  File "/home/florin/.cache/uv/archive-v0/LaixKknZP9BBzx-HSz1iI/lib/python3.11/site-packages/nvitop/api/caching.py", line 220, in wrapped
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/florin/.cache/uv/archive-v0/LaixKknZP9BBzx-HSz1iI/lib/python3.11/site-packages/nvitop/tui/screens/main/panels/device.py", line 169, in take_snapshots
    snapshots = [device.as_snapshot() for device in self.all_devices]
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/florin/.cache/uv/archive-v0/LaixKknZP9BBzx-HSz1iI/lib/python3.11/site-packages/nvitop/tui/screens/main/panels/device.py", line 169, in <listcomp>
    snapshots = [device.as_snapshot() for device in self.all_devices]
                 ^^^^^^^^^^^^^^^^^^^^
  File "/home/florin/.cache/uv/archive-v0/LaixKknZP9BBzx-HSz1iI/lib/python3.11/site-packages/nvitop/tui/library/device.py", line 93, in as_snapshot
    self._snapshot = super().as_snapshot()
                     ^^^^^^^^^^^^^^^^^^^^^
  File "/home/florin/.cache/uv/archive-v0/LaixKknZP9BBzx-HSz1iI/lib/python3.11/site-packages/nvitop/api/device.py", line 2371, in as_snapshot
    **{key: getattr(self, key)() for key in self.SNAPSHOT_KEYS},
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/florin/.cache/uv/archive-v0/LaixKknZP9BBzx-HSz1iI/lib/python3.11/site-packages/nvitop/api/device.py", line 2371, in <dictcomp>
    **{key: getattr(self, key)() for key in self.SNAPSHOT_KEYS},
            ^^^^^^^^^^^^^^^^^^^^
  File "/home/florin/.cache/uv/archive-v0/LaixKknZP9BBzx-HSz1iI/lib/python3.11/site-packages/nvitop/api/device.py", line 1029, in memory_used
    return self.memory_info().used
           ^^^^^^^^^^^^^^^^^^
  File "/home/florin/.cache/uv/archive-v0/LaixKknZP9BBzx-HSz1iI/lib/python3.11/site-packages/nvitop/api/utils.py", line 764, in wrapped
    ret = method(self, *args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/florin/.cache/uv/archive-v0/LaixKknZP9BBzx-HSz1iI/lib/python3.11/site-packages/nvitop/api/device.py", line 985, in memory_info
    if libnvml.nvmlCheckReturn(memory_info):
                               ^^^^^^^^^^^
UnboundLocalError: cannot access local variable 'memory_info' where it is not associated with a value
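
The UnboundLocalError at the bottom suggests memory_info is bound on only one code path before being checked; a hypothetical reduction of that pattern (not the actual device.py code):

def memory_info(nvml_supported):
    if nvml_supported:
        memory_info = 'MemoryInfo(total=..., free=..., used=...)'
    # BUG: no else-branch binding; when nvml_supported is False the local
    # name is never assigned, so the check below raises UnboundLocalError.
    if memory_info != 'N/A':
        return memory_info
    return 'N/A'

memory_info(False)  # UnboundLocalError: cannot access local variable 'memory_info'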

@XuehaiPan commented on GitHub (Dec 2, 2025):

@FlorinAndrei Updated.


@FlorinAndrei commented on GitHub (Dec 2, 2025):

nvitop new branch:

Image

htop:

Image

nvidia-smi:

Image

@XuehaiPan commented on GitHub (Dec 2, 2025):

Thanks @FlorinAndrei. Looks good so far, except that the memory bandwidth utilization might always return 0. I would like to hold the PR open for several days to gather more feedback.


@FlorinAndrei commented on GitHub (Dec 2, 2025):

@XuehaiPan thanks a lot for your excellent work!

Regarding that one field that shows N/A: it seems it's not memory bandwidth but rather the memory clock. Deriving the actual bandwidth from the clock is a bit tricky and depends on the GPU (bus width, various multipliers, etc.); a rough estimate is sketched after the tool outputs below.

I think that field should be renamed in the app to make clear it's actually a clock frequency, the memory clock. It's measured in MHz, a unit of frequency. Call it MCK or something.

Regardless, on the Spark the unified memory always operates at a fixed clock, so nvidia-smi does not report it. Here are some outputs from various tools on the Spark:

$ nvidia-smi -q -d CLOCK

==============NVSMI LOG==============

Timestamp                                 : Tue Dec  2 11:40:40 2025
Driver Version                            : 580.95.05
CUDA Version                              : 13.0

Attached GPUs                             : 1
GPU 0000000F:01:00.0
    Clocks
        Graphics                          : 2502 MHz
        SM                                : 2502 MHz
        Memory                            : N/A
        Video                             : 2158 MHz
    Applications Clocks
        Graphics                          : 2418 MHz
        Memory                            : N/A
    Default Applications Clocks
        Graphics                          : 2418 MHz
        Memory                            : N/A
    Deferred Clocks
        Memory                            : N/A
    Max Clocks
        Graphics                          : 3003 MHz
        SM                                : 3003 MHz
        Memory                            : N/A
        Video                             : 3003 MHz
    Max Customer Boost Clocks
        Graphics                          : N/A
    SM Clock Samples
        Duration                          : N/A
        Number of Samples                 : N/A
        Max                               : N/A
        Min                               : N/A
        Avg                               : N/A
    Memory Clock Samples
        Duration                          : N/A
        Number of Samples                 : N/A
        Max                               : N/A
        Min                               : N/A
        Avg                               : N/A
    Clock Policy
        Auto Boost                        : N/A
        Auto Boost Default                : N/A
$ sudo dmidecode -t memory
# dmidecode 3.5
Getting SMBIOS data from sysfs.
SMBIOS 3.3.0 present.

Handle 0x0010, DMI type 16, 23 bytes
Physical Memory Array
	Location: System Board Or Motherboard
	Use: System Memory
	Error Correction Type: None
	Maximum Capacity: 128 GB
	Error Information Handle: No Error
	Number Of Devices: 1

Handle 0x0011, DMI type 17, 92 bytes
Memory Device
	Array Handle: 0x0010
	Error Information Handle: Not Provided
	Total Width: 32 bits
	Data Width: 32 bits
	Size: 128 GB
	Form Factor: Chip
	Set: None
	Locator: DIMM0
	Bank Locator: BANK 0
	Type: LPDDR5
	Type Detail: Unknown
	Speed: 8533 MT/s
	Manufacturer: Micron
	Serial Number: None
	Asset Tag: None
	Part Number: None
	Rank: Unknown
	Configured Memory Speed: 8533 MT/s
	Minimum Voltage: Unknown
	Maximum Voltage: Unknown
	Configured Voltage: Unknown
	Memory Technology: DRAM
	Memory Operating Mode Capability: Volatile memory
	Firmware Version: Not Specified
	Module Manufacturer ID: Bank 128, Hex 0x00
	Module Product ID: Unknown
	Memory Subsystem Controller Manufacturer ID: Unknown
	Memory Subsystem Controller Product ID: Unknown
	Non-Volatile Size: None
	Volatile Size: 128 GB
	Cache Size: None
	Logical Size: None
$ sudo lshw -short -C memory
H/W path      Device          Class          Description
========================================================
/0/1                          memory         64KiB BIOS
/0/a                          memory         64KiB L1 cache
/0/b                          memory         64KiB L1 cache
/0/c                          memory         512KiB L2 cache
/0/d                          memory         8MiB L3 cache
/0/10                         memory         128GiB System Memory
/0/10/0                       memory         128GiB Chip 8533 MHz (0.1 ns)
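
As a worked example of the bandwidth derivation mentioned above: the 8533 MT/s figure comes from the dmidecode output, while the 256-bit total bus width is an assumption about the GB10, not something these tools report:

# Rough theoretical peak bandwidth from transfer rate and bus width.
transfers_per_second = 8533e6  # 8533 MT/s, from dmidecode above
bus_width_bits = 256           # ASSUMED total LPDDR5x bus width on the GB10
peak_bytes_per_second = transfers_per_second * bus_width_bits / 8
print(f'{peak_bytes_per_second / 1e9:.0f} GB/s')  # ~273 GB/s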

@FlorinAndrei commented on GitHub (Dec 3, 2025):

Let me reiterate: the branch via uvx is super useful. I just started another round of evals, and it's very nice to have all the metrics on a single screen.

I've added nvitop to my toolbox. It's great!


@XuehaiPan commented on GitHub (Dec 8, 2025):

The patch is released in the latest version. You can try it with:

uvx nvitop