[PR #208] Fix incorrect memory reporting on coherent UMA platforms (GB10 / DGX … #212

Open
opened 2026-05-05 03:27:54 -06:00 by gitea-mirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/XuehaiPan/nvitop/pull/208
Author: @parallelArchitect
Created: 4/16/2026
Status: 🔄 Open

Base: main ← Head: fix/gb10-coherent-uma-memory-reporting


📝 Commits (2)

  • de09aeb Fix incorrect memory reporting on coherent UMA platforms (GB10 / DGX Spark)
  • 2ca5797 fix: replace UMA acronym in comment to pass spell check

📊 Changes

1 file changed (+15 additions, -7 deletions)

View changed files

📝 nvitop/api/device.py (+15 -7)

📄 Description

Fix incorrect memory reporting on coherent UMA platforms (GB10 / DGX Spark)

On GB10 / DGX Spark, nvmlDeviceGetMemoryInfo returns NVML_SUCCESS with total equal to the system MemTotal (~121 GB). As a result, nvitop displays the full system RAM as GPU memory instead of the memory that is actually allocatable.

The existing NVMLError_NotSupported path correctly handles some UMA platforms, but GB10 returns NVML_SUCCESS rather than NOT_SUPPORTED, so it falls through to the discrete GPU path and displays the wrong values.
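To make the symptom concrete, here is a minimal sketch of how one might check whether an NVML-reported total simply mirrors the host's MemTotal. It is not code from this PR: `parse_meminfo`, `total_ratio`, and the sample `/proc/meminfo` excerpt are hypothetical and invented for illustration. On real hardware the NVML figure would come from `nvmlDeviceGetMemoryInfo`.

```python
GIB = 1024 ** 3

# Made-up /proc/meminfo excerpt for illustration; not captured from real hardware.
SAMPLE_MEMINFO = """\
MemTotal:       127535744 kB
MemFree:         98304000 kB
MemAvailable:   110100480 kB
HugePages_Total:       0
"""


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: value} mapping.

    Fields carrying a "kB" unit are converted to bytes; unitless fields
    (plain counters) are kept as raw integers.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if not sep or not parts or not parts[0].isdigit():
            continue
        value = int(parts[0])
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024  # the kernel's "kB" means 1024 bytes
        fields[key.strip()] = value
    return fields


def total_ratio(nvml_total_bytes, meminfo):
    """Return the NVML-reported total as a fraction of host MemTotal."""
    return nvml_total_bytes / meminfo["MemTotal"]


meminfo = parse_meminfo(SAMPLE_MEMINFO)
# Hypothetical NVML value: assume the driver reports exactly the host total,
# which is the behavior this PR describes for GB10.
nvml_total = meminfo["MemTotal"]
print(f"host MemTotal: {meminfo['MemTotal'] / GIB:.1f} GiB")
print(f"NVML total / MemTotal: {total_ratio(nvml_total, meminfo):.2f}")
```

A ratio near 1.0 suggests the device and the host share one memory pool. A discrete GPU would typically report a total unrelated to host RAM.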

Issue Type

  • Bug fix

Description

Detect coherent UMA by comparing the NVML-reported total against the system memory total. If the NVML total is at least 90% of system RAM, classify the device as unified memory and use the system's available memory (MemAvailable) for display instead.

Preserves existing behavior for discrete GPUs.
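The heuristic can be sketched as below. This is an illustration under stated assumptions, not the diff in nvitop/api/device.py: the function names are hypothetical, and how the PR maps values onto the displayed fields is not shown on this page. The sketch assumes only that the free figure switches to MemAvailable when the device is classified as unified memory.

```python
GIB = 1024 ** 3

# Threshold from the PR description: NVML total >= 90% of system RAM.
UMA_THRESHOLD_PERCENT = 90


def looks_like_coherent_uma(nvml_total, system_total,
                            threshold_percent=UMA_THRESHOLD_PERCENT):
    """Return True when the NVML total is at least the given share of host RAM.

    Integer arithmetic avoids floating-point surprises at the boundary.
    """
    if nvml_total <= 0 or system_total <= 0:
        return False
    return nvml_total * 100 >= system_total * threshold_percent


def free_bytes_for_display(nvml_total, nvml_free, system_total, system_available):
    """Pick the free-memory figure to display (hypothetical helper).

    Assumption: on coherent UMA the host's MemAvailable is the better
    estimate; on discrete GPUs the NVML value is used unchanged.
    """
    if looks_like_coherent_uma(nvml_total, system_total):
        return system_available
    return nvml_free


# Unified memory: the NVML total mirrors host RAM, so MemAvailable is used.
uma_free = free_bytes_for_display(121 * GIB, 119 * GIB, 121 * GIB, 100 * GIB)

# Discrete GPU: 24 GiB card in a 128 GiB host, so the NVML free value is kept.
discrete_free = free_bytes_for_display(24 * GIB, 20 * GIB, 128 * GIB, 100 * GIB)
```

One caveat of a ratio-based check: a discrete GPU whose memory is at least 90% of host RAM would also be classified as unified memory, which is one reason validation on real hardware matters.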

Motivation and Context

Same root cause documented and fixed in:

  • nvtop PR: https://github.com/Syllo/nvtop/pull/463
  • btop PR: https://github.com/aristocratos/btop/pull/1611
  • NVML shim workaround: https://github.com/parallelArchitect/nvml-unified-shim

Note

Requires validation on GB10 / DGX Spark hardware. The fix has not been independently validated on a coherent UMA system.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

gitea-mirror added the pull-request label 2026-05-05 03:27:54 -06:00