[PR #210] Add mx-smi backend support for MetaX GPUs #211

Open
opened 2026-05-05 03:27:54 -06:00 by gitea-mirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/XuehaiPan/nvitop/pull/210
Author: @mhson-kyle
Created: 4/29/2026
Status: 🔄 Open

Base: main ← Head: main


📝 Commits (4)

  • a306d69 Add mx-smi MetaX GPU backend
  • dd9aeb7 libmxsmi: cache mx-smi -L inventory separately with 60s TTL
  • 7336642 device: replace is_available() in _nvml_probe() with shutil.which check
  • ee8b997 Merge pull request #1 from mhson-kyle/metax-mx-smi-support
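
Commit dd9aeb7 above caches the `mx-smi -L` inventory with a 60-second TTL. The following is only a minimal sketch of how such a time-based cache could look; the `TTLCache` class, its `loader` callback, and the injectable `clock` are hypothetical names for illustration and are not taken from `libmxsmi.py`.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class TTLCache(Generic[T]):
    """Cache one value and reload it once it is older than `ttl` seconds.

    Hypothetical helper: the PR's actual caching code may differ.
    """

    def __init__(
        self,
        loader: Callable[[], T],
        ttl: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader  # e.g. a function that runs `mx-smi -L` and parses it
        self._ttl = ttl
        self._clock = clock  # injectable so the expiry logic can be tested
        self._value: Optional[T] = None
        self._expires_at: Optional[float] = None

    def get(self) -> T:
        now = self._clock()
        # Reload on first use or after the TTL has elapsed.
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._loader()
            self._expires_at = now + self._ttl
        return self._value  # type: ignore[return-value]
```

A monotonic clock is used here so that wall-clock adjustments do not expire or extend the cached inventory unexpectedly.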

📊 Changes

5 files changed (+741 additions, -23 deletions)


📝 nvitop/__init__.py (+2 -0)
📝 nvitop/api/__init__.py (+2 -0)
📝 nvitop/api/device.py (+228 -21)
➕ nvitop/api/libmxsmi.py (+500 -0)
📝 nvitop/tui/screens/main/panels/device.py (+9 -2)

📄 Description

Issue Type

  • Improvement/feature implementation

Runtime Environment

  • Operating system and version: AlmaLinux 9.7
  • Terminal emulator and version: screen / remote shell
  • Python version: 3.9.25
  • NVML version (driver version): N/A for MetaX; mx-smi KMD driver 2.16.0, MACA runtime 3.0.0.8
  • nvitop version or commit: 1.6.3.dev11+ga306d69 / a306d69
  • python-ml-py version: nvidia-ml-py 13.595.45
  • Locale: en_US.UTF-8

Description

This adds support for MetaX GPUs through mx-smi, allowing nvitop to run on systems where NVIDIA NVML is unavailable but MetaX devices are present.

The change introduces an mx-smi backend that parses MetaX GPU inventory, utilization, memory, temperature, power, driver/runtime versions, and process information. The existing Device API now falls back to mx-smi when NVML is unavailable, and the backend can also be forced with:

NVITOP_GPU_BACKEND=mx-smi
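
The fallback and override behavior described above could be sketched roughly as follows. This is not the PR's code: the `select_backend` function, the `nvml` value for the environment variable, and the use of `nvidia-smi` on `PATH` as a stand-in for NVML availability are all assumptions made for illustration. Only the `NVITOP_GPU_BACKEND=mx-smi` setting and the use of a `shutil.which` check come from the PR text.

```python
import os
import shutil
from typing import Callable, Mapping, Optional

KNOWN_BACKENDS = ("nvml", "mx-smi")  # "nvml" as an accepted value is an assumption


def select_backend(
    env: Optional[Mapping[str, str]] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Pick a GPU backend name (hypothetical helper, not nvitop's API)."""
    env = os.environ if env is None else env

    # 1. An explicit override always wins.
    forced = env.get("NVITOP_GPU_BACKEND", "").strip().lower()
    if forced:
        if forced not in KNOWN_BACKENDS:
            raise ValueError(f"unknown NVITOP_GPU_BACKEND: {forced!r}")
        return forced

    # 2. Assumption: an nvidia-smi binary on PATH suggests NVML is usable.
    if which("nvidia-smi") is not None:
        return "nvml"

    # 3. Otherwise fall back to mx-smi when its binary is present.
    if which("mx-smi") is not None:
        return "mx-smi"

    # 4. Nothing found: keep the NVML default so existing error paths apply.
    return "nvml"
```

Checking for a binary with `shutil.which` avoids initializing either library just to decide which one to use, which keeps the probe cheap on hosts that have neither.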

The TUI header was also updated to show MetaX-specific version labels, using KMD and MACA versions instead of NVIDIA driver/CUDA labels when the active backend is mx-smi.

Motivation and Context

nvitop currently assumes NVIDIA/NVML availability. On MetaX GPU servers, nvidia-smi/NVML is not available, while GPU information is exposed through mx-smi.

This allows users on MetaX systems to use the same nvitop interface for monitoring GPU status and GPU processes.

Testing

Tested on a MetaX C500 server with 8 GPUs and /usr/bin/mx-smi available.

Checks run:

/usr/bin/python3.9 -m py_compile nvitop/api/libmxsmi.py nvitop/api/device.py nvitop/api/__init__.py nvitop/__init__.py nvitop/tui/screens/main/panels/device.py

API smoke test verified:

  • backend detection returns mx-smi
  • device count returns 8
  • driver version returns 2.16.0
  • MACA runtime version returns 3.0.0.8
  • device memory snapshot works

Also tested:

CUDA_VISIBLE_DEVICES=1,0

to verify MetaX device filtering/order handling.
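
For reference, the filtering and reordering that `CUDA_VISIBLE_DEVICES=1,0` is expected to produce can be sketched as below. The `apply_visible_devices` helper is hypothetical and handles integer indices only; it assumes the CUDA convention that parsing stops at the first invalid index, and it does not claim to match how the PR implements this for MetaX devices.

```python
from typing import List, Optional, Sequence


def apply_visible_devices(devices: Sequence[str], visible: Optional[str]) -> List[str]:
    """Filter and reorder `devices` by a CUDA_VISIBLE_DEVICES-style string.

    Hypothetical helper: integer indices only, no UUID support.
    """
    if visible is None:
        # Variable unset: every physical device is visible in its original order.
        return list(devices)

    selected: List[str] = []
    for token in visible.split(","):
        try:
            index = int(token.strip())
        except ValueError:
            break  # assumed CUDA convention: stop at the first invalid entry
        if index < 0 or index >= len(devices):
            break  # out-of-range index also ends the visible list
        selected.append(devices[index])
    return selected


physical = ["C500-0", "C500-1", "C500-2"]
swapped = apply_visible_devices(physical, "1,0")  # physical 1 first, then physical 0
```

With this convention, an empty string hides every device, while an unset variable exposes all of them.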

TUI smoke test:

nvitop --once

confirmed all 8 MetaX C500 devices render correctly.

Exporter smoke test also passed after installing nvitop-exporter.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

gitea-mirror added the pull-request label 2026-05-05 03:27:54 -06:00