[PR #26] [MERGED] feat(core/libcuda): add Python bindings for CUDA driver APIs #127

Closed
opened 2026-05-05 03:26:22 -06:00 by gitea-mirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/XuehaiPan/nvitop/pull/26
Author: @XuehaiPan
Created: 7/17/2022
Status: Merged
Merged: 7/21/2022
Merged by: @XuehaiPan

Base: main ← Head: libcuda


📝 Commits (9)

  • 4df53bf feat(core/libcuda): add Python bindings for CUDA driver APIs
  • f8f60d0 feat(core/device): add method CudaDevice.is_available()
  • cf12842 feat(core/device): parse CUDA_VISIBLE_DEVICES in a subprocess
  • 1a84b98 fix(core/libcuda): fix string truncation by char '\00'
  • 6d2d654 fix(core/libcuda): fix pointer type for string buffers
  • 89770e1 docs(core/device): update docstrings
  • 1435f76 fix(gui/process): fix messages for no processes found in one time print
  • 59c61a7 feat(core/device): refactor CUDA_VISIBLE_DEVICES parsing
  • 5eadd7e fix(core/device): fix missing attributes for Python < 3.7

📊 Changes

6 files changed (+893 additions, -77 deletions)


➕ docs/source/apis/core/libcuda.rst (+8 -0)
📝 docs/source/apis/index.rst (+1 -0)
📝 nvitop/core/__init__.py (+2 -1)
📝 nvitop/core/device.py (+229 -75)
➕ nvitop/core/libcuda.py (+651 -0)
📝 nvitop/gui/screens/main/process.py (+2 -1)

📄 Description

Issue Type

  • Improvement/feature implementation

Runtime Environment

  • Operating system and version: Ubuntu 20.04 LTS
  • Terminal emulator and version: GNOME Terminal 3.36.2
  • Python version: 3.9.13
  • NVML version (driver version): 470.129.06
  • nvitop version or commit: main@0a9048
  • python-ml-py version: 11.450.51
  • Locale: en_US.UTF-8

Description

Add Python bindings for CUDA driver APIs.

Motivation and Context

This simplifies parsing of the CUDA_VISIBLE_DEVICES environment variable and handles non-standard CUDA_VISIBLE_DEVICES formats, such as abbreviated UUIDs. For example:

$ nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 3070 (UUID: GPU-4ba9dbe6-7cbf-4621-a31b-aa2ca6247e31)

$ CUDA_VISIBLE_DEVICES='GPU-4ba9dbe6' ipython
Python 3.9.13 (main, May 17 2022, 14:19:07) 
Type 'copyright', 'credits' or 'license' for more information
IPython 8.4.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import torch

In [2]: torch.cuda.is_available()
Out[2]: True

In [3]: from nvitop import Device, libcuda

In [4]: Device.cuda.all()  # the previous approach accepts only integer indices and full UUID strings
Out[4]: []                 # wrong result: reports that no CUDA devices were found

In [5]: physical = Device(0)

In [6]: physical.name()
Out[6]: 'NVIDIA GeForce RTX 3070'

In [7]: physical.uuid()
Out[7]: 'GPU-4ba9dbe6-7cbf-4621-a31b-aa2ca6247e31'

In [8]: libcuda.cuInit()

In [9]: libcuda.cuDeviceGetCount()
Out[9]: 1

In [10]: cuda = libcuda.cuDeviceGet(0)

In [11]: libcuda.cuDeviceGetName(cuda)
Out[11]: 'NVIDIA GeForce RTX 3070'

In [12]: libcuda.cuDeviceGetUuid(cuda)
Out[12]: '4ba9dbe6-7cbf-4621-a31b-aa2ca6247e31'
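
Note that `cuDeviceGetUuid` above returns the same UUID as `physical.uuid()`, only without the `GPU-` prefix. At the C level the driver fills a 16-byte `CUuuid` value, and such raw bytes can contain `\x00`. The sketch below is not the code from `nvitop/core/libcuda.py`. It only illustrates, with a plain `ctypes` buffer and a hypothetical helper `format_uuid`, why reading a buffer through `.value` truncates at the first NUL byte while `.raw` keeps every byte, which is presumably the kind of problem the `fix string truncation by char '\00'` commit addresses.

```python
import ctypes
import uuid


def format_uuid(raw: bytes) -> str:
    """Format 16 raw bytes as a lowercase 8-4-4-4-12 UUID string (hypothetical helper)."""
    if len(raw) != 16:
        raise ValueError(f'expected 16 bytes, got {len(raw)}')
    return str(uuid.UUID(bytes=raw))


# Stand-in for the 16-byte buffer a driver call would fill; the fifth byte is NUL.
payload = bytes.fromhex('4ba9dbe6' '00bf' '4621' 'a31b' 'aa2ca6247e31')
buffer = ctypes.create_string_buffer(payload, 16)

truncated = buffer.value  # stops at the first NUL byte
complete = buffer.raw     # returns the whole buffer, NUL bytes included

print(len(truncated), len(complete))  # 4 16
print(format_uuid(complete))          # 4ba9dbe6-00bf-4621-a31b-aa2ca6247e31
```

The sample bytes are made up for the demonstration (the RTX 3070 UUID above with one byte replaced by NUL); a real binding would pass a `CUuuid` structure to the driver instead of building the buffer by hand.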

The new CUDA_VISIBLE_DEVICES parser supports abbreviated UUIDs and MIG devices:

$ nvidia-smi
Thu Jul 21 17:13:09 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.48.07    Driver Version: 515.48.07    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-PCI...  Off  | 00000000:25:00.0 Off |                   On |
| N/A   27C    P0    36W / 250W |     45MiB / 40960MiB |     N/A      Default |
|                               |                      |              Enabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| MIG devices:                                                                |
+------------------+----------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |         Memory-Usage |        Vol|         Shared        |
|      ID  ID  Dev |           BAR1-Usage | SM     Unc| CE  ENC  DEC  OFA  JPG|
|                  |                      |        ECC|                       |
|==================+======================+===========+=======================|
|  0    1   0   0  |     25MiB / 19968MiB | 56      0 |  4   0    2    0    0 |
|                  |      0MiB / 32767MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  0    5   0   1  |     13MiB /  9856MiB | 28      0 |  2   0    1    0    0 |
|                  |      0MiB / 16383MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  0   13   0   2  |      6MiB /  4864MiB | 14      0 |  1   0    0    0    0 |
|                  |      0MiB /  8191MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

$ nvidia-smi -L
GPU 0: NVIDIA A100-PCIE-40GB (UUID: GPU-3eb79704-1571-707c-aee8-f43ce747313d)
  MIG 4g.20gb     Device  0: (UUID: MIG-bcfbdb60-84ab-5b00-9d2b-d440116dade9)
  MIG 2g.10gb     Device  1: (UUID: MIG-d184f67c-c95f-5ef2-a935-195bd0094fbd)
  MIG 1g.5gb      Device  2: (UUID: MIG-37b51284-1df4-5451-979d-3231ccb0822e)

In [1]: from nvitop.core import device

In [2]: device.parse_cuda_visible_devices_to_uuids(None)
Out[2]: ['bcfbdb60-84ab-5b00-9d2b-d440116dade9']

In [3]: device.parse_cuda_visible_devices_to_uuids('')
Process `CUDA_VISIBLE_DEVICES` parser:
Traceback (most recent call last):
  File "/home/linuxbrew/.linuxbrew/opt/python@3.9/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/linuxbrew/.linuxbrew/opt/python@3.9/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/users/panxuehai/nvitop/nvitop/core/device.py", line 2282, in _cuda_visible_devices_parser
    raise ex
  File "/home/users/panxuehai/nvitop/nvitop/core/device.py", line 2265, in _cuda_visible_devices_parser
    libcuda.cuInit()
  File "/home/users/panxuehai/nvitop/nvitop/core/libcuda.py", line 420, in cuInit
    _cudaCheckReturn(ret)
  File "/home/users/panxuehai/nvitop/nvitop/core/libcuda.py", line 306, in _cudaCheckReturn
    raise CUDAError(ret)
nvitop.core.libcuda.CUDAError_NoDevice: No CUDA-capable device is detected. Code: CUDA_ERROR_NO_DEVICE (100).
Out[3]: []
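
The traceback above is printed by a child process: the `multiprocessing` frames and `_cuda_visible_devices_parser` show that the parser evaluates `CUDA_VISIBLE_DEVICES` in a separate process, so a failing `cuInit()` stays there and the caller simply receives `[]`. The following is a minimal sketch of that isolation pattern, not nvitop's implementation. It assumes `subprocess` in place of `multiprocessing`, and the child only echoes the variable it sees where the real parser would query the CUDA driver. The name `query_in_child` is hypothetical.

```python
import json
import os
import subprocess
import sys

# Child program: a stand-in for the real work (which would call into the CUDA driver).
CHILD_CODE = r'''
import json, os, sys
value = os.environ.get("CUDA_VISIBLE_DEVICES", "")
if value == "":
    sys.exit(1)  # mimic a failure such as CUDA_ERROR_NO_DEVICE
json.dump(value.split(","), sys.stdout)
'''


def query_in_child(cuda_visible_devices: str) -> list:
    """Run the stand-in query in a fresh interpreter with the given variable set."""
    env = dict(os.environ)  # copy, so the parent's environment is never modified
    env['CUDA_VISIBLE_DEVICES'] = cuda_visible_devices
    result = subprocess.run(
        [sys.executable, '-c', CHILD_CODE],
        env=env,
        capture_output=True,
        text=True,
        timeout=30,
        check=False,
    )
    if result.returncode != 0:
        return []  # the failure stays in the child; the caller gets an empty list
    return json.loads(result.stdout)


print(query_in_child('0,1'))
print(query_in_child(''))
```

Because each call starts a new interpreter, several candidate values can be evaluated one after another without any of them affecting the calling process.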
In [4]: device.parse_cuda_visible_devices_to_uuids('0')
Out[4]: ['bcfbdb60-84ab-5b00-9d2b-d440116dade9']

In [5]: device.parse_cuda_visible_devices_to_uuids('1')
Process `CUDA_VISIBLE_DEVICES` parser:
Traceback (most recent call last):
  File "/home/linuxbrew/.linuxbrew/opt/python@3.9/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/linuxbrew/.linuxbrew/opt/python@3.9/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/users/panxuehai/nvitop/nvitop/core/device.py", line 2282, in _cuda_visible_devices_parser
    raise ex
  File "/home/users/panxuehai/nvitop/nvitop/core/device.py", line 2265, in _cuda_visible_devices_parser
    libcuda.cuInit()
  File "/home/users/panxuehai/nvitop/nvitop/core/libcuda.py", line 420, in cuInit
    _cudaCheckReturn(ret)
  File "/home/users/panxuehai/nvitop/nvitop/core/libcuda.py", line 306, in _cudaCheckReturn
    raise CUDAError(ret)
nvitop.core.libcuda.CUDAError_NoDevice: No CUDA-capable device is detected. Code: CUDA_ERROR_NO_DEVICE (100).
Out[5]: []

In [6]: device.parse_cuda_visible_devices_to_uuids('GPU-3eb79704-1571-707c-aee8-f43ce747313d', verbose=False)
Out[6]: ['bcfbdb60-84ab-5b00-9d2b-d440116dade9']

In [7]: device.parse_cuda_visible_devices_to_uuids('GPU-3eb79704', verbose=False)
Out[7]: ['bcfbdb60-84ab-5b00-9d2b-d440116dade9']

In [8]: device.parse_cuda_visible_devices_to_uuids('GPU-3eb79704-1571-707c-aee8-000000000000', verbose=False)
Out[8]: []

In [9]: device.parse_cuda_visible_devices_to_uuids('MIG-d184f67c-c95f-5ef2-a935-195bd0094fbd', verbose=False)
Out[9]: ['d184f67c-c95f-5ef2-a935-195bd0094fbd']

In [10]: device.parse_cuda_visible_devices_to_uuids('MIG-GPU-3eb79704-1571-707c-aee8-f43ce747313d/13/0', verbose=False)
Out[10]: ['37b51284-1df4-5451-979d-3231ccb0822e']

In [11]: device.parse_cuda_visible_devices_to_uuids('MIG-GPU-3eb79704/13/0', verbose=False)
Out[11]: ['37b51284-1df4-5451-979d-3231ccb0822e']
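
The results above come from the CUDA driver itself. Purely to illustrate what matching an abbreviated UUID means, the sketch below resolves entries against a known list of full UUIDs in plain Python. It is not how nvitop resolves them, it ignores MIG devices entirely, and its rules are assumptions: an entry is either an integer index or a prefix that must match exactly one UUID, and parsing stops at the first entry that cannot be resolved. The helper name `resolve_visible_devices` is hypothetical.

```python
from typing import List, Optional


def resolve_visible_devices(value: Optional[str], uuids: List[str]) -> List[str]:
    """Resolve a CUDA_VISIBLE_DEVICES-like string against full UUIDs (illustrative rules)."""
    if value is None:
        return list(uuids)  # variable unset: every device is visible
    resolved: List[str] = []
    for entry in (item.strip() for item in value.split(',')):
        if entry.isdecimal():
            index = int(entry)
            match = uuids[index] if index < len(uuids) else None
        else:
            # A prefix counts only if it is non-empty and identifies exactly one device.
            candidates = [u for u in uuids if entry and u.startswith(entry)]
            match = candidates[0] if len(candidates) == 1 else None
        if match is None:
            break  # assumption: stop at the first entry that cannot be resolved
        resolved.append(match)
    return resolved


# Sample UUIDs taken from the two machines shown in this description.
devices = [
    'GPU-4ba9dbe6-7cbf-4621-a31b-aa2ca6247e31',
    'GPU-3eb79704-1571-707c-aee8-f43ce747313d',
]
print(resolve_visible_devices('GPU-3eb79704', devices))
print(resolve_visible_devices('GPU-3eb79704-1571-707c-aee8-000000000000', devices))
print(resolve_visible_devices('1,0', devices))
```

The first call resolves the abbreviated UUID to the second device, the second returns an empty list because no UUID starts with that string, and the third returns both devices in the requested order.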

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
