[PR #28] [MERGED] feat(select): add CUDA visible devices selection tool #128

Closed
opened 2026-05-05 03:26:23 -06:00 by gitea-mirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/XuehaiPan/nvitop/pull/28
Author: @XuehaiPan
Created: 7/22/2022
Status: Merged
Merged: 7/22/2022
Merged by: @XuehaiPan

Base: main ← Head: select


📝 Commits (7)

  • 993d978 feat(core/utils): add function human2bytes
  • c5e56b8 feat(select): add CUDA visible devices selection tool
  • 4834569 docs: add notes for nvisel in README.md
  • 02c73f8 chore(select): sort device index in descending to keep <GPU 0> free
  • ee0faf4 chore(select): add convenient option --count
  • 87ec418 docs: add nvisel sample results
  • fdd4ee5 chore(select): check input value range

📊 Changes

7 files changed (+809 additions, -63 deletions)


📝 README.md (+112 -0)
➕ nvisel.py (+37 -0)
📝 nvitop/__init__.py (+2 -1)
📝 nvitop/core/device.py (+0 -5)
📝 nvitop/core/utils.py (+155 -57)
➕ nvitop/select.py (+502 -0)
📝 pyproject.toml (+1 -0)

📄 Description

Issue Type

  • Improvement/feature implementation

Runtime Environment

  • Operating system and version: Ubuntu 20.04 LTS
  • Terminal emulator and version: GNOME Terminal 3.36.2
  • Python version: 3.9.13
  • NVML version (driver version): 470.129.06
  • nvitop version or commit: main@0a9048
  • nvidia-ml-py version: 11.450.51
  • Locale: en_US.UTF-8

Description

Add the CUDA visible devices selection tool nvisel. It selects a subset of devices that satisfy the specified criteria.

Motivation and Context

This simplifies choosing a value for the CUDA_VISIBLE_DEVICES environment variable.

Usage:

usage: nvisel [--help] [--version] [--inherit] [--account-as-free [USERNAME ...]]
              [--min-count N] [--max-count N] [--count N]
              [--min-free-memory SIZE] [--min-total-memory SIZE]
              [--max-gpu-utilization RATE] [--max-memory-utilization RATE]
              [--tolerance TOL] [--format FORMAT] [--sep SEP | --newline | --null]

CUDA visible devices selection tool.

optional arguments:
  --help, -h            Show this help message and exit.
  --version, -V         Show nvisel's version number and exit.

constraints:
  --inherit             Inherit the current `CUDA_VISIBLE_DEVICES` environment variable.
                        This means selecting a subset of the currently CUDA-visible devices.
  --account-as-free [USERNAME ...]
                        Account the used GPU memory of the given users as free memory.
                        If this option is specified but without argument, `$USER` will be used.
  --min-count N, -c N   Minimum number of devices to select. (default: 0)
                        The tool will fail (exit non-zero) if the requested resource is not available.
  --max-count N, -C N   Maximum number of devices to select. (default: all devices)
  --count N, -n N       Overriding both `--min-count N` and `--max-count N`.
  --min-free-memory SIZE, -f SIZE
                        Minimum free memory of devices to select. (example value: 4GiB)
                        If this constraint is given, check against all devices.
  --min-total-memory SIZE, -t SIZE
                        Minimum total memory of devices to select. (example value: 10GiB)
                        If this constraint is given, check against all devices.
  --max-gpu-utilization RATE, -G RATE
                        Maximum GPU utilization rate of devices to select. (example value: 30)
                        If this constraint is given, check against all devices.
  --max-memory-utilization RATE, -M RATE
                        Maximum memory bandwidth utilization rate of devices to select. (example value: 50)
                        If this constraint is given, check against all devices.
  --tolerance TOL, --tol TOL
                        The constraints tolerance (in percentage). (default: 0, i.e., strict)
                        This option can loose the constraints if the requested resource is not available.
                        For example, set `--tolerance=20` will accept a device with only 4GiB of free
                        memory when set `--min-free-memory=5GiB`.

formatting:
  --format FORMAT, -O FORMAT
                        The output format of the selected device identifiers. (default: index)
                        If any MIG device found, the output format will be fallback to `uuid`.
  --sep SEP, --separator SEP, -s SEP
                        Separator for the output. (default: ',')
  --newline             Use newline character as separator for the output, equivalent to `--sep=$'\n'`.
  --null, -0            Use null character ('\x00') as separator for the output, equivalent to `--sep=$'\0'`.
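
The `--tolerance` arithmetic from the help text can be checked with a short sketch. This is not nvitop's implementation: `parse_size` and `relaxed_minimum` are hypothetical helpers, the unit table covers binary multiples only, and the rule that a tolerance of TOL percent lowers a minimum threshold by that percentage is an assumption that happens to agree with the 5GiB to 4GiB example above.

```python
import re

# Hypothetical unit table (binary multiples only), for illustration.
_UNITS = {'B': 1, 'KiB': 1 << 10, 'MiB': 1 << 20, 'GiB': 1 << 30, 'TiB': 1 << 40}


def parse_size(text):
    """Parse a size such as '5GiB' into bytes (a sketch, not nvitop's human2bytes)."""
    match = re.fullmatch(r'\s*(\d+(?:\.\d+)?)\s*([KMGT]iB|B)\s*', text)
    if match is None:
        raise ValueError(f'unrecognized size: {text!r}')
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


def relaxed_minimum(minimum, tolerance_percent):
    """Assumed rule: lower a minimum threshold by `tolerance_percent` percent."""
    if not 0 <= tolerance_percent <= 100:
        raise ValueError('tolerance must be between 0 and 100')
    return minimum * (100 - tolerance_percent) // 100


required = parse_size('5GiB')
threshold = relaxed_minimum(required, 20)
print(threshold == parse_size('4GiB'))  # True
```

With `--tolerance=0` (the default) the threshold is unchanged, which matches the "strict" wording in the help text.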

Examples:

# All devices but sorted
$ nvisel       # or use `python3 -m nvitop.select`
6,5,4,3,2,1,0,7,8

# A simple example to select 4 devices
$ nvisel -n 4  # or use `python3 -m nvitop.select -n 4`
6,5,4,3

# Select available devices that satisfy the given constraints
$ nvisel --min-count 2 --max-count 3 --min-free-memory 5GiB --max-gpu-utilization 60
6,5,4

# Set `CUDA_VISIBLE_DEVICES` environment variable using `nvisel`
$ export CUDA_DEVICE_ORDER="PCI_BUS_ID" CUDA_VISIBLE_DEVICES="$(nvisel -c 1 -f 10GiB)"
CUDA_VISIBLE_DEVICES="6,5,4,3,2,1,0"

# Use UUID strings in `CUDA_VISIBLE_DEVICES` environment variable
$ export CUDA_VISIBLE_DEVICES="$(nvisel -O uuid -c 2 -f 5000M)"
CUDA_VISIBLE_DEVICES="GPU-849d5a8d-610e-eeea-1fd4-81ff44a23794,GPU-18ef14e9-dec6-1d7e-1284-3010c6ce98b1,GPU-96de99c9-d68f-84c8-424c-7c75e59cc0a0,GPU-2428d171-8684-5b64-830c-435cd972ec4a,GPU-6d2a57c9-7783-44bb-9f53-13f36282830a,GPU-f8e5a624-2c7e-417c-e647-b764d26d4733,GPU-f9ca790e-683e-3d56-00ba-8f654e977e02"

# Pipe output to other shell utilities
$ nvisel -0 -O uuid -c 2 -f 4GiB | xargs -0 -I {} nvidia-smi --id={} --query-gpu=index,memory.free --format=csv
CUDA_VISIBLE_DEVICES="GPU-849d5a8d-610e-eeea-1fd4-81ff44a23794,GPU-18ef14e9-dec6-1d7e-1284-3010c6ce98b1,GPU-96de99c9-d68f-84c8-424c-7c75e59cc0a0,GPU-2428d171-8684-5b64-830c-435cd972ec4a,GPU-6d2a57c9-7783-44bb-9f53-13f36282830a,GPU-f8e5a624-2c7e-417c-e647-b764d26d4733,GPU-f9ca790e-683e-3d56-00ba-8f654e977e02"
index, memory.free [MiB]
6, 11018 MiB
index, memory.free [MiB]
5, 11018 MiB
index, memory.free [MiB]
4, 11018 MiB
index, memory.free [MiB]
3, 11018 MiB
index, memory.free [MiB]
2, 11018 MiB
index, memory.free [MiB]
1, 11018 MiB
index, memory.free [MiB]
0, 11018 MiB
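
For scripts that read the tool's output directly instead of going through `xargs`, the separators described under `formatting` can be split as in the sketch below. `split_identifiers` is a hypothetical helper, and the sample data is copied from the examples above rather than captured from a live run.

```python
def split_identifiers(output, sep=','):
    """Split nvisel-style output into identifiers, ignoring a trailing newline or separator."""
    text = output.decode() if isinstance(output, bytes) else output
    return [item for item in text.rstrip('\n').split(sep) if item]


# Sample data copied from the examples above, not produced by running the tool.
comma_output = '6,5,4,3\n'
null_output = (
    b'GPU-849d5a8d-610e-eeea-1fd4-81ff44a23794\x00'
    b'GPU-18ef14e9-dec6-1d7e-1284-3010c6ce98b1'
)

print(split_identifiers(comma_output))                # ['6', '5', '4', '3']
print(len(split_identifiers(null_output, '\x00')))    # 2
```

In a real script the bytes would come from something like `subprocess.run(['nvisel', '-0', '-O', 'uuid'], capture_output=True, check=True).stdout`; the null separator avoids any ambiguity if an identifier were ever to contain a comma or newline.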

Python integration:

# Put this at the top of the Python script
import os
from nvitop import select_devices

os.environ['CUDA_VISIBLE_DEVICES'] = ','.join(
    select_devices(format='uuid', min_count=4, min_free_memory='8GiB')
)
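
A slightly more defensive variant of the snippet above is sketched here. It assumes nothing about how `select_devices` behaves when too few devices qualify, so it checks the count itself before touching the environment. `apply_selection` and `fake_select` are hypothetical names introduced for illustration; `fake_select` stands in for `nvitop.select_devices` so the sketch runs without a GPU.

```python
import os


def apply_selection(select, min_count, environ=os.environ, **constraints):
    """Set CUDA_VISIBLE_DEVICES from `select(...)`; raise if too few devices come back."""
    identifiers = [str(item) for item in select(min_count=min_count, **constraints)]
    if len(identifiers) < min_count:
        raise RuntimeError(f'need {min_count} devices, got {len(identifiers)}')
    environ['CUDA_VISIBLE_DEVICES'] = ','.join(identifiers)
    return identifiers


# Stand-in selector (hypothetical); replace with `nvitop.select_devices` in practice.
def fake_select(min_count, **constraints):
    return ['GPU-aaaa', 'GPU-bbbb']


env = {}
apply_selection(fake_select, min_count=2, environ=env, format='uuid')
print(env['CUDA_VISIBLE_DEVICES'])  # GPU-aaaa,GPU-bbbb
```

As with the original snippet, this should run before any CUDA-using library is imported, since `CUDA_VISIBLE_DEVICES` is typically read when the CUDA runtime initializes.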

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

Reference: github-starred/nvitop#128