mirror of
https://github.com/XuehaiPan/nvitop.git
synced 2026-05-15 14:15:55 -06:00
[GH-ISSUE #5] [Feature Request] MIG device support (e.g. A100 GPUs) #5
Originally created by @ki-arie on GitHub (Aug 10, 2021).
Original GitHub issue: https://github.com/XuehaiPan/nvitop/issues/5
Originally assigned to: @XuehaiPan on GitHub.
Hello!
Firstly, thanks for creating and maintaining such an excellent library.
Runtime Environment
450.0
nvitop version or commit: main@b669fa3
python-ml-py version: 11.450.51
en_US.UTF-8
Current Behavior
When running nvitop on a MIG-enabled A100 GPU, nvitop fails to detect the GPU's running processes and GPU memory consumption, which can otherwise be viewed by running nvidia-smi.
Expected Behavior
The A100 MiG GPU should be visible in the GUI.
Context
So far we can only view CPU usage metrics, which are really handy but it would also be nice to have GPU usage as designed.
Possible Solutions
I think that the MIG naming convention is different from the regular naming convention, and looks something like this:
MIG 7g.80gb Device 0: rather than just Device 0: as is currently set up in the nvitop repo.
Steps to reproduce
watch -n 0.5 nvitop
@XuehaiPan commented on GitHub (Aug 10, 2021):
Thanks for the feedback! I'm sorry that nvitop does not support MIG-enabled devices yet, but we are working on it. It would be very nice if you could help us make nvitop better.
Ref wookayin/gpustat#102
nvitop has not been tested on MIG-enabled devices. (I don't have any A100/A30 GPU available, though.) Could you please run the following commands on your device? The output would be very helpful for identifying the error.
The content of test.py:
Agreed. I think we should redesign the UI and add a new panel for MIG devices.
You can use the monitor mode of nvitop by:
Type nvitop --help for more command line options.
@ki-arie commented on GitHub (Aug 10, 2021):
Hi,
Thanks for the quick response! Here are the outputs of running the above commands:
With pip3 install nvidia-ml-py==11.450.51 (the pinned version for nvitop) and python3 test.py, the console output is:
With pip3 install nvidia-ml-py==11.450.129, the console output is:
AFAIK, this is expected, since to use MIG mode you have to disable the A100s from using Fabric Manager and NVLink.
nvidia-smi results in this screen:
Let me know how else I can help with this - this library's pretty cool :)
@XuehaiPan commented on GitHub (Aug 11, 2021):
Thanks to @ki-arie!
It seems that we cannot get GPU-level information about fan speed, memory usage, or GPU utilization on MIG-enabled devices, either from the NVML Python bindings or from the nvidia-smi output.
I'm sorry for the poor exception handling in the example code. Can you try the Python code above again, but in the Python REPL (just type python3 on the command line)?
It would also be better to have some processes running on the MIG device while you are testing with the NVML bindings. You can try:
If you have installed TensorFlow or PyTorch, you can try:
This command will use the GPU for 2 minutes in the background.
@zabique commented on GitHub (Nov 11, 2021):
works fine on 2x3090 NVLINK (MIG)
@XuehaiPan commented on GitHub (Nov 12, 2021):
@zabique Thanks for the report. Glad to see people using nvitop on Windows!
According to your screenshot, you are using dual 3090s on Windows, which is not a MIG setup. BTW, you can change the font of your terminal to get a better experience (some glyphs are missing and shown as ?s in boxes in the graph views and at the ends of the bars).
NVIDIA Multi-Instance GPU User Guide: Introduction
The MIG feature splits one physical GPU into multiple separate GPU instances.
By now, only A100 series and A30 GPUs support MIG mode, and it is only available on Linux (NVIDIA Multi-Instance GPU User Guide: Supported GPUs).
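To check whether a GPU is actually running in MIG mode, a minimal sketch along these lines may help. It assumes the nvidia-ml-py bindings (imported as pynvml) are installed; mig_summary is a hypothetical helper name used only for illustration and is not part of nvitop or NVML. The module is passed in as an argument so the logic can also be exercised without a GPU.

```python
try:
    import pynvml  # provided by the nvidia-ml-py package
except ImportError:  # allows importing this sketch without the bindings
    pynvml = None


def mig_summary(nvml, index=0):
    """Return (mig_enabled, mig_device_names) for the physical GPU at `index`.

    `nvml` is the pynvml module (or any object exposing the same functions).
    This is an illustrative helper, not nvitop's implementation.
    """
    handle = nvml.nvmlDeviceGetHandleByIndex(index)
    try:
        current_mode, _pending_mode = nvml.nvmlDeviceGetMigMode(handle)
    except nvml.NVMLError:
        # GPUs without MIG capability (e.g. GeForce cards) report "not supported".
        return False, []
    if current_mode != nvml.NVML_DEVICE_MIG_ENABLE:
        return False, []

    names = []
    for i in range(nvml.nvmlDeviceGetMaxMigDeviceCount(handle)):
        try:
            mig_handle = nvml.nvmlDeviceGetMigDeviceHandleByIndex(handle, i)
        except nvml.NVMLError:
            continue  # no MIG device is configured in this slot
        name = nvml.nvmlDeviceGetName(mig_handle)
        # Older bindings return bytes, newer ones return str.
        names.append(name.decode() if isinstance(name, bytes) else name)
    return True, names


# Usage on a machine with an NVIDIA driver:
#   pynvml.nvmlInit(); print(mig_summary(pynvml, 0)); pynvml.nvmlShutdown()
```

On a dual-3090 setup like the one above, this would be expected to report MIG as not enabled, since the query is not supported on those cards.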
@zabique commented on GitHub (Nov 12, 2021):
Thanks for the reply and the font hint, as I was too shy to ask about it :).
I thought MIG was enabled on my GPUs because nvidia-smi shows it in the top right corner.
I also compared performance on Ubuntu 20.04 and Windows, and I can run the same model with pretty much the same performance. Also, nvidia-smi on Windows allows a lot more hardware control.
Feel free to ask for any testing.
Your nvitop is great!
@lixeon commented on GitHub (Nov 17, 2021):
I tested in the Python REPL, and it seems that only a small fix is needed to add this MIG feature to nvitop.
I hope the information below will help you. And thanks for developing this awesome tool.
BTW, if I want to study GPU performance, can I start from this GPU info API code? How is it different from nvvp or Nsight?
This time, with a process running on the GPU.
@XuehaiPan commented on GitHub (Nov 17, 2021):
@lixeon Thanks a lot for the informative results. I'll try to improve nvitop on MIG-enabled devices.
From the NVML API Reference:
NVML and the applications based on it (nvidia-smi, nvidia-ml-py, nvitop, nvtop, gpustat, etc.) are designed to monitor GPU state in a global view. These tools can only capture the overall GPU SM and VRAM usage of a single process. They are not designed for code profiling.
Nsight is a profiling tool that can capture more fine-grained GPU usage information (nvvp is deprecated). It measures the running time of each API call.
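As a rough illustration of the coarse, per-process view that NVML offers, the sketch below lists the GPU memory used by each compute process on one device. It assumes the nvidia-ml-py bindings (imported as pynvml); process_memory_mib is a hypothetical helper name for illustration, not something nvitop or NVML provides.

```python
def process_memory_mib(nvml, handle):
    """Map PID -> used GPU memory in MiB for compute processes on `handle`.

    `nvml` is the pynvml module (or any object exposing the same function).
    A value of None means NVML could not report memory for that process.
    Illustrative helper only; not nvitop's implementation.
    """
    usage = {}
    for proc in nvml.nvmlDeviceGetComputeRunningProcesses(handle):
        used = proc.usedGpuMemory  # bytes, or None when not available
        usage[proc.pid] = None if used is None else used // (1024 * 1024)
    return usage


# Usage on a machine with an NVIDIA driver:
#   import pynvml
#   pynvml.nvmlInit()
#   handle = pynvml.nvmlDeviceGetHandleByIndex(0)
#   print(process_memory_mib(pynvml, handle))
#   pynvml.nvmlShutdown()
```

This gives one number per process at the moment of the query, which is why it suits monitoring rather than profiling individual API calls.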
@XuehaiPan commented on GitHub (Jun 15, 2022):
Hi, everyone! I have added MIG support to the GUI. To install:
Any feedback is welcome.
@XuehaiPan commented on GitHub (Jun 26, 2022):
Closing as resolved by PR #8.
@XuehaiPan commented on GitHub (Jul 14, 2022):
I just got sudo access to an A100 GPU. I tweaked the visual result of the CLI and may release it soon.
@ki-arie commented on GitHub (Jul 14, 2022):
Omg incredible work! 🤩
@ytaoeer commented on GitHub (Nov 3, 2023):
So nvitop can't get a MIG device's GPU utilization and SM usage?
@XuehaiPan commented on GitHub (Nov 3, 2023):
@ytaoeer nvitop is based on the NVML library. The API reference of nvmlDeviceGetUtilizationRates notes that:
No NVML-based monitoring tool (including nvidia-smi) can track the GPU utilization of MIG instances. You can submit a feature request to NVML upstream to ask for this support.
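A tool built on NVML can at least degrade gracefully when the query is unavailable. The following is a minimal sketch, assuming the nvidia-ml-py bindings (imported as pynvml); utilization_or_na is a hypothetical helper name and the "N/A" fallback is an illustrative choice, not a description of nvitop's code.

```python
def utilization_or_na(nvml, handle):
    """Return GPU utilization as a percent string, or "N/A" if unavailable.

    `nvml` is the pynvml module (or any object exposing the same function).
    Illustrative helper only; not nvitop's implementation.
    """
    try:
        rates = nvml.nvmlDeviceGetUtilizationRates(handle)
    except nvml.NVMLError:
        # Raised when the driver cannot report utilization for this device,
        # which is the situation described above for MIG-enabled GPUs.
        return "N/A"
    return f"{rates.gpu}%"


# Usage on a machine with an NVIDIA driver:
#   import pynvml
#   pynvml.nvmlInit()
#   handle = pynvml.nvmlDeviceGetHandleByIndex(0)
#   print(utilization_or_na(pynvml, handle))
#   pynvml.nvmlShutdown()
```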