mirror of
https://github.com/XuehaiPan/nvitop.git
synced 2026-05-15 14:15:55 -06:00
[GH-ISSUE #139] [BUG] Segmentation Fault when one GPU lost from PCIe bus #87
Originally created by @Junyi-99 on GitHub (Nov 19, 2024).
Original GitHub issue: https://github.com/XuehaiPan/nvitop/issues/139
Originally assigned to: @XuehaiPan on GitHub.
Required prerequisites
What version of nvitop are you using?
1.3.2
Operating system and version
Ubuntu 22.04
NVIDIA driver version
560.35.03
NVIDIA-SMI
Python environment
$ python3 -m pip freeze | python3 -c 'import sys; print(sys.version, sys.platform); print("".join(filter(lambda s: any(word in s.lower() for word in ("nvi", "cuda", "nvml", "gpu")), sys.stdin)))'
3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] linux
gpustat==1.1.1
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-ml-py==12.535.108
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.6.68
nvidia-nvtx-cu12==12.1.105
nvitop==1.3.2
Problem description
nvitop exits with a segmentation fault when one of the GPUs is lost from the bus.
First of all, this is not a problem with nvitop itself.
I encountered this issue and would like to suggest that nvitop should still be able to display the other GPUs even when one GPU is faulty, instead of resulting in a segmentation fault.
It would be nice if nvitop could skip the faulty GPU (like gpustat).
Steps to Reproduce
nvitop
Traceback
Logs
No response
Expected behavior
It would be nice if nvitop could skip the faulty GPU.
For example, gpustat can show the faulty GPU:
Additional context
No response
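The "skip the faulty GPU" behavior requested above could be sketched roughly as follows. This is a minimal illustration only, not how nvitop or gpustat is actually implemented: `query_all_devices` and `fake_query` are hypothetical names, and `fake_query` stands in for a real per-device NVML call. With pynvml, the `query` callable might wrap `pynvml.nvmlDeviceGetHandleByIndex` and `pynvml.nvmlDeviceGetName`, which raise `pynvml.NVMLError` on failure. Note that this only helps when the driver reports an error; a segmentation fault inside the native library cannot be caught with `try`/`except`.

```python
from typing import Callable, List, Tuple


def query_all_devices(
    count: int,
    query: Callable[[int], str],
) -> Tuple[List[Tuple[int, str]], List[Tuple[int, str]]]:
    """Query each device index, skipping any device whose query raises."""
    healthy = []
    skipped = []
    for index in range(count):
        try:
            healthy.append((index, query(index)))
        except Exception as ex:  # assumption: e.g. pynvml.NVMLError for a lost GPU
            skipped.append((index, f'{type(ex).__name__}: {ex}'))
    return healthy, skipped


def fake_query(index: int) -> str:
    """Hypothetical stand-in for a real NVML query; device 1 pretends to be lost."""
    if index == 1:
        raise RuntimeError('GPU is lost')
    return f'GPU {index}'


if __name__ == '__main__':
    healthy, skipped = query_all_devices(3, fake_query)
    print('healthy:', healthy)
    print('skipped:', skipped)
```

Isolating each query this way means one failing device is reported separately while the remaining devices are still listed.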
@XuehaiPan commented on GitHub (Jan 13, 2025):
Sorry for the late response. You can try: