[GH-ISSUE #152] [BUG] Failure to correctly determine GPU-utilizing process details - "No such process" #96

Open
opened 2026-05-05 03:25:17 -06:00 by gitea-mirror · 7 comments
Owner

Originally created by @eyalroz on GitHub (Feb 16, 2025).
Original GitHub issue: https://github.com/XuehaiPan/nvitop/issues/152

Originally assigned to: @XuehaiPan on GitHub.

Required prerequisites

  • I have read the documentation https://nvitop.readthedocs.io.
  • I have searched the Issue Tracker that this hasn't already been reported. (comment there if it has.)
  • I have tried the latest version of nvitop in a new isolated virtual environment.

What version of nvitop are you using?

1.0.0

Operating system and version

SUSE GNU/Linux 15 SP1 x86_64

NVIDIA driver version

535.54.03

NVIDIA-SMI

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro RTX 6000                Off | 00000000:15:00.0 Off |                  Off |
| 33%   32C    P8              25W / 260W |    164MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Quadro RTX 6000                Off | 00000000:2D:00.0 Off |                  Off |
| 33%   38C    P8              32W / 260W |      4MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  Quadro P620                    Off | 00000000:99:00.0 Off |                  N/A |
| 34%   28C    P8              N/A /  N/A |     98MiB /  2048MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A     15851      C   ...ake-build-debug/foo/bar                  160MiB |
|    2   N/A  N/A     18243      G   /usr/bin/X                                   64MiB |
|    2   N/A  N/A     22720      G   /usr/bin/gnome-shell                         28MiB |
+---------------------------------------------------------------------------------------+

Python environment

Installed using:

pip3 install --upgrade nvitop

(Couldn't install with git+https://github.com/XuehaiPan/nvitop.git#egg=nvitop, that triggers a different error.)

After installation,

$ python3 -m pip freeze | python3 -c 'import sys; print(sys.version, sys.platform); print("".join(filter(lambda s: any(word in s.lower() for word in ("nvi", "cuda", "nvml", "gpu")), sys.stdin)))'
3.6.5 (default, Apr 05 2018, 13:30:06) [GCC] linux
gpustat==1.1.1
nvidia-ml-py==11.525.150
nvitop==1.0.0
PyJSONViewer==1.6.0

Problem description

nvidia-smi identifies 3 processes using GPUs, with the third being gnome-shell. The first two processes are identified by nvitop and listed appropriately, but for the third process, I get:

... snip ...
╒═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╕
│ Processes:                                                                                                                                           joeuser1@mymachine │
│ GPU     PID      USER  GPU-MEM %SM  %CPU  %MEM       TIME  COMMAND                                                                                                      │
... snip...
│   2       0 G     N/A    22KiB   0   N/A   N/A        N/A  No Such Process                                                                                              │
╘═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╛

Steps to Reproduce

Just ran nvitop.

Traceback

No error reported.

Logs


Expected behavior

I should see the process information that nvidia-smi reports, including the path and the PID, for the third process as well.

Additional context

No response

gitea-mirror added the
pynvml
bug
upstream
labels 2026-05-05 03:25:17 -06:00

@XuehaiPan commented on GitHub (Feb 16, 2025):

@eyalroz nvidia-smi gets the process command via the NVML API nvmlSystemGetProcessName. nvitop gets the process command via psutil, using the PID. To the best of my knowledge, the lowest PID on Linux is 1 (the init process). As your output shows, the process identified as "No Such Process" has PID 0.
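To illustrate why a PID of 0 surfaces as "No Such Process": nvitop's actual lookup goes through psutil, but the underlying mechanism on Linux is a procfs read, and /proc has no entry for PID 0. The sketch below is a minimal stand-in for that lookup (the helper name and the fallback string are illustrative, not nvitop's real code):

```python
import os

def host_process_name(pid: int) -> str:
    # psutil-style lookup via procfs. PID 0 (and any already-exited PID)
    # has no /proc/<pid> directory, so the lookup fails and we fall back
    # to a placeholder, mirroring the "No Such Process" row in nvitop.
    try:
        with open(f"/proc/{pid}/comm") as f:
            return f.read().strip()
    except OSError:
        return "No Such Process"

print(host_process_name(os.getpid()))  # the current interpreter's name
print(host_process_name(0))            # -> "No Such Process"
```

So whenever NVML hands back a bogus PID of 0, the host-side lookup can only fail, regardless of which real process owns the GPU memory.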


@eyalroz commented on GitHub (Feb 16, 2025):

The process identified as "No such process" actually had PID 22720; it was /usr/bin/gnome-shell. I know that because the other two processes (whose lines I have snipped) showed up with their correct PIDs.


@XuehaiPan commented on GitHub (Feb 16, 2025):

@eyalroz Could you run the following code in your Python console? We can investigate what is going on here:

$ python
>>> from nvitop import Device
>>> d = Device(2)
>>> d.processes()
>>> [str(p) for p in d.compute_running_processes()]
>>> [str(p) for p in d.graphics_running_processes()]

@eyalroz commented on GitHub (Feb 16, 2025):

I can try this next time I'm on that system; but I'm not sure the same combination of processes will be using the GPU, which is what produced the bug in the first place. I will try. Thanks for taking the time on this.


@eyalroz commented on GitHub (Feb 18, 2025):

Ok, so, I ran your suggested commands. Before doing that, I did a quick nvidia-smi, which reported the following:

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A     10734      C   ...-x86_64-Release/abcdefgh/foo_bar         160MiB |
|    0   N/A  N/A     58661      G   .../abcdefgh/123456/bin/abcde/bazbaz          3MiB |
|    1   N/A  N/A     10734      C   ...-x86_64-Release/abcdefgh/foo_bar         160MiB |
|    2   N/A  N/A     18243      G   /usr/bin/X                                   64MiB |
|    2   N/A  N/A     22720      G   /usr/bin/gnome-shell                         34MiB |
+---------------------------------------------------------------------------------------+

and then, your commands:

$ python3
Python 3.6.5 (default, Apr 05 2018, 13:30:06) [GCC] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from nvitop import Device
>>> d = Device(2)
>>> d.processes()
{18243: GpuProcess(pid=18243, gpu_memory=64MiB, type=G, device=PhysicalDevice(index=2, name="Quadro P620", total_memory=2048MiB), host=HostProcess(pid=18243, name='X', status='sleeping', started='2024-12-09 09:26:58')), 0: GpuProcess(pid=0, gpu_memory=22KiB, type=G, device=PhysicalDevice(index=2, name="Quadro P620", total_memory=2048MiB), host=HostProcess(pid=0, status='terminated'))}
>>> [str(p) for p in d.compute_running_processes()]
[]
>>> [str(p) for p in d.graphics_running_processes()]
["{'pid': 18243, 'usedGpuMemory': 67387392, 'gpuInstanceId': 4294967295, 'computeInstanceId': 4294967295}", "{'pid': 0, 'usedGpuMemory': 22720, 'gpuInstanceId': 30048256, 'computeInstanceId': 0}"]

Hope that helps.


@XuehaiPan commented on GitHub (Feb 18, 2025):

@eyalroz Thanks for the context. It's an invalid memory access issue.

>>> [str(p) for p in d.graphics_running_processes()]
[
    "{'pid': 18243, 'usedGpuMemory': 67387392, 'gpuInstanceId': 4294967295, 'computeInstanceId': 4294967295}",
-   "{'pid': 0, 'usedGpuMemory': 22720, 'gpuInstanceId': 30048256, 'computeInstanceId': 0}"
]

It is caused by a miscalculated struct size for nvmlProcessInfo_t. You can see the true PID (22720) has been shifted into the usedGpuMemory field.
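The shift can be reproduced with a ctypes sketch of the size mismatch. Assume (for illustration) that the driver fills a buffer with entries that carry one extra trailing 8-byte field (newer NVML headers append usedGpuCcProtectedMemory to nvmlProcessInfo_t), while the client library still reads 24-byte entries; the field names follow the NVML headers, and the numbers are taken from the output above:

```python
import ctypes

class ProcessInfoV2(ctypes.Structure):
    # Layout the client library expects: 24 bytes (with alignment padding).
    _fields_ = [
        ("pid", ctypes.c_uint),
        ("usedGpuMemory", ctypes.c_ulonglong),
        ("gpuInstanceId", ctypes.c_uint),
        ("computeInstanceId", ctypes.c_uint),
    ]

class ProcessInfoV3(ctypes.Structure):
    # Hypothetical driver-side layout with one extra 8-byte field: 32 bytes.
    _fields_ = [
        ("pid", ctypes.c_uint),
        ("usedGpuMemory", ctypes.c_ulonglong),
        ("gpuInstanceId", ctypes.c_uint),
        ("computeInstanceId", ctypes.c_uint),
        ("usedGpuCcProtectedMemory", ctypes.c_ulonglong),
    ]

NA = 0xFFFFFFFF  # NVML's "N/A" sentinel for instance IDs

# The driver writes two 32-byte entries into the buffer...
driver = (ProcessInfoV3 * 2)(
    ProcessInfoV3(18243, 67387392, NA, NA, 0),   # /usr/bin/X
    ProcessInfoV3(22720, 30048256, NA, NA, 0),   # /usr/bin/gnome-shell
)
# ...which the client then reinterprets as 24-byte entries.
raw = ctypes.string_at(ctypes.addressof(driver),
                       2 * ctypes.sizeof(ProcessInfoV2))
client = (ProcessInfoV2 * 2).from_buffer_copy(raw)

for p in client:
    print(p.pid, p.usedGpuMemory, p.gpuInstanceId, p.computeInstanceId)
# 18243 67387392 4294967295 4294967295   <- first entry still lines up
# 0 22720 30048256 0                     <- second entry is shifted by 8 bytes
```

The first entry decodes correctly because both layouts agree up to offset 24; every subsequent entry is shifted by the accumulated size difference, which is exactly the pattern in the output above: pid=0, the real PID 22720 in usedGpuMemory, and the real memory usage in gpuInstanceId.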

Could you try the latest release with uv?

pip3 install uv
uvx nvitop

@eyalroz commented on GitHub (Feb 21, 2025):

Yes, I'll try it, but it's now the weekend so only in a few days.
