mirror of
https://github.com/XuehaiPan/nvitop.git
synced 2026-05-15 06:06:12 -06:00
[GH-ISSUE #152] [BUG] Failure to correctly determine GPU-utilizing process details - "No such process" #96
Originally created by @eyalroz on GitHub (Feb 16, 2025).
Original GitHub issue: https://github.com/XuehaiPan/nvitop/issues/152
Originally assigned to: @XuehaiPan on GitHub.
Required prerequisites
What version of nvitop are you using?
1.0.0
Operating system and version
SUSE GNU/Linux 15 SP1 x86_64
NVIDIA driver version
535.54.03
NVIDIA-SMI
Python environment
Installed using:
(Couldn't install with git+https://github.com/XuehaiPan/nvitop.git#egg=nvitop; that triggers a different error.)
After installation,
Problem description
nvidia-smi identifies 3 processes using GPUs, with the third being gnome-shell. The first two processes are identified by nvitop and listed appropriately, but for the third process, I get:
Steps to Reproduce
Just ran nvitop.
Traceback
Logs
Expected behavior
I should see the process information, including the path and the PID, which nvidia-smi reports - for the third process as well.
Additional context
No response
@XuehaiPan commented on GitHub (Feb 16, 2025):
@eyalroz nvidia-smi gets the process command via the NVML API nvmlSystemGetProcessName. nvitop gets the process command via psutil with the PID. To the best of my knowledge, the lowest PID in Linux is 1 (the init process). As your comment shows, the process identified as "No Such Process" has PID 0.
@eyalroz commented on GitHub (Feb 16, 2025):
The process identified as "No such process" is actually PID 22720; it was /usr/bin/gnome-shell. I know that because the other two processes (the lines for which I have snipped) showed up with their correct PIDs.
@XuehaiPan commented on GitHub (Feb 16, 2025):
@eyalroz Could you run the following code in your Python console? We can investigate what is going on here:
@eyalroz commented on GitHub (Feb 16, 2025):
I can try this next time I'm on that system, but I'm not sure the same combination of processes that produced the bug in the first place will be using the GPU. I will try. Thanks for taking the time on this.
@eyalroz commented on GitHub (Feb 18, 2025):
Ok, so, I ran your suggested commands. Before doing that, I did a quick nvidia-smi, which says the following:
and then, your commands:
Hope that helps.
@XuehaiPan commented on GitHub (Feb 18, 2025):
@eyalroz Thanks for the context. It's an invalid memory access issue.
It is caused by a miscalculated struct size of nvmlProcessInfo_t. You can see the true PID is shifted to the field usedGpuMemory.
Could you try the latest release with uv?
@eyalroz commented on GitHub (Feb 21, 2025):
Yes, I'll try it, but it's now the weekend so only in a few days.
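For readers following the diagnosis above, here is a minimal sketch of how a miscalculated struct size can shift a PID into the usedGpuMemory field. It uses only the standard-library ctypes module. The two layouts (WrittenEntry, AssumedEntry) and the first PID (1111) are hypothetical and chosen for illustration; they are not claimed to be the actual NVML definitions or the exact mismatch in this issue. The output described assumes a typical 64-bit little-endian platform.

```python
import ctypes


class WrittenEntry(ctypes.Structure):
    """Hypothetical 4-field layout that the producer fills in."""
    _fields_ = [
        ("pid", ctypes.c_uint),
        ("usedGpuMemory", ctypes.c_ulonglong),
        ("gpuInstanceId", ctypes.c_uint),
        ("computeInstanceId", ctypes.c_uint),
    ]


class AssumedEntry(ctypes.Structure):
    """Hypothetical 2-field layout that the consumer wrongly assumes."""
    _fields_ = [
        ("pid", ctypes.c_uint),
        ("usedGpuMemory", ctypes.c_ulonglong),
    ]


MiB = 1024 * 1024

# The producer writes two entries using the larger layout.
written = (WrittenEntry * 2)(
    WrittenEntry(1111, 100 * MiB, 0xFFFFFFFF, 0xFFFFFFFF),
    WrittenEntry(22720, 50 * MiB, 0xFFFFFFFF, 0xFFFFFFFF),
)

# The consumer walks the same bytes with the smaller layout, so every
# entry after the first starts at the wrong offset.
raw = bytes(written)
count = len(raw) // ctypes.sizeof(AssumedEntry)
misread = (AssumedEntry * count).from_buffer_copy(raw)

print("sizes:", ctypes.sizeof(WrittenEntry), ctypes.sizeof(AssumedEntry))
for index, entry in enumerate(misread):
    print(index, "pid =", entry.pid, "usedGpuMemory =", entry.usedGpuMemory)
```

In this sketch the first entry is read correctly, while the second misread entry reports a bogus pid and shows the real PID 22720 in usedGpuMemory. A lookup by that bogus pid would then find no matching process.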