Mirror of https://github.com/XuehaiPan/nvitop.git (synced 2026-05-15 14:15:55 -06:00)
[GH-ISSUE #128] [BUG] Memory leaking for Nvitop instances inside docker container #79
Originally created by @kenvix on GitHub (Jun 26, 2024).
Original GitHub issue: https://github.com/XuehaiPan/nvitop/issues/128
Originally assigned to: @XuehaiPan on GitHub.
Required prerequisites
What version of nvitop are you using?
1.3.2
Operating system and version
Ubuntu 22.04.4 LTS
NVIDIA driver version
535.104.12
NVIDIA-SMI
Python environment
3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] linux
nvidia-ml-py==12.535.133
nvitop==1.3.2
Problem description
nvitop leaks memory (RAM, not VRAM) when it runs inside a Docker container. The leak grows large enough that the operating system takes about ten seconds to reclaim the memory after the process is killed with SIGKILL.
Steps to Reproduce
Just keep nvitop running for a few months. You'll see that nvitop has consumed a lot of system memory: about 300GB in 77 days for my instance.
Is this caused by nvitop recording too much VRAM and GPU utilization information without releasing it?
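To put numbers on the growth described above, the resident memory of the running process can be sampled at a fixed interval. The following is a minimal sketch, not part of nvitop: it assumes a Linux host with /proc mounted, and the helper names (parse_vmrss_kib, read_rss_kib, sample_rss) are hypothetical.

```python
import time


def parse_vmrss_kib(status_text):
    """Return the VmRSS value (in KiB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:     123456 kB".
            return int(line.split()[1])
    raise ValueError("VmRSS field not found")


def read_rss_kib(pid):
    """Read the current resident set size of a process (Linux only)."""
    with open(f"/proc/{pid}/status", encoding="ascii") as f:
        return parse_vmrss_kib(f.read())


def sample_rss(pid, interval_s=60.0, samples=10):
    """Collect (elapsed_seconds, rss_kib) pairs at a fixed interval."""
    start = time.monotonic()
    readings = []
    for i in range(samples):
        readings.append((time.monotonic() - start, read_rss_kib(pid)))
        if i < samples - 1:
            time.sleep(interval_s)
    return readings
```

Calling sample_rss with the PID of the nvitop process (for example, the one shown by pgrep) yields a series that can be logged or plotted. A steadily rising series over hours would support the leak report, while a plateau would not.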
Traceback
No response
Logs
No response
Expected behavior
No response
Additional context
No response
@XuehaiPan commented on GitHub (Jul 9, 2024):
Hi @kenvix, thanks for raising this.
I tested it locally, but I cannot reproduce this. I use a script to create and terminate 10k processes on the GPU.
The memory consumption of nvitop is stable at around 260M.
@kenvix commented on GitHub (Jul 10, 2024):
Hi, @XuehaiPan
The test code you provided does not seem relevant to this issue. In my case, nvitop is kept running inside tmux or screen, and the memory usage (RAM, not GPU VRAM) of nvitop itself slowly and continuously increases over time.
For my example below, I ran it for 12 hours:
It used about 4.5G RAM
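The two figures reported so far (about 300GB in 77 days, and about 4.5G in 12 hours) can be compared as average growth rates. This is only a rough sketch: it assumes the growth is linear and treats the reported GB/G values as GiB, neither of which the thread confirms.

```python
def growth_rate_mib_per_hour(total_gib, hours):
    """Average growth in MiB per hour, assuming linear growth."""
    return total_gib * 1024 / hours


# First report: about 300GB over 77 days.
first_report = growth_rate_mib_per_hour(300, 77 * 24)
# Second report: about 4.5G over 12 hours.
second_report = growth_rate_mib_per_hour(4.5, 12)

print(f"first report:  {first_report:.1f} MiB/h")   # 166.2 MiB/h
print(f"second report: {second_report:.1f} MiB/h")  # 384.0 MiB/h
```

Under these assumptions both runs grow by a few hundred MiB per hour, so the two observations are at least of the same order of magnitude.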
@XuehaiPan commented on GitHub (Jul 10, 2024):
@kenvix could you test nvitop using pipx? Maybe it is caused by a dependency (e.g., the unofficial pynvml package).
Running after 2 days:
@alexanderfrey commented on GitHub (Oct 25, 2024):
Similar problem here: latest version 1.3.2 with NVIDIA driver version 560.35.03. Approximate RAM usage for nvitop is a whopping 30GB. This happened, btw, after I installed Ubuntu 24.10; before that everything was good. Strangely, the startup time for nvitop went up to 30 seconds (!), and during startup it allocates those 30GB of RAM. I suspect some faulty library...
@XuehaiPan commented on GitHub (Oct 25, 2024):
Hi, thanks for the report. It would be very helpful if you can run the following code. We can investigate where the memory leak is coming from. (e.g., device query, process query, psutil, curses).
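The script referred to in this comment is not reproduced in this mirror. As a general illustration of how a Python-level leak can be narrowed down to a source line, here is a minimal sketch using the standard-library tracemalloc module. It is not the maintainer's script; top_growth and leaky_query are hypothetical names, and leaky_query is only a stand-in for one refresh cycle.

```python
import tracemalloc

_leak = []


def leaky_query():
    # Hypothetical stand-in for one refresh cycle that keeps its results alive.
    _leak.append(bytearray(100_000))


def top_growth(work, rounds=3, limit=5):
    """Run work() repeatedly and report where Python allocations grew."""
    tracemalloc.start()
    try:
        work()  # warm-up, so one-time caches are not counted as growth
        before = tracemalloc.take_snapshot()
        for _ in range(rounds):
            work()
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # compare_to() sorts the differences with the largest change first.
    return after.compare_to(before, "lineno")[:limit]


for diff in top_growth(leaky_query):
    print(diff)
```

Note that tracemalloc only tracks memory allocated through Python's allocator. Memory held by a native library or a driver component would not appear in these snapshots, so an empty result here does not rule out a leak elsewhere.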
@alexanderfrey commented on GitHub (Oct 28, 2024):
Thanks for replying! Here is the output:
@alexanderfrey commented on GitHub (Oct 28, 2024):
btw. this did not result in excessive memory use.
@XuehaiPan commented on GitHub (Oct 28, 2024):
Hey @alexanderfrey, thanks for the report. Could you change the value of show_processes to True in the script and rerun it (hold and run for several hours)?
@gthelding commented on GitHub (Dec 19, 2024):
@alexanderfrey I have the same trouble running nvitop and nvidia-smi. They eat 52GB of RAM and take 30s to start up. I'm running Ubuntu 24.10 and an RTX 3060 with the 560.35.05 driver.
I've discovered that this is an issue with the nvidia-persistenced service.
If you run
sudo chmod o-w /var/run/nvidia-persistenced/socket
you can run nvitop as normal.
I thought I'd drop a note here so anyone seeing this will know the issue isn't with nvitop.
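For anyone who wants to check the socket before changing it, the workaround above amounts to clearing the "others may write" permission bit. The following is a minimal sketch of the same check and change in Python, using only os and stat; the function names are hypothetical, and changing the real socket would need root privileges.

```python
import os
import stat

# Path taken from the comment above.
SOCKET_PATH = "/var/run/nvidia-persistenced/socket"


def is_world_writable(path):
    """Return True if the 'others' write bit is set on path."""
    return bool(os.stat(path).st_mode & stat.S_IWOTH)


def clear_world_write(path):
    """Clear the 'others' write bit, like `chmod o-w path`."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~stat.S_IWOTH)
```

Calling is_world_writable(SOCKET_PATH) before and after the chmod shows whether the workaround has been applied. Whether the permission survives a restart of the nvidia-persistenced service is not stated in the thread.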