mirror of
https://github.com/XuehaiPan/nvitop.git
synced 2026-05-15 14:15:55 -06:00
[GH-ISSUE #91] [Question] Suggestion: change 0.25s to 1s at api/device.py line 2125 #57
Originally created by @hui-zhao-1 on GitHub (Aug 18, 2023).
Original GitHub issue: https://github.com/XuehaiPan/nvitop/issues/91
Originally assigned to: @XuehaiPan on GitHub.
Required prerequisites
Questions
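I recently tested with the master branch and found that monitoring data is still being lost. I suspect that the 0.25s value at api/device.py line 2125 is too small, so samples are missed.
Test environment:
Ubuntu 20.04.6 LTS (Focal Fossa)
The test method and results are as follows: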
@hui-zhao-1 commented on GitHub (Aug 18, 2023):
After forking the code, I created the following four branches:
https://github.com/2581543189/nvitop/tree/250ms
https://github.com/2581543189/nvitop/tree/500ms
https://github.com/2581543189/nvitop/tree/750ms
https://github.com/2581543189/nvitop/tree/1s
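On the test machine I created four conda environments and installed nvitop from one of these four branches into each environment. I then used Prometheus to collect the process metrics from all four environments and observed them continuously for 12 hours. The results are as follows: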




As the results show, https://github.com/2581543189/nvitop/tree/1s is the branch that loses almost no data.
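One way to put a number on the data loss seen in the graphs is to count the scrape intervals that have no data point. The sketch below is a hypothetical helper, not part of nvitop or Prometheus. The function name `count_missing_points`, the 15-second scrape interval, and the sample timestamps are all assumptions for illustration.

```python
from typing import Sequence


def count_missing_points(timestamps: Sequence[float], interval: float) -> int:
    """Count data points missing from a series scraped at a fixed interval.

    `timestamps` are the times (in seconds, ascending) at which a data point
    was actually recorded; `interval` is the expected scrape interval.
    """
    missing = 0
    for previous, current in zip(timestamps, timestamps[1:]):
        # Number of scrape intervals that elapsed between two recorded points.
        steps = round((current - previous) / interval)
        if steps > 1:
            # Every interval beyond the first one is a lost data point.
            missing += steps - 1
    return missing


# Made-up timestamps: points are missing at t=45, t=90 and t=105.
recorded = [0, 15, 30, 60, 75, 120]
print(count_missing_points(recorded, interval=15))
```

Running the same count over the series from each of the four environments would give a comparable loss figure per branch.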
@hui-zhao-1 commented on GitHub (Aug 18, 2023):
The GPU program used for the test is as follows:
The code used to collect the Prometheus metrics is as follows:
https://github.com/2581543189/nvitop/blob/250ms/nvitop/prometheus/cli.py
https://github.com/2581543189/nvitop/blob/500ms/nvitop/prometheus/cli.py
https://github.com/2581543189/nvitop/blob/750ms/nvitop/prometheus/cli.py
https://github.com/2581543189/nvitop/blob/1s/nvitop/prometheus/cli.py
@XuehaiPan commented on GitHub (Aug 18, 2023):
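According to man nvidia-smi: the sampling period for GPU Utilization is between 1/6s and 1s, and the reported value is the average over that period.
The timestamp that nvidia-smi pmon uses is the timestamp of the previous sample. The default sampling period of nvidia-smi pmon is 1s, and the reported value is the average over that period. I will make the related changes in the next PR.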
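The effect of the look-back window can be illustrated with a toy model. The sketch below is not nvitop's code and makes no NVML calls. It assumes the driver records one sample at every whole second (a fixed 1s period, the upper end of the range quoted from man nvidia-smi) and that a query returns only samples newer than `query time - window`. The function names and the query schedule are hypothetical.

```python
from typing import Iterable


def query_sees_sample(query_ms: int, window_ms: int, period_ms: int = 1000) -> bool:
    """Return True if a sample newer than (query_ms - window_ms) exists.

    Assumption: samples are recorded at every multiple of `period_ms`.
    """
    cutoff_ms = query_ms - window_ms
    # Most recent sample recorded at or before the query time.
    latest_sample_ms = (query_ms // period_ms) * period_ms
    return latest_sample_ms > cutoff_ms


def count_empty_queries(
    window_ms: int, query_times_ms: Iterable[int], period_ms: int = 1000
) -> int:
    """Count the queries that would come back with no sample at all."""
    return sum(
        not query_sees_sample(query_ms, window_ms, period_ms)
        for query_ms in query_times_ms
    )


# Hypothetical schedule: 100 queries, one every 100 ms, over 10 seconds.
queries = range(10_000, 20_000, 100)
for window_ms in (250, 500, 750, 1000):
    print(window_ms, count_empty_queries(window_ms, queries))
```

In this model only a window at least as long as the sampling period guarantees that every query sees a sample, which is consistent with the 1s branch losing almost no data in the test above.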
ruff integration #142