mirror of
https://github.com/XuehaiPan/nvitop.git
synced 2026-05-15 14:15:55 -06:00
[GH-ISSUE #77] [Question] live metrics collector #47
Originally created by @mehrazi on GitHub (Jun 28, 2023).
Original GitHub issue: https://github.com/XuehaiPan/nvitop/issues/77
Originally assigned to: @XuehaiPan on GitHub.
Required prerequisites
Questions
Hi
Thanks for your great repo.
I have a question about the values of the metrics that nvitop collects.
If I'm not mistaken, the API returns mean/max/min values for specified intervals. I need to collect the absolute values every second or every few seconds.
Is there any way to handle this?
@XuehaiPan commented on GitHub (Jul 2, 2023):
Hi @mehrdadazizi72, sorry for the late reply. Have you ever tried to manage the `device` instances and collect the values manually? That will be fully controlled by your code logic. Such as:
Yes. This is the current behavior. There will be some delay in Python-C API conversion and system-driver-device communication. The current collector API collects metrics asynchronously in a separate thread to avoid blocking the main program, so the exact values are not returned because of this async implementation. If you want to get the exact values, you need to use the synchronous implementation.
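The snippet that originally accompanied this comment is not preserved in this mirror. As a stand-in, here is a minimal sketch of what synchronous, manual collection could look like. It assumes nvitop's `Device.all()`, `Device.gpu_utilization()` and `Device.memory_used()` plus psutil's `cpu_percent()` and `virtual_memory()`. The helper names `read_metrics` and `poll` are hypothetical and not part of nvitop.

```python
import time
from typing import Callable, Dict, List


def read_metrics() -> Dict[str, object]:
    """Read instantaneous CPU and GPU values (no min/max/mean aggregation)."""
    # Imported lazily so that `poll` below can be exercised without a GPU.
    import psutil
    from nvitop import Device

    reading: Dict[str, object] = {
        'cpu_percent': psutil.cpu_percent(interval=None),
        'host_memory_percent': psutil.virtual_memory().percent,
    }
    for device in Device.all():
        prefix = f'gpu{device.index}'
        # Both calls may return nvitop's NA placeholder on unsupported devices.
        reading[f'{prefix}/gpu_utilization'] = device.gpu_utilization()  # percent
        reading[f'{prefix}/memory_used'] = device.memory_used()  # bytes
    return reading


def poll(
    read: Callable[[], Dict[str, object]],
    interval: float,
    count: int,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.time,
) -> List[Dict[str, object]]:
    """Call `read` synchronously `count` times, sleeping `interval` seconds in between."""
    samples: List[Dict[str, object]] = []
    for i in range(count):
        samples.append({'timestamp': clock(), **read()})
        if i < count - 1:
            sleep(interval)
    return samples
```

Calling `poll(read_metrics, interval=1.0, count=60)` would block the calling thread for about a minute and return one raw reading per second. The blocking is the trade-off of the synchronous approach compared with the background collector.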
In addition, the current implementation does not collect metrics at the exact interval. For example, if `interval=5` and the API call takes 0.1-0.2 seconds, the metrics will be logged at `(0, 5.1, 10.3, 15.4, ...)` rather than at the exact interval `(0, 5, 10, 15, ...)`.
@Mousavi-Parisa commented on GitHub (Jul 4, 2023):
Hi @XuehaiPan!
I need to run a hardware benchmark and plot the exact values, so I've tried collecting the various GPU and CPU usage metrics using "collect_in_background" in a piece of code like this:
I encountered a problem retrieving the exact values in a way that shows the real-time changes. Actually, the interval inaccuracy doesn't matter, but CPU usage cannot be retrieved in the synchronous implementation, and the exact values cannot be retrieved in the asynchronous one.
Is there a way to have both GPU and CPU metrics logged? Sync or async, but with the exact values.
Thanks in advance.
@XuehaiPan commented on GitHub (Jul 5, 2023):
Hi, @pmi94, thanks for your comment and the code snippet. Do you mean you want to log the exact value on each snapshot? If the answer is yes, I think this can be done by adding a new field `last` (currently, we only have `min`/`max`/`mean`).
@Mousavi-Parisa commented on GitHub (Jul 5, 2023):
Actually, I need the exact values logged on each snapshot for CPU as well as GPU, but I think that's only possible for GPU. Right?
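To make the `last` idea from the previous comment concrete, here is a small sketch of a running aggregate that keeps the most recent raw sample next to `min`/`max`/`mean`. The `RunningStat` class is purely illustrative and is not nvitop's internal implementation.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class RunningStat:
    """Aggregate a stream of samples, also remembering the latest one."""

    count: int = 0
    total: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf
    last: float = math.nan

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)
        self.last = value  # the exact value of the most recent snapshot

    def summary(self) -> Dict[str, float]:
        mean = self.total / self.count if self.count else math.nan
        return {'mean': mean, 'min': self.minimum, 'max': self.maximum, 'last': self.last}
```

With samples 10, 30 and 20, `summary()` reports a mean of 20 while `last` still holds the raw 20 from the final snapshot. That raw value is what a real-time plot would need.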
@XuehaiPan commented on GitHub (Jul 6, 2023):
@pmi94 `ResourceMetricCollector` collects snapshots with both CPU and GPU metrics. But the metrics are only logged when the `collector.collect()` method is called rather than when the snapshot is taken. The background collect interval is not exactly the same as the background snapshot interval. I will look for a way to keep these two intervals synchronized.
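One generic way to keep a periodic task on a fixed grid, so that it fires at `(0, 5, 10, 15, ...)` rather than drifting to `(0, 5.1, 10.3, 15.4, ...)`, is to sleep until an absolute deadline instead of sleeping for a fixed duration after each call. The sketch below illustrates that technique only. It is not how nvitop schedules its collector, and `run_on_schedule` is a hypothetical helper.

```python
import time
from typing import Callable, List


def run_on_schedule(
    task: Callable[[], None],
    interval: float,
    count: int,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[float]:
    """Run `task` at start + k * interval and return the offsets at which it fired."""
    start = clock()
    fired: List[float] = []
    for k in range(count):
        # The deadline is derived from the start time, not from the end of the
        # previous call, so the time spent inside `task` does not accumulate.
        delay = (start + k * interval) - clock()
        if delay > 0:
            sleep(delay)
        fired.append(clock() - start)
        task()
    return fired
```

If a single call takes longer than `interval`, the next call starts immediately and later calls catch up to the grid. Whether that or skipping a tick is preferable depends on the use case.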